gemma-4-26B-A4B-thinking-vision-Q4_K_XL-server

16-18 GB total memory (RAM + VRAM) for 4-bit.

llama.cpp Mixed Cross-platform Chat Updated 24 seconds ago

Import

Requires llml ≥ 0.5.0

$ llml import https://llml.dev/profiles/gemma-4-26B-A4B-thinking-vision-Q4_K_XL-server.toml --activate

$ brew install --cask flyingnobita/tap/llml

$ llml import https://llml.dev/profiles/gemma-4-26B-A4B-thinking-vision-Q4_K_XL-server.toml --activate

$ go install github.com/flyingnobita/llml/cmd/llml@latest

$ llml import https://llml.dev/profiles/gemma-4-26B-A4B-thinking-vision-Q4_K_XL-server.toml --activate

$ scoop bucket add flyingnobita https://github.com/flyingnobita/scoop-bucket && scoop install flyingnobita/llml

$ llml import https://llml.dev/profiles/gemma-4-26B-A4B-thinking-vision-Q4_K_XL-server.toml --activate

Copy the Run command, paste it in your terminal, and pick a local model. The profile attaches immediately — press p in the TUI to confirm.

Why this profile exists

16-18 GB total memory (RAM + VRAM) for 4-bit; requires mmproj BF16 file for vision; llama-server OpenAI-compatible endpoint on port 8001

# args

--temp 1.0

--top-p 0.95

--top-k 64

--alias unsloth/gemma-4-26B-A4B-it-GGUF

--port 8001

--chat-template-kwargs {"enable_thinking":true}

Mixed — tested envelope
Cross-platform — backend installed and on PATH
Backend: llama.cpp >= current llml-supported version
Profile assumes the model file is already on disk; llml supplies the path at launch