llml LLM Launcher
← browse

gemma-4-26B-A4B-thinking-vision-Q4_K_XL-server

16-18 GB total memory (RAM + VRAM) for 4-bit.

llama.cpp Mixed Cross-platform Chat Updated 24 seconds ago
Model gemma-4-26B-A4B-thinking-vision-Q4_K_XL-server
Backend llama.cpp
Hardware Mixed
Use case Chat
Maintainer @flyingnobita
Last updated 24 seconds ago

Why this profile exists

16-18 GB total memory (RAM + VRAM) for 4-bit; requires mmproj BF16 file for vision; llama-server OpenAI-compatible endpoint on port 8001

Launch configuration

# args
--temp 1.0
--top-p 0.95
--top-k 64
--alias unsloth/gemma-4-26B-A4B-it-GGUF
--port 8001
--chat-template-kwargs {"enable_thinking":true}

Hardware assumptions

  • Mixed — tested envelope
  • Cross-platform — backend installed and on PATH
  • Backend: llama.cpp >= current llml-supported version
  • Profile assumes the model file is already on disk; llml supplies the path at launch