gemma-4-31B-thinking-Q4_K_XL

17-20 GB total memory (RAM + VRAM) for 4-bit.

Tags: llama.cpp, Mixed, Cross-platform, Chat

Import

Requires llml ≥ 0.5.0

If llml is already installed:

$ llml import https://llml.dev/profiles/gemma-4-31B-thinking-Q4_K_XL.toml --activate

Otherwise, install llml first with one of the following, then run the import command above.

Homebrew:

$ brew install --cask flyingnobita/tap/llml

Go:

$ go install github.com/flyingnobita/llml/cmd/llml@latest

Scoop:

$ scoop bucket add flyingnobita https://github.com/flyingnobita/scoop-bucket && scoop install flyingnobita/llml

Copy the Run command, paste it into your terminal, and pick a local model. The profile attaches immediately; press p in the TUI to confirm.

Why this profile exists

This is the strongest Gemma 4 variant. At 4-bit quantization it needs 17-20 GB of total memory (RAM + VRAM), and llama.cpp supports both CPU and GPU inference.
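
The 17-20 GB figure can be sanity-checked with rough arithmetic: weight memory is approximately parameter count × bits per weight ÷ 8. The sketch below assumes 31 billion parameters (read off the model name) and an effective 4.5 to 5 bits per weight for a mixed 4-bit quantization. Both numbers are assumptions rather than values from the profile, and `weights_gb` is a hypothetical helper. KV cache and runtime buffers come on top of the weights.

```shell
# Rough weight-memory estimate in decimal GB.
# $1 = parameters in billions (assumed), $2 = effective bits per weight (assumed)
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb 31 4.5   # lower assumption for bits per weight
weights_gb 31 5.0   # upper assumption for bits per weight
```

Under these assumptions the weights alone come to roughly 17.4 and 19.4 GB, which is in line with the stated range.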

# args

--temp 1.0

--top-p 0.95

--top-k 64
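
For orientation, `--temp`, `--top-p`, and `--top-k` are llama.cpp sampling flags. The sketch below prints what an equivalent manual `llama-server` invocation might look like. The `build_cmd` helper and the model path are hypothetical; llml supplies the real path at launch, and which llama.cpp binary it actually invokes is an assumption here.

```shell
# Print (do not run) a llama.cpp command line carrying this profile's sampling args.
# $1 = path to the model file (hypothetical placeholder below)
build_cmd() {
  printf 'llama-server -m %s --temp 1.0 --top-p 0.95 --top-k 64\n' "$1"
}

build_cmd "${MODEL_PATH:-/path/to/model.gguf}"
```

Printing the command instead of executing it keeps the sketch safe to run on a machine where the backend or the model file is not yet present.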

Mixed: tested envelope
Cross-platform: backend installed and on PATH
Backend: llama.cpp ≥ current llml-supported version
Profile assumes the model file is already on disk; llml supplies the path at launch
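
A quick way to check the "backend installed and on PATH" requirement before importing is sketched below. It assumes the backend binary is llama.cpp's `llama-server`; the binary llml actually looks for may differ, and `check_backend` is a hypothetical helper.

```shell
# Report whether a binary is reachable on PATH.
# $1 = binary name; returns non-zero when it is missing
check_backend() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $(command -v "$1")"
  else
    echo "missing: $1"
    return 1
  fi
}

# llama-server is an assumed binary name; '|| true' keeps the script going if it is absent.
check_backend llama-server || true
```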