gemma-4-E2B-thinking-Q8_0

4 GB total memory for 4-bit; 5-8 GB for 8-bit (this profile is Q8_0, an 8-bit quantization).

llama.cpp Mixed Cross-platform Chat Updated 24 seconds ago

Import

Requires llml ≥ 0.5.0

Install llml with one of the following (skip this step if it is already installed):

$ brew install --cask flyingnobita/tap/llml

$ go install github.com/flyingnobita/llml/cmd/llml@latest

$ scoop bucket add flyingnobita https://github.com/flyingnobita/scoop-bucket && scoop install flyingnobita/llml

Then import and activate the profile:

$ llml import https://llml.dev/profiles/gemma-4-E2B-thinking-Q8_0.toml --activate

Copy the Run command, paste it in your terminal, and pick a local model. The profile attaches immediately — press p in the TUI to confirm.

Why this profile exists

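The model needs 4 GB of total memory at 4-bit and 5-8 GB at 8-bit; the 8-bit figure is the one that applies to this Q8_0 profile. It is designed for phone and edge inference and supports text, image, and audio. llama.cpp supports both CPU and GPU inference.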

Launch configuration

# args
--temp 1.0
--top-p 0.95
--top-k 64
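
For readers who want to see what these arguments amount to outside llml, here is a minimal sketch of an equivalent manual llama.cpp command line. It makes two assumptions not stated in the profile: that the backend binary is `llama-server` (llml may launch a different llama.cpp binary), and that the model lives at the placeholder path shown. The sketch only prints the command; it does not run it.

```shell
# Placeholder path (assumption): llml normally supplies the real path at launch.
MODEL_PATH="/path/to/model.gguf"

# Sampling arguments copied verbatim from the profile's "# args" list.
PROFILE_ARGS="--temp 1.0 --top-p 0.95 --top-k 64"

# Assemble the command line; llama-server is an assumed binary name.
CMD="llama-server -m $MODEL_PATH $PROFILE_ARGS"

# Print rather than execute, so the sketch is safe to paste anywhere.
echo "$CMD"
```

Replace the placeholder path with a real GGUF file and run the printed command to try the same sampling settings by hand.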

Hardware assumptions

Mixed — tested envelope
Cross-platform — backend installed and on PATH
Backend: llama.cpp >= current llml-supported version
Profile assumes the model file is already on disk; llml supplies the path at launch
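
The "installed and on PATH" assumption can be checked before importing the profile. The following is a small sketch, not part of llml itself; the helper name `check_backend` is hypothetical, and `llama-server` is an assumed binary name for the llama.cpp backend.

```shell
# check_backend NAME: succeeds if NAME resolves to a command on PATH.
# (Hypothetical helper for illustration; not an llml command.)
check_backend() {
  command -v "$1" >/dev/null 2>&1
}

# llama-server is an assumed backend binary name.
if check_backend llama-server; then
  echo "llama-server found on PATH"
else
  echo "llama-server not found on PATH"
fi
```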
