llml LLM Launcher

gemma-4-E2B-thinking-Q8_0

About 4 GB total memory at 4-bit; 5-8 GB at 8-bit (this profile uses the 8-bit Q8_0 file).

Model: gemma-4-E2B-thinking-Q8_0
Backend: llama.cpp
Hardware: Mixed
Use case: Chat
Maintainer: @flyingnobita
Last updated: 24 seconds ago

Why this profile exists

The model needs about 4 GB of total memory at 4-bit quantization and 5-8 GB at 8-bit; this profile uses the 8-bit Q8_0 file. It is designed for phone and edge inference and supports text, image, and audio. llama.cpp supports both CPU and GPU inference.

Launch configuration

# args
--temp 1.0
--top-p 0.95
--top-k 64
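
As a rough illustration of how these arguments might reach the backend, the sketch below assembles a llama.cpp command line in Python. The `llama-cli` binary and its `-m` model flag come from llama.cpp itself; the `build_command` helper and the example model path are hypothetical, and llml's actual launch logic may differ.

```python
# Sampling arguments copied from this profile.
PROFILE_ARGS = ["--temp", "1.0", "--top-p", "0.95", "--top-k", "64"]


def build_command(model_path: str, binary: str = "llama-cli") -> list[str]:
    """Return an argv list: the binary, the model path, then the profile args.

    `build_command` is a hypothetical helper for illustration only.
    """
    return [binary, "-m", model_path, *PROFILE_ARGS]


if __name__ == "__main__":
    # Hypothetical path; llml supplies the real one at launch.
    argv = build_command("models/gemma-4-E2B-thinking-Q8_0.gguf")
    print(" ".join(argv))
```

Building the command as a list rather than a single string means it can be passed directly to `subprocess.run` without shell quoting concerns.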

Hardware assumptions

  • Mixed — tested envelope
  • Cross-platform — backend installed and on PATH
  • Backend: llama.cpp >= current llml-supported version
  • Profile assumes the model file is already on disk; llml supplies the path at launch
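
The two checkable assumptions above (backend on PATH, model file on disk) could be verified before launch with something like the following. This is a minimal sketch: the `preflight` function is hypothetical, and the default binary name `llama-cli` is an assumption about which llama.cpp executable is used.

```python
import shutil
from pathlib import Path


def preflight(model_path: str, binary: str = "llama-cli") -> list[str]:
    """Return a list of problems; an empty list means both assumptions hold.

    Hypothetical helper, not part of llml.
    """
    problems = []
    # Assumption: the backend is installed and on PATH.
    if shutil.which(binary) is None:
        problems.append(f"{binary} not found on PATH")
    # Assumption: the model file is already on disk.
    if not Path(model_path).is_file():
        problems.append(f"model file not found: {model_path}")
    return problems
```

The sketch does not check the backend version or available memory; those depend on details this profile does not specify.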