llml LLM Launcher

Stop reconstructing llama-server flags from shell history.

llml finds your local models, detects your runtimes, and launches them with a saved profile. Pick a model, pick a profile, press R.

$ brew install --cask flyingnobita/tap/llml
Linux: go install github.com/flyingnobita/llml/cmd/llml@latest
[Screenshot: llml terminal UI showing model scan, runtime selection, and profile launch workflow]

What llml does

the loop
01 Scan

llml finds every local model — GGUF files, safetensors, Hugging Face cache — and lists the runtimes on your machine.
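
As a rough illustration of the scan step, the Python sketch below walks a few directories for .gguf and .safetensors files and checks PATH for a runtime binary. It is not llml's implementation: the function names and the choice of llama-server as the only runtime are assumptions made for the example.

```python
import shutil
from pathlib import Path

# File extensions treated as model weights in this sketch.
MODEL_SUFFIXES = {".gguf", ".safetensors"}


def find_models(roots):
    """Return a sorted list of model files found under the given directories."""
    found = []
    for root in roots:
        root = Path(root).expanduser()
        if not root.is_dir():
            continue  # skip directories that do not exist on this machine
        for path in root.rglob("*"):
            if path.is_file() and path.suffix.lower() in MODEL_SUFFIXES:
                found.append(path)
    return sorted(found)


def find_runtimes(names=("llama-server",)):
    """Map each runtime name to its location on PATH, skipping missing ones."""
    return {name: path for name in names if (path := shutil.which(name))}
```

Calling find_models(["~/models", "~/.cache/huggingface/hub"]) would cover a hypothetical personal models folder plus the default Hugging Face cache location; find_runtimes() returns an empty dict when no listed binary is installed.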

02 Pick

Choose a model, a runtime, and a saved profile. The generated launch command is shown before execution — no surprises.
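
To make "shown before execution" concrete, here is a minimal Python sketch that renders a launch command as a single shell line for review. The render_command helper and the example values are hypothetical, not llml's code; --model, --ctx-size, and --n-gpu-layers are existing llama-server flags.

```python
import shlex


def render_command(binary, model_path, args, env):
    """Return a copy-pasteable shell line so it can be reviewed before running."""
    argv = [binary, "--model", model_path, *args]
    # Environment variables are sorted so the preview is stable across runs.
    env_prefix = [f"{key}={shlex.quote(value)}" for key, value in sorted(env.items())]
    return " ".join(env_prefix + [shlex.join(argv)])


preview = render_command(
    "llama-server",
    "/models/example.gguf",  # placeholder path
    ["--ctx-size", "8192", "--n-gpu-layers", "99"],
    {"CUDA_VISIBLE_DEVICES": "0"},
)
print(preview)
# CUDA_VISIBLE_DEVICES=0 llama-server --model /models/example.gguf --ctx-size 8192 --n-gpu-layers 99
```

Quoting through shlex keeps paths with spaces intact, so the previewed line is the line that would actually run.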

03 Launch

Press R. One keypress, and the right command runs against the right model on the right backend.

Read more about profiles →

Don't start from scratch.

Import the exact config someone already tuned for your model and hardware: one command, and it runs. Every profile is a TOML file with args, env vars, and hardware metadata. The catalog matches profiles to your machine.

Profiles are starting points, not guarantees. Hardware differs — expect to adapt. But starting from someone's working config beats starting from an empty terminal.

Find a profile for your machine →
$ llml import https://llml.dev/profiles/qwen3-14b-q4.toml --activate

llml export writes a portable TOML file. The catalog is where those files get shared. llml import pulls one back. Export → share → import → run.

Recently updated

Browse all 59 profiles →
llama.cpp community

gemma-4-26B-A4B-thinking-Q4_K_XL

16-18 GB total memory (RAM + VRAM) for 4-bit.

Mixed Cross-platform Chat
llama.cpp community

gemma-4-26B-A4B-thinking-vision-Q4_K_XL

16-18 GB total memory (RAM + VRAM) for 4-bit.

Mixed Cross-platform Chat
llama.cpp community

gemma-4-26B-A4B-thinking-vision-Q4_K_XL-server

16-18 GB total memory (RAM + VRAM) for 4-bit.

Mixed Cross-platform Chat
llama.cpp community

gemma-4-31B-thinking-Q4_K_XL

17-20 GB total memory (RAM + VRAM) for 4-bit.

Mixed Cross-platform Chat
llama.cpp community

gemma-4-E2B-thinking-Q8_0

4 GB total memory for 4-bit.

Mixed Cross-platform Chat
llama.cpp community

gemma-4-E4B-thinking-Q8_0

5.

Mixed Cross-platform Chat

What makes this different

Importable recipes, not fit data

Not a list of what fits. Not per-backend defaults. The exact config someone tuned for your model and hardware — imported and run with one command.

Portable, not platform-locked

Profiles are TOML files with a documented schema. They live in GitHub, not a service. The catalog is a thin index over real source files.

Reproducible imports

An import is one shell command. The same TOML produces the same args and env on every machine — no hidden web of preferences.

Got a profile that just works?

Share the args your machine and model converged on. PR-only. No accounts.

Read the format Open a PR ↗