New profiles are added regularly. Watch the llml repo to get notified.

LLM Launcher

Stop reconstructing `llama-server` flags from shell history.

llml finds your local models, detects your runtimes, and launches them with a saved profile. Pick a model, pick a profile, press R.

Works alongside llama.cpp, Ollama, vLLM, and KoboldCpp. Your backend stays exactly as it is.

$ curl -fsSL https://llml.dev/install.sh | sh

Alternatives: Homebrew cask (macOS), Scoop/Winget (Windows), or Go. See the install instructions.


What llml does

the loop

Scan

llml finds every local model — GGUF files, safetensors, Hugging Face cache — and lists the runtimes on your machine.
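The scan step can be pictured roughly as follows. This is a minimal sketch, not llml's actual implementation: the function names, the file extensions searched, and the executable names for each runtime are assumptions made for illustration.

```python
import shutil
from pathlib import Path

# Assumed model file extensions; llml's real detection rules may differ.
MODEL_SUFFIXES = {".gguf", ".safetensors"}

# Assumed executable names for each runtime; adjust for your installation.
RUNTIME_BINARIES = {
    "llama.cpp": "llama-server",
    "Ollama": "ollama",
    "vLLM": "vllm",
    "KoboldCpp": "koboldcpp",
}


def find_models(root: Path) -> list[Path]:
    """Return model files under root, sorted for a stable listing."""
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in MODEL_SUFFIXES
    )


def find_runtimes() -> dict[str, str]:
    """Map each runtime found on PATH to its resolved executable path."""
    found = {}
    for name, binary in RUNTIME_BINARIES.items():
        path = shutil.which(binary)
        if path is not None:
            found[name] = path
    return found


if __name__ == "__main__":
    # Default Hugging Face cache location on many systems (an assumption about your setup).
    for model in find_models(Path.home() / ".cache" / "huggingface"):
        print(model)
    for name, path in find_runtimes().items():
        print(f"{name}: {path}")
```

Sorting the results keeps the listing stable between runs, which matters when a model is picked by position in a list.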

Pick

Choose a model, a runtime, and a saved profile. The generated launch command is shown before execution — no surprises.
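One way to show a command before running it is to render the argument list as a single, correctly quoted shell line. The sketch below illustrates that idea under stated assumptions: `preview_command` is a hypothetical helper, and the model path and environment variable are placeholder values, not output from llml.

```python
import shlex


def preview_command(binary: str, args: list[str], env: dict[str, str]) -> str:
    """Render a copy-pasteable shell line for the planned launch."""
    # Sort env vars so the preview is identical for identical input.
    env_part = " ".join(f"{k}={shlex.quote(v)}" for k, v in sorted(env.items()))
    cmd_part = shlex.join([binary, *args])
    return f"{env_part} {cmd_part}".strip()


line = preview_command(
    "llama-server",
    ["-m", "/models/my model.gguf", "--ctx-size", "8192"],  # placeholder path
    {"CUDA_VISIBLE_DEVICES": "0"},
)
print(line)
# CUDA_VISIBLE_DEVICES=0 llama-server -m '/models/my model.gguf' --ctx-size 8192
```

Quoting with `shlex` means a path containing spaces is displayed exactly as the shell would need it, so what is shown is what would run.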

Launch

Press R. One keypress, and the right command runs against the right model on the right backend.

Don't start from scratch.

Import the exact config someone already tuned for your model and hardware — one command, it runs. Every profile is a TOML file with args, env vars, and hardware metadata. The catalog matches them to your machine.

Profiles are starting points, not guarantees. Hardware differs — expect to adapt. But starting from someone's working config beats starting from an empty terminal.


$ llml import https://llml.dev/profiles/Qwen3.6-enable-thinking.toml --activate

`llml export` writes a portable TOML file. The catalog is where those files get shared. `llml import` pulls one back. Export → share → import → run.

What makes this different

Importable recipes, not fit data

Not a list of what fits. Not per-backend defaults. The exact config someone tuned for your model and hardware — imported and run with one command.

Portable, not platform-locked

Profiles are TOML files with a documented schema. They live in GitHub, not a service. The catalog is a thin index over real source files.

Reproducible imports

An import is one shell command. The same TOML produces the same args and env on every machine — no hidden web of preferences.

Got a profile that just works?

Share the args your machine and model converged on. PR-only. No accounts.


Recently updated

gemma-4-26B-A4B-31B

gemma-4-26B-A4B-31B-image

gemma-4-26B-A4B-31B-thinking

gemma-4-26B-A4B-31B-thinking-image

gemma-4-E2B-E4B-12B

gemma-4-E2B-E4B-12B-image-audio
