llml LLM Launcher
contribute

A profile is a setup, not a model card.

The catalog is built from TOML files in a public GitHub repo. If you've gotten a model running well on your machine, the args and env you used are exactly what someone with the same machine needs.

schema_version = 2

[[profiles]]
name = "balanced-q4"
backend = "koboldcpp"
model_hint = "Qwen3-14B-GGUF"
args = [
  "--gpulayers 80",
  "--contextsize 16384",
  "--threads 8",
  "--flashattention",
]

use_case.primary = "completion"
use_case.tags    = ["interactive", "coding"]

hardware.class       = "gpu"
hardware.gpu_count   = 1
hardware.min_vram_gb = 24
hardware.max_vram_gb = 24
hardware.notes       = "Tested on RTX 3090, CUDA 12.4, Ubuntu 24.04."

What belongs in the catalog

Profiles for llama.cpp, vLLM, Ollama, and KoboldCpp. One TOML file per profile, with a clear hardware target and a short rationale. Model-location parameters (--model, HF_HOME, etc.) are excluded because llml supplies them at launch.

What a strong profile includes

  • A name that reads as a setup, not a brand: balanced-q4 beats my-best-config.
  • Explicit backend, hardware.class, and tested VRAM range.
  • A short hardware.notes line with the actual machine you tested on.
  • Args as panel-row strings: "--ctx-size 4096", not pre-split tokens.
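
The panel-row convention in the last bullet keeps each flag together with its value in one string. How llml turns those rows into a command line is not described here; the sketch below only illustrates the idea, assuming shell-style splitting via Python's `shlex.split`. The helper name `expand_args` is hypothetical.

```python
import shlex


def expand_args(rows: list[str]) -> list[str]:
    """Flatten panel-row strings into argv tokens.

    Assumption: each row is split with shell-style rules, so a quoted
    value containing spaces stays a single token.
    """
    argv: list[str] = []
    for row in rows:
        argv.extend(shlex.split(row))
    return argv


rows = ["--gpulayers 80", "--contextsize 16384", "--threads 8", "--flashattention"]
print(expand_args(rows))
# ['--gpulayers', '80', '--contextsize', '16384', '--threads', '8', '--flashattention']
```

Storing rows rather than pre-split tokens keeps a flag and its value visibly paired, which is easier to review in a PR.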

Validation

Check                                          Level
schema_version == 2                            required
backend ∈ {llama, vllm, ollama, koboldcpp}     required
use_case.primary ∈ canonical set               warn
model-location params absent from args         required
hardware.min_vram_gb ≤ hardware.max_vram_gb    warn
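
The rules above can be sketched as a small checker over an already-parsed profile document. This is an illustration, not the project's validator (which runs through `npm run validate`). Two things are assumptions: the canonical use-case set is not listed on this page, so the sketch uses a placeholder containing only the value from the sample, and the model-location list contains only the two names mentioned here. The names `Report` and `validate` are hypothetical.

```python
from dataclasses import dataclass, field

BACKENDS = {"llama", "vllm", "ollama", "koboldcpp"}
# Assumption: placeholder set; the real canonical list is defined by the schema.
CANONICAL_USE_CASES = {"completion"}
# Assumption: only the two names mentioned on this page; the real list is longer.
MODEL_LOCATION_PARAMS = {"--model", "HF_HOME"}


@dataclass
class Report:
    errors: list[str] = field(default_factory=list)      # "required" rules
    warnings: list[str] = field(default_factory=list)    # "warn" rules


def validate(doc: dict) -> Report:
    report = Report()
    if doc.get("schema_version") != 2:
        report.errors.append("schema_version must be 2")

    for profile in doc.get("profiles", []):
        name = profile.get("name", "<unnamed>")

        if profile.get("backend") not in BACKENDS:
            report.errors.append(f"{name}: backend not in {sorted(BACKENDS)}")

        primary = profile.get("use_case", {}).get("primary")
        if primary not in CANONICAL_USE_CASES:
            report.warnings.append(f"{name}: use_case.primary {primary!r} is not canonical")

        for row in profile.get("args", []):
            tokens = row.split()
            # Compare the flag itself, ignoring any "=value" suffix.
            flag = tokens[0].split("=")[0] if tokens else ""
            if flag in MODEL_LOCATION_PARAMS:
                report.errors.append(f"{name}: model-location parameter {flag!r} in args")

        hw = profile.get("hardware", {})
        lo, hi = hw.get("min_vram_gb"), hw.get("max_vram_gb")
        if lo is not None and hi is not None and lo > hi:
            report.warnings.append(f"{name}: min_vram_gb {lo} exceeds max_vram_gb {hi}")

    return report


report = validate({
    "schema_version": 2,
    "profiles": [{
        "name": "balanced-q4",
        "backend": "koboldcpp",
        "args": ["--gpulayers 80", "--model /tmp/example.gguf"],
        "use_case": {"primary": "completion"},
        "hardware": {"min_vram_gb": 24, "max_vram_gb": 16},
    }],
})
print(report.errors)
print(report.warnings)
```

In this example the `--model` row produces one error and the inverted VRAM range produces one warning; a PR would presumably be blocked only by the former.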

PR flow

  1. Fork flyingnobita/llml-profiles
  2. Add your TOML under profiles/<name>.toml
  3. Run npm run validate locally
  4. Open a PR with one paragraph about your machine