How it works

You've run the model.
Your args work.
Someone else needs them.

A profile is a solved instance, not a description.

schema_version = 3

[[profiles]]
name = "balanced-q4"
backend = "koboldcpp"
model_hint = "Qwen3-14B-GGUF"
args = [
  "--gpulayers 80",
  "--contextsize 16384",
  "--threads 8",
  "--flashattention",
]

use_case.primary = ["general"]
use_case.tags    = ["interactive", "coding"]

hardware.class       = "gpu"
hardware.gpu_count   = 1
hardware.min_vram_gb = 24
hardware.max_vram_gb = 24
hardware.notes       = "Tested on RTX 3090, CUDA 12.4, Ubuntu 24.04."

Every run is a rediscovery

Running a model well is not a model problem; it's a configuration problem. You set context size, GPU layers, thread count, and flash attention. You test. You tune. You land on args that work for your machine and your workload. Then the next person, with the same GPU and the same model, starts from scratch.

Configuration has solved instances. A profile captures one.

What the profile encodes

Each field has a job. backend declares the runtime, and args holds the exact launch flags that worked. hardware.min_vram_gb and hardware.max_vram_gb state the VRAM range, and hardware.notes names the actual machine the profile was tested on. use_case.tags describe the workload. Together, the fields answer the one question the catalog exists to answer: will this run well on your machine?
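The question can be made concrete as a check against the profile's own fields. This is a minimal sketch built on two hypothetical pieces: a `Machine` record and a `fits()` rule. Under that rule, the backend must match, the GPU class and count must be met, and VRAM must fall inside the stated range. It illustrates the idea and is not llml's actual matching logic.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    # Hypothetical description of the local machine.
    backend: str    # runtime installed locally, e.g. "koboldcpp"
    gpu_class: str  # e.g. "gpu" or "cpu"
    gpu_count: int
    vram_gb: int    # VRAM per GPU in GB (assumption)


def fits(profile: dict, machine: Machine) -> bool:
    """Hypothetical rule: every stated constraint in the profile must hold."""
    hw = profile["hardware"]
    return (
        profile["backend"] == machine.backend
        and hw["class"] == machine.gpu_class
        and machine.gpu_count >= hw["gpu_count"]
        # Treat the stated VRAM range as the tested envelope.
        and hw["min_vram_gb"] <= machine.vram_gb <= hw["max_vram_gb"]
    )


profile = {
    "backend": "koboldcpp",
    "hardware": {"class": "gpu", "gpu_count": 1, "min_vram_gb": 24, "max_vram_gb": 24},
}

print(fits(profile, Machine("koboldcpp", "gpu", 1, 24)))  # True
print(fits(profile, Machine("koboldcpp", "gpu", 1, 12)))  # False: below min_vram_gb
print(fits(profile, Machine("vllm", "gpu", 1, 24)))       # False: different backend
```

Whether more VRAM than max_vram_gb should still count as a fit is a design choice. This sketch stays conservative and only trusts the range the profile was tested in.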

How you know it matches your machine

Every profile carries four signals designed to answer this.

backend: llama.cpp · vLLM · Ollama · KoboldCpp

hardware.min_vram_gb / hardware.max_vram_gb: the stated VRAM range (e.g. 24–24 GB)

hardware.notes: the tested setup, e.g. "Tested on RTX 3090, CUDA 12.4, Ubuntu 24.04."

GitHub provenance: every profile links to a commit

Find, import, run

Filter by backend and GPU class on Browse

Find profiles that match your runtime and hardware. The fit summary tells you in one line whether it's worth your time.
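Filtering narrows the catalog on two fields, backend and hardware.class. Below is a minimal sketch of that filter over already-parsed profiles. It adds a one-line summary built from hardware.min_vram_gb, hardware.max_vram_gb and hardware.notes. The function names, the summary format, and the second catalog entry are hypothetical illustrations, not the site's implementation.

```python
def browse(profiles: list[dict], backend: str, gpu_class: str) -> list[dict]:
    """Keep only profiles whose runtime and hardware class match the filters."""
    return [
        p for p in profiles
        if p["backend"] == backend and p["hardware"]["class"] == gpu_class
    ]


def fit_summary(p: dict) -> str:
    """Hypothetical one-line summary: name, backend, VRAM range, tested setup."""
    hw = p["hardware"]
    return (
        f'{p["name"]} · {p["backend"]} · '
        f'{hw["min_vram_gb"]}–{hw["max_vram_gb"]} GB · {hw["notes"]}'
    )


catalog = [
    {   # the example profile from this page
        "name": "balanced-q4",
        "backend": "koboldcpp",
        "hardware": {"class": "gpu", "min_vram_gb": 24, "max_vram_gb": 24,
                     "notes": "Tested on RTX 3090, CUDA 12.4, Ubuntu 24.04."},
    },
    {   # hypothetical second entry, for illustration only
        "name": "example-cpu",
        "backend": "llama.cpp",
        "hardware": {"class": "cpu", "min_vram_gb": 0, "max_vram_gb": 0,
                     "notes": "hypothetical entry"},
    },
]

for p in browse(catalog, "koboldcpp", "gpu"):
    print(fit_summary(p))
# balanced-q4 · koboldcpp · 24–24 GB · Tested on RTX 3090, CUDA 12.4, Ubuntu 24.04.
```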

Copy the import command

Every profile page gives you a one-liner. Copy it and run it in your terminal.

Profile attaches and appears under p in the TUI on next launch

The imported profile shows up in your local llml. Select it with p and start running with the same args.
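Each entry in args bundles a flag with its value in one string, such as "--gpulayers 80". A launcher therefore has to split those strings into separate argv tokens before starting the runtime. Below is a minimal sketch using Python's shlex.split. The `build_argv` helper, the "koboldcpp" executable name, and the model filename are assumptions for illustration, not a description of how llml launches profiles. The sketch passes the model with KoboldCpp's `--model` flag.

```python
import shlex


def build_argv(executable: str, model_path: str, args: list[str]) -> list[str]:
    """Hypothetical launcher: expand each profile arg into separate argv tokens."""
    argv = [executable, "--model", model_path]
    for arg in args:
        # "--gpulayers 80" -> ["--gpulayers", "80"]; "--flashattention" stays one token
        argv.extend(shlex.split(arg))
    return argv


args = ["--gpulayers 80", "--contextsize 16384", "--threads 8", "--flashattention"]
argv = build_argv("koboldcpp", "Qwen3-14B-Q4_K_M.gguf", args)  # hypothetical filename
print(argv)
# ['koboldcpp', '--model', 'Qwen3-14B-Q4_K_M.gguf', '--gpulayers', '80',
#  '--contextsize', '16384', '--threads', '8', '--flashattention']
```

Passing the resulting list to subprocess.run would start the runtime without going through a shell. That is why the args are split into tokens rather than joined into one command string.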

$ llml import https://llml.dev/profiles/Qwen3.6-enable-thinking.toml --activate
