A profile is a solved instance, not a description.
schema_version = 2

[[profiles]]
name = "balanced-q4"
backend = "koboldcpp"
model_hint = "Qwen3-14B-GGUF"
args = [
  "--gpulayers 80",
  "--contextsize 16384",
  "--threads 8",
  "--flashattention",
]
use_case.primary = "completion"
use_case.tags = ["interactive", "coding"]
hardware.class = "gpu"
hardware.gpu_count = 1
hardware.min_vram_gb = 24
hardware.max_vram_gb = 24
hardware.notes = "Tested on RTX 3090, CUDA 12.4, Ubuntu 24.04."
Running a model well is not a model problem — it's a configuration problem. You set context size, GPU layers, thread count, flash attention. You test. You tune. You land on args that work for your machine and your workload. Then the next person, with the same GPU and the same model, starts from scratch.
Configuration has solved instances. A profile captures one.
Each field has a job. backend declares the runtime. hardware.min_vram_gb and hardware.max_vram_gb state the VRAM range. hardware.notes names the actual machine it was tested on. use_case.tags describe the workload. The fields answer the one question the catalog exists to answer: will this run well on your machine?
Every profile carries four fields designed to answer this.
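A rough sketch of how those fields could be turned into a one-line answer. The Machine class and fit_summary function are hypothetical names for illustration, and the rule used here (backend must match, your VRAM must fall inside the profile's stated range) is an assumption; the catalog's real matching logic may differ.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical description of the local machine."""
    backend: str
    vram_gb: float


def fit_summary(profile: dict, machine: Machine) -> str:
    """Return a one-line verdict for a parsed profile (nested dicts, as a TOML parser yields)."""
    hw = profile["hardware"]
    problems = []

    # Assumed rule 1: the profile's runtime must be the one you run.
    if profile["backend"] != machine.backend:
        problems.append(f"needs backend {profile['backend']}")

    # Assumed rule 2: your VRAM must sit inside the profile's stated range.
    if not hw["min_vram_gb"] <= machine.vram_gb <= hw["max_vram_gb"]:
        problems.append(f"targets {hw['min_vram_gb']}-{hw['max_vram_gb']} GB VRAM")

    if problems:
        return f"{profile['name']}: no fit ({'; '.join(problems)})"
    return f"{profile['name']}: fits ({hw.get('notes', 'no test notes')})"


example = {
    "name": "balanced-q4",
    "backend": "koboldcpp",
    "hardware": {
        "min_vram_gb": 24,
        "max_vram_gb": 24,
        "notes": "Tested on RTX 3090, CUDA 12.4, Ubuntu 24.04.",
    },
}

print(fit_summary(example, Machine(backend="koboldcpp", vram_gb=24)))
print(fit_summary(example, Machine(backend="koboldcpp", vram_gb=12)))
```

A strict range check treats a larger GPU as a non-match. That keeps the profile a solved instance for the hardware it was tested on, not a lower bound.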
Find profiles that match your runtime and hardware. The fit summary tells you in one line whether it's worth your time.
Every profile page gives you a one-liner. Copy it and run it in your terminal.
The imported profile shows up in your local llml. Select it with p and start running with the same args.