16-18 GB total memory (RAM + VRAM) for 4-bit.
16-18 GB total memory (RAM + VRAM) for 4-bit; requires mmproj BF16 file for vision; llama-server OpenAI-compatible endpoint on port 8001
llama.cpp >= current llml-supported versionllml supplies the path at launch