16-18 GB total memory (RAM + VRAM) for 4-bit quantization; vision requires the mmproj BF16 file; llama.cpp supports both CPU and GPU inference.
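
As a rough illustration of the combined-memory requirement, the Python sketch below checks whether a machine's RAM plus VRAM covers the upper end of the 16-18 GB range. The function name, the 18 GB default, and the example machine sizes are assumptions made for illustration; actual usage also depends on context length and runtime overhead, which this sketch does not model.

```python
def fits_4bit_budget(ram_gb: float, vram_gb: float, required_gb: float = 18.0) -> bool:
    """Return True if combined RAM + VRAM meets the stated requirement.

    required_gb defaults to the upper end of the 16-18 GB range, an
    assumption chosen to be conservative.
    """
    if ram_gb < 0 or vram_gb < 0:
        raise ValueError("memory sizes must be non-negative")
    # The requirement is on the total, so RAM and VRAM are simply summed.
    return ram_gb + vram_gb >= required_gb


if __name__ == "__main__":
    # Hypothetical machines, given as (RAM in GB, VRAM in GB).
    for ram, vram in [(16, 8), (8, 8), (32, 0)]:
        print(ram, vram, fits_4bit_budget(ram, vram))
```

Because the figure is a total, a CPU-only machine with enough system RAM passes the check just as a machine that splits the load between RAM and VRAM does.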
llama.cpp
llml