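A 4-bit quantized build needs roughly 16-18 GB of total memory (RAM + VRAM). llama.cpp supports both CPU and GPU inference and can offload some of the model's layers to the GPU while the rest stay in system RAM, so the combined RAM + VRAM total is what has to cover the model.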
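The arithmetic behind a figure like this can be sketched as follows. This is a back-of-the-envelope estimate, not llama.cpp's own memory accounting. It assumes that weight memory is roughly parameters × bits per weight / 8, and that all layers are the same size. The model size, layer count, and `reserve_gb` headroom (a crude stand-in for KV cache and runtime buffers) are hypothetical placeholders.

```python
# Rough memory sketch for a quantized model split between GPU and CPU.
# All sizes are hypothetical placeholders, not measured values.


def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def split_layers(
    model_gb: float, n_layers: int, vram_gb: float, reserve_gb: float = 1.0
) -> tuple[int, float, float]:
    """Return (layers_on_gpu, gb_in_vram, gb_in_ram).

    Assumes every layer is the same size and keeps `reserve_gb` of VRAM free
    as headroom for the KV cache and runtime buffers (a crude assumption).
    """
    per_layer = model_gb / n_layers
    usable = max(vram_gb - reserve_gb, 0.0)
    # Offload as many whole layers as fit in usable VRAM, capped at n_layers.
    gpu_layers = min(n_layers, int(usable // per_layer))
    in_vram = gpu_layers * per_layer
    # Whatever does not fit on the GPU stays in system RAM.
    return gpu_layers, in_vram, model_gb - in_vram


if __name__ == "__main__":
    # Hypothetical: a 17 GB, 40-layer model on a GPU with 8 GB of VRAM.
    layers, vram, ram = split_layers(17.0, 40, 8.0)
    print(f"{layers} layers on GPU, ~{vram:.1f} GB VRAM, ~{ram:.1f} GB RAM")
```

With these placeholder numbers, 16 of the 40 layers fit on the GPU and about 10.2 GB stays in RAM. A layer count like that is what llama.cpp's `-ngl` / `--n-gpu-layers` option controls. Real 4-bit formats usually store somewhat more than 4 bits per weight because of per-block scales, which is why `bits_per_weight` is left as a parameter.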