4 GB total memory for 4-bit quantization; 5-8 GB for 8-bit; designed for phone/edge inference; supports text, image, and audio; llama.cpp supports CPU and GPU inference.
llama.cpp >= current llml-supported version; llml supplies the path at launch.