llama.cpp
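One way to get the tools used below is to clone and build llama.cpp from source (see the build docs in the references at the end); the -DGGML_CUDA=ON flag is only needed for GPU offload and assumes the CUDA toolkit is already installed:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

With this cmake build the binaries (llama-quantize, llama-cli, etc.) land under build/bin; adjust the paths in the commands below if your layout differs.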
Install the Python dependencies used by the conversion scripts:

python3 -m pip install -r requirements.txt
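The quantize step below expects an F16 GGUF file as input. If you only have the Hugging Face checkpoint, one way to produce it is the bundled conversion script; the local checkpoint path ./models/sarvam/sarvam-2B-v0.5 here is an assumption, point it at wherever the model is actually stored:

python3 convert_hf_to_gguf.py ./models/sarvam/sarvam-2B-v0.5 --outtype f16 --outfile ./models/sarvam/sarvam-2B-v0.5-F16.gguf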
Quantize the F16 GGUF down to 4-bit (Q4_K_M):

./llama-quantize ./models/sarvam/sarvam-2B-v0.5-F16.gguf ./models/sarvam/sarvam-2B-v0.5-Q4_K_M.gguf Q4_K_M
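As a quick sanity check, the quantized model can be loaded with llama-cli; the prompt, token count, and -ngl (GPU layer) values here are just illustrative:

./llama-cli -m ./models/sarvam/sarvam-2B-v0.5-Q4_K_M.gguf -p "Hello" -n 64 -ngl 99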
References:
- https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md
- https://developer.nvidia.com/cuda-downloads
- https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/README.md