Q6_K


6-bit K-quant, about 6.56 bits per weight. Higher quality than Q4 at roughly 50% more memory, and generally considered near-lossless for most tasks. A common recommendation for 70B models on hardware with 96GB of GPU memory, where the weights take roughly 57GB.
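The memory figures above can be checked with a little arithmetic. The sketch below is illustrative only: the block sizes are the Q6_K and Q4_K super-block layouts as understood from ggml (256 weights per block), the helper names are hypothetical, and it assumes every tensor is stored in a single quantization type, which real GGUF files do not do exactly. KV cache and activation memory are not included.

```python
# Assumed ggml super-block size: 256 weights per block for K-quants.
QK_K = 256

# Assumed Q6_K block layout: 128 bytes (low 4 bits) + 64 bytes (high 2 bits)
# + 16 int8 sub-block scales + 2 bytes for the fp16 super-block scale.
Q6_K_BLOCK_BYTES = 128 + 64 + 16 + 2

# Assumed Q4_K block layout: 2 fp16 values (4 bytes) + 12 bytes of packed
# scales/mins + 128 bytes of 4-bit weights.
Q4_K_BLOCK_BYTES = 4 + 12 + 128


def bits_per_weight(block_bytes, weights_per_block=QK_K):
    """Average storage cost per weight, including per-block scales."""
    return block_bytes * 8 / weights_per_block


def weight_memory_gb(n_params, bpw):
    """Weight storage in decimal gigabytes for a model of n_params weights."""
    return n_params * bpw / 8 / 1e9


q6_k_bpw = bits_per_weight(Q6_K_BLOCK_BYTES)
q4_k_bpw = bits_per_weight(Q4_K_BLOCK_BYTES)

print(f"Q6_K: {q6_k_bpw} bpw, Q4_K: {q4_k_bpw} bpw")
print(f"Q6_K vs Q4_K memory ratio: {q6_k_bpw / q4_k_bpw:.2f}x")
print(f"70B weights at Q6_K: {weight_memory_gb(70e9, q6_k_bpw):.1f} GB")
```

Under these assumptions Q6_K costs about 46% more than a pure Q4_K layout, in line with the "roughly 50%" figure, and a 70B model leaves a good share of a 96GB budget free for context.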

llama-quant.cpp

Referenced in

BPW (bits per weight)

Q4_K_M

Q8_0

Codex

Matt Oswalt
