Quantization

The process of reducing the numerical precision of model weights (e.g., from 16-bit floats to 8-bit or 4-bit integers) to decrease memory footprint and increase inference speed, at some cost to output quality. A 70B-parameter model stored in FP16 (half precision) requires ~140 GB for weights alone (70B parameters × 2 bytes each); 8-bit quantization halves that to ~70 GB, and 4-bit brings it down to ~35 GB, a far more manageable size.
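The core idea can be illustrated with a minimal sketch of symmetric int8 quantization: each float weight is mapped to an 8-bit integer via a single per-tensor scale factor, then multiplied back by that scale at inference time. This is a simplified illustration, not any particular library's implementation (real schemes use per-channel or per-group scales, zero points, and calibration).

```python
def quantize_int8(weights):
    """Map float weights to int8 with a per-tensor scale (illustrative sketch)."""
    scale = max(abs(w) for w in weights) / 127.0  # one float step per int level
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights: each value is off by at most ~scale/2."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.88]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each int8 value occupies 1 byte instead of FP16's 2, which is where the memory halving comes from; the rounding step is the source of the quality loss mentioned above.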