IQ (importance-matrix quantization)


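A family of llama.cpp quantization types (IQ3_M, IQ4_NL, etc.) that rely on an importance matrix. This matrix holds statistics gathered by running the model over a calibration dataset. The quantizer uses it to identify which weights matter most and to weight the rounding error so those weights are reproduced more faithfully. These types often give better quality than K-quants at the same bit depth, especially at low bit levels (2–3 bits).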
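The core idea can be illustrated with a simplified sketch. It assumes a toy per-column importance score equal to the mean squared activation seen on calibration data, roughly the kind of statistic an importance matrix collects. The hypothetical helpers `importance_from_activations` and `quantize_block` search a range of candidate scales for a symmetric round-to-nearest quantizer and keep the scale with the lowest importance-weighted squared error. This is not llama.cpp's implementation. The real IQ formats are considerably more elaborate.

```python
# Simplified sketch of importance-weighted quantization.
# NOT llama.cpp's actual kernels; all names here are hypothetical.
# Assumption: importance of each weight column = mean squared activation
# observed for that input column on calibration data.

def importance_from_activations(activations):
    """activations: calibration rows, each a list with one value per input column."""
    n = len(activations)
    cols = len(activations[0])
    return [sum(row[j] ** 2 for row in activations) / n for j in range(cols)]


def weighted_error(weights, q, scale, importance):
    """Importance-weighted squared error: sum(imp * (w - scale*q)^2)."""
    return sum(imp * (w - scale * qi) ** 2 for w, qi, imp in zip(weights, q, importance))


def quantize_block(weights, importance, bits=3, candidates=64):
    """Symmetric round-to-nearest quantization of one block of weights.

    Integer levels are clamped to [-qmax, qmax] with qmax = 2**(bits-1) - 1.
    Candidate scales span 0.5x to 1.0x of amax/qmax; smaller scales let large
    values clip in exchange for finer resolution on the rest. The scale with
    the lowest importance-weighted error is kept.
    """
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(w) for w in weights)
    if amax == 0:
        return [0] * len(weights), 0.0
    best = None
    for i in range(1, candidates + 1):
        scale = amax / qmax * (0.5 + 0.5 * i / candidates)
        q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
        err = weighted_error(weights, q, scale, importance)
        if best is None or err < best[0]:
            best = (err, q, scale)
    return best[1], best[2]


if __name__ == "__main__":
    weights = [0.9, -0.2, 0.05, 0.4, -0.7, 0.12, 0.33, -0.05]
    # Toy calibration activations: columns 0 and 3 are large, so they matter most.
    calib = [[2.0, 0.1, 0.1, 1.5, 0.2, 0.1, 0.1, 0.1],
             [1.8, 0.2, 0.1, 1.2, 0.1, 0.2, 0.1, 0.1]]
    imp = importance_from_activations(calib)
    uniform = [1.0] * len(weights)

    q_imp, s_imp = quantize_block(weights, imp)
    q_uni, s_uni = quantize_block(weights, uniform)

    print(weighted_error(weights, q_imp, s_imp, imp)
          <= weighted_error(weights, q_uni, s_uni, imp))  # → True
```

Because both calls search the same candidate scales, the importance-aware choice can never score worse on the weighted metric. What changes is which weights absorb the rounding error: the sketch pushes it toward columns that saw small activations during calibration.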

Importance Matrix Quantization (llama.cpp PR)

Referenced in

Model Evaluation

Instruct model

KV cache

Codex

Matt Oswalt
