Memory bandwidth

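The rate at which data can be transferred between memory and the processor, typically expressed in GB/s. Usually the primary bottleneck for LLM inference (rather than raw compute), because generating each token requires reading the model's weights from memory; higher bandwidth therefore generally means more tokens per second.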
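
As a rough illustration of why bandwidth caps token throughput, the sketch below estimates an upper bound on single-stream generation speed by dividing bandwidth by the size of the model weights. It assumes every weight is read once per generated token and ignores caches, batching, and compute time. The function name and the example figures (1000 GB/s, 8 billion parameters, 2 bytes per parameter) are hypothetical and chosen only to keep the arithmetic simple.

```python
def max_tokens_per_second(bandwidth_gb_s: float,
                          params_billions: float,
                          bytes_per_param: float) -> float:
    """Upper-bound estimate of tokens/s for one generation stream.

    Assumption: each generated token requires reading all model
    weights from memory exactly once; everything else is ignored.
    """
    # Billions of parameters times bytes per parameter gives size in GB.
    model_size_gb = params_billions * bytes_per_param
    return bandwidth_gb_s / model_size_gb


# Hypothetical hardware and model, for illustration only.
estimate = max_tokens_per_second(bandwidth_gb_s=1000.0,
                                 params_billions=8.0,
                                 bytes_per_param=2.0)
print(f"{estimate:.1f} tokens/s upper bound")
```

Under these assumptions, doubling the bandwidth or halving the bytes per parameter doubles the estimate; real throughput will be lower than this bound.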