Memory bandwidth


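The rate at which data can be read from (or written to) memory, usually expressed in GB/s. For LLM inference, and token-by-token decoding at small batch sizes in particular, memory bandwidth rather than raw compute is usually the primary bottleneck. Generating each token requires reading the model's weights from memory, so higher bandwidth means more tokens per second.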
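The link between bandwidth and tokens per second can be made concrete with a back-of-the-envelope estimate. The sketch below is a rough upper bound, not a performance model. It assumes batch-size-1 decoding in which every generated token streams all model weights from memory exactly once, and it ignores KV-cache reads, compute time, and on-chip caching. The function name and its parameters are hypothetical, chosen for illustration.

```python
def decode_tokens_per_sec_upper_bound(
    bandwidth_gb_s: float,
    params_billion: float,
    bytes_per_param: float = 2.0,
) -> float:
    """Rough ceiling on tokens/s for memory-bound, batch-size-1 decoding.

    Assumption: each token requires reading every (active) weight once,
    so time per token >= bytes of weights / memory bandwidth.
    KV-cache traffic, compute time, and caching effects are ignored.
    """
    if bandwidth_gb_s <= 0 or params_billion <= 0 or bytes_per_param <= 0:
        raise ValueError("all inputs must be positive")
    # Billions of parameters x bytes per parameter = GB read per token.
    gb_per_token = params_billion * bytes_per_param
    # GB/s divided by GB/token gives tokens/s.
    return bandwidth_gb_s / gb_per_token


if __name__ == "__main__":
    # Hypothetical 7B-parameter dense model in FP16 (2 bytes/param), i.e. 14 GB of weights.
    for bw in (100.0, 1000.0, 3000.0):
        rate = decode_tokens_per_sec_upper_bound(bw, 7)
        print(f"{bw:>6.0f} GB/s -> at most ~{rate:.1f} tokens/s")
```

For example, 1000 GB/s with 14 GB of FP16 weights caps decoding at roughly 71 tokens/s, and doubling the bandwidth doubles that ceiling. For an MoE model, passing the number of *active* parameters per token is a reasonable approximation, since only the selected experts' weights need to be read for each token.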

Memory Bandwidth - Wikipedia

Referenced in

MoE (Mixture of Experts)
Memory


Matt Oswalt
