The rate at which data can be read from memory, expressed in GB/s. For LLM inference it is usually the primary bottleneck, rather than raw compute: generating each token requires reading the model weights from memory, so higher bandwidth means more tokens per second.
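As a rough illustration of why bandwidth caps token throughput, the sketch below estimates an upper bound on single-stream generation speed by dividing bandwidth by the size of the model weights. It assumes every weight is read once per token and ignores caches, batching, and compute time. The function name and the example figures (a 7-billion-parameter model, 2 bytes per parameter, 1000 GB/s) are hypothetical, chosen only for the arithmetic.

```python
def max_tokens_per_second(bandwidth_gb_s: float,
                          params_billions: float,
                          bytes_per_param: float) -> float:
    """Bandwidth-bound ceiling on tokens/s for one generation stream.

    Assumption: each generated token requires reading all model weights
    from memory exactly once; everything else is ignored.
    """
    # Billions of parameters times bytes per parameter gives weight size in GB.
    model_size_gb = params_billions * bytes_per_param
    return bandwidth_gb_s / model_size_gb


# Hypothetical example: 7B parameters at 2 bytes each (14 GB) on 1000 GB/s memory.
estimate = max_tokens_per_second(1000, 7, 2)
print(f"{estimate:.1f} tokens/s upper bound")  # about 71.4
```

Under these assumptions, doubling the bandwidth doubles the ceiling, while halving the bytes per parameter (for example through quantization) has the same effect. Real throughput will fall below this bound.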