Matt Oswalt
Codex



t/s (tokens per second)


Tokens per second (t/s) is the standard metric for LLM inference speed. It typically refers to generation speed, i.e. how fast new tokens are produced. Prompt processing (prefill) speed is usually much higher and is measured separately.
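The metric itself is simple arithmetic: tokens produced divided by wall-clock seconds. A minimal Python sketch, where `generate` is a hypothetical stand-in for any inference call that returns the number of tokens it actually produced:

```python
import time


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Generation t/s: tokens produced divided by wall-clock seconds."""
    return n_tokens / elapsed_s


def timed_generate(generate, prompt: str, max_tokens: int) -> float:
    """Time a generation call and report its t/s.

    `generate` is a hypothetical stand-in for any inference backend;
    only the timing arithmetic matters here.
    """
    start = time.perf_counter()
    produced = generate(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    return tokens_per_second(produced, elapsed)


# 128 new tokens over 4 seconds of wall-clock time is 32.0 t/s
print(tokens_per_second(128, 4.0))
```

Note that timing prefill and generation together understates generation t/s for long prompts, which is why the two are reported separately.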

Sources

  • llama-bench README

Referenced in
  • File Descriptors
  • Memory bandwidth
  • Token
  • Memory
    • © 2010 - 2026 Matt Oswalt · Powered by Hugo & Hyas.