Matt Oswalt
Codex



t/s (tokens per second)


Tokens per second (t/s) is the standard metric for LLM inference speed. It typically refers to generation speed, i.e. how fast new tokens are produced. Prompt processing (prefill) speed is usually much higher and is measured separately.
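The metric itself is simple arithmetic: tokens produced divided by wall-clock seconds. A minimal Python sketch, where `generate` is a hypothetical stand-in for any inference call that returns the number of tokens it actually produced:

```python
import time


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Generation t/s: tokens produced divided by wall-clock seconds."""
    return n_tokens / elapsed_s


def timed_generate(generate, prompt: str, max_tokens: int) -> float:
    """Time a generation call and report its t/s.

    `generate` is a hypothetical stand-in for any inference backend;
    only the timing arithmetic matters here.
    """
    start = time.perf_counter()
    produced = generate(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    return tokens_per_second(produced, elapsed)


# 128 new tokens over 4 seconds of wall-clock time is 32.0 t/s
print(tokens_per_second(128, 4.0))
```

Note that timing prefill and generation together understates generation t/s for long prompts, which is why the two are reported separately.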

Sources

  • llama-bench README

Referenced in
  • File Descriptors
  • Memory bandwidth
  • Token
  • Memory
    • © 2010 - 2026 Matt Oswalt · Powered by Hugo & Hyas.