Distilled Model
A smaller model trained to reproduce a larger model's behavior, keeping much of its capability at a fraction of the cost.
A smaller model (the student) trained to mimic the outputs of a larger, more capable model (the teacher). The student often retains much of the teacher’s reasoning ability at a fraction of the size and memory cost. Examples: DeepSeek R1 Distill Qwen 32B and DeepSeek R1 Distill Llama 70B, both distilled from the full 671B-parameter DeepSeek R1.
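The "mimic the outputs" objective can take different forms; the DeepSeek R1 distills, for instance, were fine-tuned on the teacher's generated outputs rather than its raw logits. The classic logit-matching formulation, though, is a temperature-scaled KL divergence between teacher and student distributions. A minimal sketch in plain NumPy (all names here are illustrative, not from any specific training codebase):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher T softens the distribution,
    # exposing more of the teacher's relative preferences.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) over softened distributions, scaled by T^2
    # so gradients stay comparable across temperatures.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return (temperature ** 2) * kl.mean()

# Toy check: a student that matches the teacher exactly incurs ~zero loss;
# a mismatched student incurs a positive loss.
teacher = np.array([[2.0, 0.5, -1.0]])
print(distillation_loss(teacher.copy(), teacher))
print(distillation_loss(np.array([[0.0, 2.0, -1.0]]), teacher))
```

During training this loss is minimized over the student's parameters, typically mixed with a standard cross-entropy term on ground-truth labels.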