Distilled Model
A smaller model trained to reproduce a larger model's behavior, keeping much of its capability at a fraction of the cost.
A smaller model (the student) trained to mimic the outputs of a larger, more capable model (the teacher). The student often retains much of the teacher’s reasoning ability at a fraction of the size and memory cost. Examples: DeepSeek R1 Distill Qwen 32B and DeepSeek R1 Distill Llama 70B, both distilled from the full 671B-parameter DeepSeek R1.
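The "mimic the outputs" objective can take different forms; the DeepSeek R1 distills, for instance, were fine-tuned on the teacher's generated outputs rather than its raw logits. The classic logit-matching formulation, though, is a temperature-scaled KL divergence between teacher and student distributions. A minimal sketch in plain NumPy (all names here are illustrative, not from any specific training codebase):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher T softens the distribution,
    # exposing more of the teacher's relative preferences.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) over softened distributions, scaled by T^2
    # so gradients stay comparable across temperatures.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return (temperature ** 2) * kl.mean()

# Toy check: a student that matches the teacher exactly incurs ~zero loss;
# a mismatched student incurs a positive loss.
teacher = np.array([[2.0, 0.5, -1.0]])
print(distillation_loss(teacher.copy(), teacher))
print(distillation_loss(np.array([[0.0, 2.0, -1.0]]), teacher))
```

During training this loss is minimized over the student's parameters, typically mixed with a standard cross-entropy term on ground-truth labels.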