Dense Model
A model architecture that uses all of its parameters on every inference pass.
A model architecture in which all parameters are used for every inference pass, making it the most straightforward kind of model to understand and run. It contrasts with Mixture-of-Experts (MoE) models, which activate only a subset of parameters per token. Examples: Llama 3.3 70B, Qwen2.5 72B, Gemma 3 27B.
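The dense-vs-MoE distinction can be sketched as a minimal, illustrative NumPy example; the layer shapes, routing scheme, and function names here are assumptions for illustration, not any specific model's implementation. The dense feed-forward layer touches every weight for every token, while the MoE layer routes each token through only its top-k experts:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_ffn(x, W1, W2):
    # Dense: every parameter participates in every forward pass.
    return np.maximum(x @ W1, 0) @ W2

def moe_ffn(x, gate_W, experts, top_k=2):
    # MoE: a router scores experts per token; only the top_k experts'
    # parameters are used for that token (hypothetical simple router).
    logits = x @ gate_W                          # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        w = np.exp(sel - sel.max())              # softmax over selected experts
        w /= w.sum()
        for weight, e in zip(w, top[t]):
            W1, W2 = experts[e]
            out[t] += weight * dense_ffn(x[t:t+1], W1, W2)[0]
    return out

d, hidden, n_experts, tokens = 8, 16, 4, 3
x = rng.standard_normal((tokens, d))
W1 = rng.standard_normal((d, hidden))
W2 = rng.standard_normal((hidden, d))
experts = [(rng.standard_normal((d, hidden)), rng.standard_normal((hidden, d)))
           for _ in range(n_experts)]
gate_W = rng.standard_normal((d, n_experts))

y_dense = dense_ffn(x, W1, W2)                 # all weights used per token
y_moe = moe_ffn(x, gate_W, experts, top_k=2)   # only 2 of 4 experts per token
```

In the dense case every token pays the full parameter cost; in the MoE case a 4-expert layer with top-2 routing holds 4 experts' worth of parameters but spends only 2 experts' worth of compute per token, which is the trade-off the entry describes.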