Title here
Summary here
Memory on a discrete GPU, used to store model weights and KV cache during inference. The hard ceiling for what models can run at full GPU speed. On discrete GPUs like the NVIDIA RTX 4080 Super (16GB) or AMD Radeon RX 7900 XTX (24GB), VRAM is separate from system RAM and fixed at the factory. Contrast with unified memory, where CPU and GPU share the same pool.