Matt Oswalt
Codex
  • Linux
    • File Descriptors
    • Networking
      • eBPF
      • Sockets
  • LLM
    • Resources
    • Inference Stack
    • Apps & Libraries
    • Model Evaluation
    • Memory
    • Glossary
  • Machine Learning
    • Deep Learning
    • Machine Learning
    • Glossary
  • Math
    • Glossary
  • Rust
    • Common Traits
    • Ownership
  • Video
    • GoPro
  • Cheat Sheets



This Glossary

  • BPW (bits per weight)
  • GGUF
  • IQ (importance-matrix quantization)
  • Q4_K_M
  • Q6_K
  • Q8_0
  • Quantization

GGUF


The file format used by llama.cpp to store quantized model weights, tokenizer data, and metadata in a single file. It is the de facto standard for local LLM inference, and it replaced the older GGML format.
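To illustrate the single-file layout, here is a minimal sketch (not llama.cpp's own reader) that parses just the fixed-size GGUF header, assuming the v2+ layout described in the GGUF specification: a 4-byte `GGUF` magic, a little-endian uint32 version, then uint64 tensor and metadata key/value counts. Everything after that (the metadata key/value pairs, tensor infos, and tensor data) follows in the same file.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header (little-endian, v2+ layout):
    magic, version, tensor count, metadata key/value count."""
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic={data[:4]!r})")
    (version,) = struct.unpack_from("<I", data, 4)
    tensor_count, metadata_kv_count = struct.unpack_from("<QQ", data, 8)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": metadata_kv_count,
    }


# Synthetic header for demonstration: version 3, 2 tensors, 5 metadata pairs.
header = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
print(read_gguf_header(header))
# → {'version': 3, 'tensor_count': 2, 'metadata_kv_count': 5}
```

In practice you would read the first 24 bytes of a real `.gguf` file rather than building a synthetic header; the counts then tell a full parser how many metadata entries and tensor-info records to read next.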

  • GGUF Format Documentation

Referenced in

  • BPW (bits per weight)
  • Inference Stack
  • Model Evaluation
© 2010 - 2026 Matt Oswalt · Powered by Hugo & Hyas.