# Llama-4-Maverick-17B-128E-Instruct – MINT GGUF
Q8_0 (8.5 bits/weight) | 397 GB | Ready for Ollama, llama.cpp, LM Studio
## Quick Start
**Important:** You must use the included `Modelfile` when running with Ollama. It contains the chat template and stop tokens needed for proper generation; without it, the model will repeat output endlessly.
```bash
# Ollama - download the GGUF and Modelfile, then create:
ollama create llama4-maverick -f Modelfile

# Run:
ollama run llama4-maverick
```

```bash
# llama.cpp (stop tokens handled automatically)
llama-cli -m Llama-4-Maverick-Q8_0.gguf -p "Hello" -ngl 99
```
## Model Details
| Spec | Value |
|---|---|
| Base model | meta-llama/Llama-4-Maverick-17B-128E-Instruct |
| Parameters | 402B total, 17B active (MoE, 128 experts) |
| Architecture | Llama 4 MoE (24 dense + 24 MoE layers) |
| Quantization | Q8_0 (8.5 BPW) |
| Size | 397 GB |
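The 8.5 bits/weight figure follows directly from the llama.cpp Q8_0 block layout: each block stores 32 int8 quants plus one fp16 scale. The arithmetic below is a sanity-check sketch; it ignores non-quantized tensors (norms, etc.) and GGUF metadata, and takes the 402B total parameter count from the table above.

```python
# Q8_0 block layout in llama.cpp: 32 int8 weights + one fp16 scale.
BLOCK_WEIGHTS = 32
BLOCK_BITS = BLOCK_WEIGHTS * 8 + 16        # 272 bits per block
bpw = BLOCK_BITS / BLOCK_WEIGHTS           # = 8.5 bits/weight

params = 402e9                             # total parameters (see table)
size_bytes = params * bpw / 8
print(f"{bpw} bpw -> {size_bytes / 1e9:.0f} GB, {size_bytes / 2**30:.0f} GiB")
# 8.5 bpw -> 427 GB, 398 GiB; consistent with the ~397 GB listed size
```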
## Quality vs Size Analysis
This model sits at the knee of the diminishing-returns curve, offering the best quality-per-GB tradeoff. Beyond ~407 GB, each additional GB yields minimal quality improvement.
## Modelfile
A Modelfile is included with the proper Llama 4 chat template and stop tokens (`<|eot|>`, `<|eom|>`).
```
FROM Llama-4-Maverick-Q8_0.gguf

PARAMETER num_ctx 32768
PARAMETER temperature 0.6
PARAMETER stop "<|eot|>"
PARAMETER stop "<|eom|>"
```
## About MINT
MINT (Memory-Informed N-bit Tuning) is a data-free, per-tensor mixed-precision quantization method.
Paper & Code: github.com/baa-ai/MINT
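MINT's allocator frames per-tensor bit allocation as a multiple-choice knapsack problem (MCKP): each tensor picks exactly one bit-width from a menu of (size, distortion) options, minimizing total distortion under a global size budget. The sketch below is the textbook greedy relaxation of MCKP, not MINT's actual implementation (see the paper for that); the tensor names, sizes, and distortion numbers are made-up placeholders.

```python
# Greedy relaxation of the multiple-choice knapsack (MCKP) allocation:
# every tensor starts at its cheapest option, then we repeatedly apply the
# upgrade that buys the most distortion reduction per extra byte, until
# the budget is spent. Each option is a (bits, bytes, distortion) tuple.
def allocate(tensors, budget_bytes):
    choice = {name: 0 for name in tensors}            # chosen option index
    spent = sum(opts[0][1] for opts in tensors.values())
    while True:
        best = None
        for name, opts in tensors.items():
            i = choice[name]
            if i + 1 >= len(opts):
                continue                               # already at max bits
            d_bytes = opts[i + 1][1] - opts[i][1]      # extra size
            d_loss = opts[i][2] - opts[i + 1][2]       # distortion reduced
            if spent + d_bytes <= budget_bytes and d_loss > 0:
                gain = d_loss / d_bytes
                if best is None or gain > best[0]:
                    best = (gain, name, d_bytes)
        if best is None:                               # no affordable upgrade
            return {n: tensors[n][i][0] for n, i in choice.items()}
        _, name, d_bytes = best
        choice[name] += 1
        spent += d_bytes

# Hypothetical per-tensor menus: (bits, bytes, distortion).
tensors = {
    "blk.0.ffn_gate": [(4, 50, 3.0), (6, 75, 1.0), (8, 100, 0.4)],
    "blk.0.attn_q":   [(4, 20, 0.8), (6, 30, 0.5), (8, 40, 0.1)],
}
print(allocate(tensors, budget_bytes=125))
# -> {'blk.0.ffn_gate': 6, 'blk.0.attn_q': 8}
```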
## Rate-Distortion Curve
Quality vs size trade-off from the MINT MCKP allocator. ★ = optimal knee point.
| Budget | Size | Avg Bits | Loss | Knee |
|---|---|---|---|---|
| 167 GB | 166.7 GB | 3.1 | 125.0816 | |
| 200 GB | 200.0 GB | 3.9 | 6.9136 | ★ |
| 233 GB | 233.3 GB | 4.6 | 4.2019 | |
| 267 GB | 266.6 GB | 5.2 | 2.5834 | |
| 300 GB | 299.9 GB | 5.9 | 1.6647 | |
| 333 GB | 333.3 GB | 6.7 | 0.8914 | |
| 367 GB | 366.6 GB | 7.4 | 0.5404 | |
| 400 GB | 399.9 GB | 8.3 | 0.2990 | |
| 433 GB | 429.3 GB | 8.7 | 0.2404 | |
| 467 GB | 462.1 GB | 9.5 | 0.2030 | |
| 500 GB | 499.6 GB | 10.3 | 0.1607 | |
| 533 GB | 532.4 GB | 11.1 | 0.1237 | |
| 567 GB | 560.5 GB | 11.7 | 0.1066 | |
| 600 GB | 598.0 GB | 12.6 | 0.0851 | |
| 633 GB | 626.1 GB | 13.2 | 0.0691 | |
| 667 GB | 663.6 GB | 14.1 | 0.0478 | |
| 700 GB | 691.8 GB | 14.7 | 0.0318 | |
| 733 GB | 729.3 GB | 15.6 | 0.0106 | |
Generated by MINT rate-distortion optimization.
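For readers reproducing the curve, the knee can be found mechanically, for example as the row that drops furthest below the straight line joining the curve's endpoints (a simple kneedle-style heuristic; whether MINT's allocator uses this exact rule is an assumption). On the table above it selects the starred 200 GB row:

```python
# Knee detection: pick the point with the largest deviation below the
# chord joining the curve's endpoints. Data copied from the table above.
sizes = [166.7, 200.0, 233.3, 266.6, 299.9, 333.3, 366.6, 399.9,
         429.3, 462.1, 499.6, 532.4, 560.5, 598.0, 626.1, 663.6,
         691.8, 729.3]
loss = [125.0816, 6.9136, 4.2019, 2.5834, 1.6647, 0.8914, 0.5404,
        0.2990, 0.2404, 0.2030, 0.1607, 0.1237, 0.1066, 0.0851,
        0.0691, 0.0478, 0.0318, 0.0106]

x0, y0, x1, y1 = sizes[0], loss[0], sizes[-1], loss[-1]
def chord(x):                     # linear interpolation between endpoints
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

knee = max(range(len(sizes)), key=lambda i: chord(sizes[i]) - loss[i])
print(f"knee at {sizes[knee]} GB")   # -> knee at 200.0 GB (the ★ row)
```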