# SMOLM2Prover - GGUF Format

A GGUF-quantized version of the SMOLM2Prover model for use with llama.cpp and compatible runtimes.
## Model Details
- Original Model: reaperdoesntknow/SMOLM2Prover
- Architecture: LlamaForCausalLM
- Context Length: 8192 tokens
- Embedding Dimension: 960
- Layers: 32
- Head Count: 15 query heads, 5 key/value heads (grouped-query attention, GQA)
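The GQA configuration above determines the KV-cache footprint at inference time. A back-of-the-envelope sketch (assuming an F16 cache at 2 bytes per value; the numbers are illustrative, not measured):

```python
# Rough KV-cache sizing from the model specs above.
# Assumption: F16 KV cache (2 bytes per value).
n_layers = 32
n_q_heads = 15
n_kv_heads = 5
d_model = 960
ctx_len = 8192
bytes_per_val = 2  # F16

head_dim = d_model // n_q_heads  # 960 / 15 = 64
# K and V entries, one per KV head per layer per token
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_val
full_ctx_mib = kv_bytes_per_token * ctx_len / 2**20

print(head_dim)            # -> 64
print(kv_bytes_per_token)  # -> 40960 (about 40 KiB per token)
print(round(full_ctx_mib)) # -> 320 (MiB at the full 8192-token context)
```

Because the cache stores only the 5 KV heads rather than all 15 query heads, GQA cuts the cache to a third of what full multi-head attention would need.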
## Available Files

| File | Size | Quantization | Quality |
|---|---|---|---|
| SMOLM2Prover.gguf | 692M | F16 | Original (no quantization) |
| SMOLM2Prover-Q4_K_M.gguf | 258M | Q4_K_M | Recommended (good quality/size balance) |
## Usage

### With llama.cpp

```bash
# Run with the quantized model
./llama-cli -m SMOLM2Prover-Q4_K_M.gguf -p "Your prompt here" -n 256
```
### With Ollama

Create a Modelfile:

```
FROM ./SMOLM2Prover-Q4_K_M.gguf
```

Then:

```bash
ollama create smolm2prover -f Modelfile
ollama run smolm2prover
```
### With LM Studio

- Download SMOLM2Prover-Q4_K_M.gguf
- Place it in your LM Studio models folder
- Load and chat!
## Quantization Details
The Q4_K_M quantization uses:
- Q4_K for most weights
- Q5_0 fallback for tensors not divisible by 256
- Q6_K/Q8_0 for some critical layers
Size reduction: 692M → 258M (63% smaller)

BPW: 5.94 bits per weight
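The size-reduction and bits-per-weight figures can be sanity-checked from the file sizes alone. A minimal sketch, assuming roughly 362M parameters (inferred from the SmolLM2-360M base; the listed sizes are rounded, so the result is approximate rather than the exact 5.94 reported above):

```python
# Sanity-check bits-per-weight (BPW) from file size and parameter count.
# Assumption: ~362M parameters (SmolLM2-360M family); file sizes are
# rounded, so results are approximate.
n_params = 362e6
f16_bytes = 692 * 2**20   # 692M F16 file
q4km_bytes = 258 * 2**20  # 258M Q4_K_M file

bpw_f16 = f16_bytes * 8 / n_params
bpw_q4km = q4km_bytes * 8 / n_params
reduction = 1 - q4km_bytes / f16_bytes

print(f"{bpw_f16:.1f}")    # -> 16.0 (bits/weight, F16)
print(f"{bpw_q4km:.1f}")   # -> 6.0 (bits/weight, Q4_K_M)
print(f"{reduction:.0%}")  # -> 63%
```

The ~6 BPW (rather than a literal 4) reflects the mixed-precision scheme above: Q4_K for most tensors, with Q5_0/Q6_K/Q8_0 for the rest.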
## Discrepancy Calculus Foundation
This model is part of the Convergent Intelligence LLC: Research Division portfolio. All models in this portfolio are developed under the Discrepancy Calculus (DISC) framework — a measure-theoretic approach to understanding and controlling the gap between what a model should produce and what it actually produces.
DISC treats training singularities (loss plateaus, mode collapse, catastrophic forgetting) not as failures to be smoothed over, but as structural signals that reveal the geometry of the learning problem. Key concepts:
- Discrepancy Operator (D): Measures the gap between expected and observed behavior at each training step
- Jump Sets: Boundaries where model behavior changes discontinuously — these are features, not bugs
- Ghost Imprinting: Teacher knowledge that transfers to student models through weight-space topology rather than explicit distillation signal
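The first two concepts can be illustrated with a toy sketch. This is NOT the operator defined in the cited DISC papers; it is an illustrative stand-in in which the discrepancy at each step is the gap between a smoothed (expected) loss trajectory and the observed loss, and the jump set is wherever that gap changes discontinuously. All function names, the window size, and the threshold are hypothetical:

```python
# Illustrative sketch only -- not the formal operator from the DISC papers.
# D[t] = |observed loss - expected loss| with a moving-average expectation;
# the jump set is where D changes abruptly between steps.

def discrepancy(observed, window=3):
    """Gap between observed loss and a moving-average expectation."""
    d = []
    for t, obs in enumerate(observed):
        lo = max(0, t - window)
        expected = sum(observed[lo:t + 1]) / (t + 1 - lo)
        d.append(abs(obs - expected))
    return d

def jump_set(d, threshold=0.5):
    """Steps where the discrepancy jumps by more than `threshold`."""
    return [t for t in range(1, len(d)) if abs(d[t] - d[t - 1]) > threshold]

# Smooth descent with one abrupt plateau break at step 5:
losses = [4.0, 3.5, 3.1, 3.0, 2.95, 1.2, 1.1, 1.05]
print(jump_set(discrepancy(losses)))  # -> [5]
```

In the DISC framing, the detected step is read as a structural signal about the learning problem's geometry rather than as noise to be smoothed away.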
For the full mathematical treatment, see Discrepancy Calculus: Foundations and Core Theory (DOI: 10.57967/hf/8194).
Citation chain: Structure Over Scale (DOI: 10.57967/hf/8165) → Three Teachers to Dual Cognition (DOI: 10.57967/hf/8184) → Discrepancy Calculus (DOI: 10.57967/hf/8194)
## License
Same as the original model.
## Convergent Intelligence Portfolio

Part of the Standalone Models collection by Convergent Intelligence LLC: Research Division.
### Related Models
| Model | Downloads | Format |
|---|---|---|
| SMOLM2Prover | 56 | HF |
| DeepReasoning_1R | 16 | HF |
| SAGI | 3 | HF |
| S-AGI | 0 | HF |
### Top Models from Our Lab

Total portfolio: 41 models | 2,781 total downloads

Last updated: 2026-03-28 12:55 UTC
### From the Convergent Intelligence Portfolio
DistilQwen Collection — Our only BF16 series. Proof-weighted distillation from Qwen3-30B-A3B → 1.7B and 0.6B on H100. Three teacher variants (Instruct, Thinking, Coder), nine models, 2,788 combined downloads. The rest of the portfolio proves structure beats scale on CPU. This collection shows what happens when you give the methodology real hardware.
Top model: Qwen3-1.7B-Coder-Distilled-SFT — 508 downloads
Full methodology: Structure Over Scale (DOI: 10.57967/hf/8165)
---

Convergent Intelligence LLC: Research Division — part of the reaperdoesntknow research portfolio (48 models, 12,094 total downloads). Last refreshed: 2026-03-29 21:05 UTC.
## Model Tree

Base model: HuggingFaceTB/SmolLM2-360M