Chroma Context-1 – GGUF (llama.cpp)

GGUF weights for Chroma Context-1, converted for llama.cpp and any runtime that loads GGUF (LM Studio, Ollama with compatible import paths, local servers, etc.).

This repository exists because the upstream model is distributed in PyTorch / safetensors form only. These files are the same weights in GGUF, with a range of llama-quantize presets so you can trade quality for VRAM and disk.


Upstream (source of truth)

| | Link |
|---|---|
| Original weights & model card | chromadb/context-1 |
| Architecture family | gpt-oss MoE (see upstream card; base traceable to OpenAI gpt-oss-20b) |
| License | Apache 2.0 (unchanged; you must comply with upstream terms) |

Attribution: All tensors are derived from chromadb/context-1. This repo is a community conversion and is not affiliated with or endorsed by Chroma. For behavior, safety, and intended use, read the official model card first.


Quick start

1. Install a recent llama.cpp build (or use a GUI that bundles it).

2. Download this repository:

```shell
huggingface-cli download ryancook/chromadb-context-1-gguf --local-dir ./chromadb-context-1-gguf
```

3. Run (example; adjust paths and context length to your hardware):

```shell
llama-cli -m ./chromadb-context-1-gguf/chromadb-context-1-Q4_0.gguf -cnv --color -ngl 99
```

Swap the filename for any published chromadb-context-1-*.gguf from the Files tab (for example Q4_K_M or MXFP4_MOE when available).
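If you only need one quantization, you can filter the download to a single file rather than pulling the whole repo. A minimal sketch (the preset here is an example; the filename pattern follows this repo's Files tab, and the actual download command is left commented out):

```shell
# Derive the filename for a chosen preset (naming pattern: chromadb-context-1-<PRESET>.gguf)
PRESET=Q4_K_M
FILE="chromadb-context-1-${PRESET}.gguf"
echo "$FILE"   # -> chromadb-context-1-Q4_K_M.gguf
# Then fetch just that file instead of the whole repository:
# huggingface-cli download ryancook/chromadb-context-1-gguf --include "$FILE" --local-dir ./chromadb-context-1-gguf
```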


Choosing a file

Start here (good defaults for most people):

| Priority | File pattern | When to use |
|---|---|---|
| 1 | `…-Q4_K_M.gguf` or `…-Q5_K_M.gguf` | Best general-purpose balance of quality and size (if present in this repo). |
| 2 | `…-MXFP4_MOE.gguf` | Smaller MoE-oriented layout; strong choice when supported by your llama.cpp build/GPU stack. |
| 3 | `…-Q4_0.gguf` / `…-Q5_0.gguf` | Simpler legacy-style quants; predictable tradeoffs. |
| 4 | `…-bf16.gguf` | Full BF16 fidelity (~40 GiB class); for reference or maximum quality when you have the RAM/VRAM. |

Other presets (IQ*, TQ*, Q2_K, Q3_K*, Q6_K, Q8_0, F16, …) may appear in the Files tab as they are published. Lower-bit and ternary formats are experimental for quality; profile on your workload before relying on them.
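As a rough rule of thumb, on-disk size is parameter count × bits per weight ÷ 8, plus metadata overhead. A hedged back-of-envelope for this ~21B-parameter model (the bits-per-weight figure is an approximate average for the preset, not an exact spec):

```shell
# Approximate GGUF size: params * bits-per-weight / 8
# bpw is scaled by 10 to stay in integer arithmetic (Q4_K_M averages ~4.8 bits/weight)
params=21000000000
bpw10=48
bytes=$(( params * bpw10 / 10 / 8 ))
gib=$(( bytes / 1024 / 1024 / 1024 ))
echo "~${gib} GiB for Q4_K_M"   # -> ~11 GiB for Q4_K_M
```

Published files will differ somewhat, because quantization mixes tensor types (embeddings and some layers are often kept at higher precision); treat this as a sizing guide, not an exact figure.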

Tip: The Files and versions view on Hugging Face is authoritative for what is available in each commit. Filenames follow chromadb-context-1-<PRESET>.gguf.


Conversion pipeline

Reproducible high-level steps:

  1. Obtain weights from chromadb/context-1 (Apache 2.0).
  2. Convert to GGUF with llama.cpp convert_hf_to_gguf.py (BF16 output from upstream bf16 checkpoint).
  3. Quantize with llama-quantize using the preset named in each filename (Q4_0, Q4_K_M, MXFP4_MOE, etc.).
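Step 3 expands to one `llama-quantize` invocation per published preset. A sketch that prints the commands rather than running them (the preset list is illustrative; the BF16 input file is the output of step 2):

```shell
# Step 2 (for reference, not executed here):
#   python convert_hf_to_gguf.py ./context-1 --outtype bf16 --outfile chromadb-context-1-bf16.gguf
# Step 3: print one quantize command per preset
for PRESET in Q4_0 Q4_K_M Q5_K_M MXFP4_MOE; do
  echo "llama-quantize chromadb-context-1-bf16.gguf chromadb-context-1-${PRESET}.gguf ${PRESET}"
done
```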

Reproducibility

Conversions for this collection were produced with ggml-org/llama.cpp at commit 07ba6d275 (short SHA; matches upstream convert_hf_to_gguf.py / llama-quantize from that tree). Newer llama.cpp revisions are generally backward compatible for GGUF loading, but you may see small numerical differences if you re-quantize.


Hardware & context

  • VRAM / RAM: MoE models route only a subset of experts per token; still treat published sizes as a guide and monitor peak usage at your target context length.
  • Context length: Upstream supports a very long context window; practical limits depend on KV cache size and quant. Start with a smaller -c / context setting and increase only after you confirm stability.

License

Same as upstream: Apache 2.0. Keep chromadb/context-1 attribution visible when you redistribute or ship products built on these files.

