Chroma Context-1 – GGUF (llama.cpp)

GGUF weights for Chroma Context-1, converted for llama.cpp and any runtime that loads GGUF (LM Studio, Ollama with compatible import paths, local servers, etc.).

This repository exists because the upstream model is distributed in PyTorch / safetensors form only. These files are the same weights in GGUF, with a range of llama-quantize presets so you can trade quality for VRAM and disk.


Upstream (source of truth)

| | Link |
|---|---|
| Original weights & model card | chromadb/context-1 |
| Architecture family | gpt-oss MoE (see upstream card; base traceable to OpenAI gpt-oss-20b) |
| License | Apache 2.0 (unchanged; you must comply with upstream terms) |

Attribution: All tensors are derived from chromadb/context-1. This repo is a community conversion and is not affiliated with or endorsed by Chroma. For behavior, safety, and intended use, read the official model card first.


Quick start

1. Install a recent llama.cpp build (or use a GUI that bundles it).

2. Download this repository:

```shell
huggingface-cli download ryancook/chromadb-context-1-gguf --local-dir ./chromadb-context-1-gguf
```

3. Run (example; adjust paths and context length to your hardware):

```shell
llama-cli -m ./chromadb-context-1-gguf/chromadb-context-1-Q4_0.gguf -cnv --color -ngl 99
```

Swap the filename for any published chromadb-context-1-*.gguf from the Files tab (for example Q4_K_M or MXFP4_MOE when available).
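If you only need one quantization, you can filter the download to a single file rather than pulling the whole repo. A minimal sketch (the preset here is an example; the filename pattern follows this repo's Files tab, and the actual download command is left commented out):

```shell
# Derive the filename for a chosen preset (naming pattern: chromadb-context-1-<PRESET>.gguf)
PRESET=Q4_K_M
FILE="chromadb-context-1-${PRESET}.gguf"
echo "$FILE"   # -> chromadb-context-1-Q4_K_M.gguf
# Then fetch just that file instead of the whole repository:
# huggingface-cli download ryancook/chromadb-context-1-gguf --include "$FILE" --local-dir ./chromadb-context-1-gguf
```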


Choosing a file

Start here (good defaults for most people):

| Priority | File pattern | When to use |
|---|---|---|
| 1 | `…-Q4_K_M.gguf` or `…-Q5_K_M.gguf` | Best general-purpose balance of quality and size (if present in this repo). |
| 2 | `…-MXFP4_MOE.gguf` | Smaller MoE-oriented layout; strong choice when supported by your llama.cpp build/GPU stack. |
| 3 | `…-Q4_0.gguf` / `…-Q5_0.gguf` | Simpler legacy-style quants; predictable tradeoffs. |
| 4 | `…-bf16.gguf` | Full BF16 fidelity (~40 GiB class); for reference or maximum quality when you have the RAM/VRAM. |

Other presets (IQ*, TQ*, Q2_K, Q3_K*, Q6_K, Q8_0, F16, …) may appear in the Files tab as they are published. Lower-bit and ternary formats are experimental for quality; profile on your workload before relying on them.
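As a rough rule of thumb, on-disk size is parameter count × bits per weight ÷ 8, plus metadata overhead. A hedged back-of-envelope for this ~21B-parameter model (the bits-per-weight figure is an approximate average for the preset, not an exact spec):

```shell
# Approximate GGUF size: params * bits-per-weight / 8
# bpw is scaled by 10 to stay in integer arithmetic (Q4_K_M averages ~4.8 bits/weight)
params=21000000000
bpw10=48
bytes=$(( params * bpw10 / 10 / 8 ))
gib=$(( bytes / 1024 / 1024 / 1024 ))
echo "~${gib} GiB for Q4_K_M"   # -> ~11 GiB for Q4_K_M
```

Published files will differ somewhat, because quantization mixes tensor types (embeddings and some layers are often kept at higher precision); treat this as a sizing guide, not an exact figure.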

Tip: The Files and versions view on Hugging Face is authoritative for what is available in each commit. Filenames follow chromadb-context-1-<PRESET>.gguf.


Conversion pipeline

Reproducible high-level steps:

  1. Obtain weights from chromadb/context-1 (Apache 2.0).
  2. Convert to GGUF with llama.cpp convert_hf_to_gguf.py (BF16 output from upstream bf16 checkpoint).
  3. Quantize with llama-quantize using the preset named in each filename (Q4_0, Q4_K_M, MXFP4_MOE, etc.).
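Step 3 expands to one `llama-quantize` invocation per published preset. A sketch that prints the commands rather than running them (the preset list is illustrative; the BF16 input file is the output of step 2):

```shell
# Step 2 (for reference, not executed here):
#   python convert_hf_to_gguf.py ./context-1 --outtype bf16 --outfile chromadb-context-1-bf16.gguf
# Step 3: print one quantize command per preset
for PRESET in Q4_0 Q4_K_M Q5_K_M MXFP4_MOE; do
  echo "llama-quantize chromadb-context-1-bf16.gguf chromadb-context-1-${PRESET}.gguf ${PRESET}"
done
```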

Reproducibility

Conversions for this collection were produced with ggml-org/llama.cpp at commit 07ba6d275 (short SHA; matches upstream convert_hf_to_gguf.py / llama-quantize from that tree). Newer llama.cpp revisions are generally backward compatible for GGUF loading, but you may see small numerical differences if you re-quantize.


Hardware & context

  • VRAM / RAM: MoE models route only a subset of experts per token; still treat published sizes as a guide and monitor peak usage at your target context length.
  • Context length: Upstream supports a very long context window; practical limits depend on KV cache size and quant. Start with a smaller -c / context setting and increase only after you confirm stability.

License

Same as upstream: Apache 2.0. Keep chromadb/context-1 attribution visible when you redistribute or ship products built on these files.

