Kimi-K2.7-Code Eagle3-MLA Draft

Eagle3-MLA speculative-decoding draft model for Kimi-K2.7-Code, trained natively on K2.7-Code data. Pairs with the Kimi-K2.7-Code verifier under vLLM speculative decoding.

What this is

Algorithm: EAGLE-3 with MLA (multi-head latent attention), single draft decoder layer.
Verifier: Kimi-K2.7-Code (DeepSeek-V3-class architecture; the architecture is identical across K2.5 / K2.6 / K2.7). The draft reuses the verifier's embedding, lm_head, and norm, all kept frozen.
Training data: real K2.7-Code serving traffic (agentic / coding / tool, oversampled 5x) mixed with kimi-mtp prompts re-answered by K2.7-Code.
Recipe: ttt_steps=4, ttt_step_loss_decay=1.0, off-policy tokens, l2sp_lambda=1e-4, cosine LR 2e-5, seq_length 8192, max_steps 120000.
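
To make the recipe values concrete, the sketch below shows one plausible reading of three of them: the cosine learning-rate schedule, the per-step loss weighting controlled by ttt_step_loss_decay, and the L2-SP penalty. It assumes no warmup and decay to zero, that the loss of training-time-test step k is weighted by decay**k, and that L2-SP penalizes the squared distance from the initial weights. None of these details are confirmed by this card, and all function names are hypothetical.

```python
import math

# Values taken from the recipe line above.
PEAK_LR = 2e-5
MAX_STEPS = 120_000
TTT_STEPS = 4
TTT_STEP_LOSS_DECAY = 1.0
L2SP_LAMBDA = 1e-4


def cosine_lr(step, peak_lr=PEAK_LR, max_steps=MAX_STEPS):
    """Cosine decay from peak_lr to 0 (assumed: no warmup, zero floor)."""
    progress = min(max(step, 0), max_steps) / max_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


def ttt_step_weights(ttt_steps=TTT_STEPS, decay=TTT_STEP_LOSS_DECAY):
    """Loss weight per training-time-test step (assumed: decay ** k)."""
    return [decay ** k for k in range(ttt_steps)]


def total_loss(step_losses, params, init_params,
               decay=TTT_STEP_LOSS_DECAY, lam=L2SP_LAMBDA):
    """Weighted sum of per-step losses plus an L2-SP penalty.

    params / init_params are flat lists of floats standing in for the
    trainable weights and their values at the start of training.
    """
    weights = ttt_step_weights(len(step_losses), decay)
    weighted = sum(w * l for w, l in zip(weights, step_losses))
    l2sp = sum((p - p0) ** 2 for p, p0 in zip(params, init_params))
    return weighted + lam * l2sp
```

With ttt_step_loss_decay=1.0, every one of the four steps contributes equally under this reading; a value below 1.0 would down-weight later steps.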

Evaluation

Speculative-decoding eval of the final checkpoint against the Kimi-K2.7-Code verifier (vLLM 0.20.0, TP=8, num_speculative_tokens=3, concurrency c=4, greedy decoding). Mean accepted-token length:

Draft	Real K2.7-Code traffic	K2.6-distribution held-out
This model (final)	2.345	2.246
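
As a rough aid for reading these numbers, the sketch below converts a mean accepted-token length into accepted draft tokens per verifier step and a per-draft-token acceptance rate. It assumes the metric counts the one token the verifier always contributes per step, so the maximum with num_speculative_tokens=3 would be 4; if the metric is defined differently, the conversion does not apply. The function names are hypothetical.

```python
NUM_SPECULATIVE_TOKENS = 3  # from the eval configuration above


def accepted_drafts_per_step(mean_accept_len):
    """Accepted draft tokens per verifier step.

    Assumes the reported length includes the verifier's own token (+1).
    """
    return mean_accept_len - 1.0


def draft_acceptance_rate(mean_accept_len, k=NUM_SPECULATIVE_TOKENS):
    """Fraction of the k proposed draft tokens accepted on average."""
    return accepted_drafts_per_step(mean_accept_len) / k


results = {
    "Real K2.7-Code traffic": 2.345,
    "K2.6-distribution held-out": 2.246,
}

for name, mal in results.items():
    print(f"{name}: {accepted_drafts_per_step(mal):.3f} drafts/step, "
          f"{draft_acceptance_rate(mal):.1%} of proposed drafts accepted")
```

Under that assumption, the reported lengths correspond to roughly 45% and 42% of proposed draft tokens being accepted.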

Usage (vLLM)

vllm serve /path/to/Kimi-K2.7-Code \
  --tensor-parallel-size 8 \
  --speculative-config '{"model": "k-l-lambda/kimi-k2.7-code-eagle3-mla", "num_speculative_tokens": 3, "method": "eagle3"}'
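
Once the server is up, it can be queried through vLLM's OpenAI-compatible HTTP API. The following is a minimal sketch using only the Python standard library. It assumes the server listens on the default port 8000 on localhost and that no --served-model-name was set, so the model is addressed by the path passed to vllm serve. The helper names build_payload and complete are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"      # assumed: vLLM default host/port
MODEL_NAME = "/path/to/Kimi-K2.7-Code"  # assumed: same path given to `vllm serve`


def build_payload(prompt, max_tokens=128):
    """Request body for /v1/completions; temperature 0 gives greedy decoding."""
    return {
        "model": MODEL_NAME,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }


def complete(prompt, max_tokens=128, base_url=BASE_URL):
    """Send one completion request and return the generated text."""
    body = json.dumps(build_payload(prompt, max_tokens)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["choices"][0]["text"]
```

Calling complete("def fibonacci(n):") against a running server returns the generated continuation. Speculative decoding is applied on the server side, so the client request is the same as for a server running without a draft model.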

Checkpoint

Final checkpoint of the K2.7-native run (step 118800; val_loss had plateaued, so the run was stopped just short of the 120000-step budget). Among the retained checkpoints, it is the best by validation full-sequence accept rate and the eval winner on the real K2.7-Code traffic reported above.

Weights: Safetensors, 3B params, BF16.
