kimi-k2.6-eagle3-mla (continual fine-tune)
Eagle3 MTP draft model with MLA (Multi-Latent Attention) for accelerating inference of Kimi-K2.6.
This checkpoint is a continual fine-tune of lightseekorg/kimi-k2.6-eagle3-mla, further trained on an in-house instruction distribution. The architecture, config, and tensor layout are identical to the base model, so it is a drop-in replacement in the same vLLM serving path.
Training Setup
- Init: continual FT from
lightseekorg/kimi-k2.6-eagle3-mla. - Framework: Camelot online speculative-decoding training — concurrent FSDP training + vLLM rollout with cross-node hidden-state transfer over Mooncake (RDMA).
- Topology: 1 datagen node (Kimi-K2.6, TP=8) -> Mooncake -> 1 trainer GPU.
- Schedule: an initial 10k-step cosine phase (LR 2e-5), then a constant low-LR (2e-6) refinement phase continued from the best checkpoint. This published checkpoint is the best-val checkpoint of the refinement phase.
- Data: online-generated hidden states from a ~100k-prompt instruction set (each sample consumed once).
- seq len 4096, Eagle3 TTT steps 3, global batch size 1.
Validation (training-time, teacher-forced)
Per-position draft accuracy on a fixed held-out val split, measured during
training. acc@i is the accuracy at TTT position i (i = 0, 1, 2):
full_acc@i requires positions 0..i all correct; cond_acc@i is conditioned
on 0..i-1 correct. This is a training-time, teacher-forced metric on a
small online-sampled val split — it indicates per-position draft quality but is
not a runtime accept-length and is not comparable across runs/splits.
Best checkpoint of the refinement phase (this published checkpoint):
| metric | value |
|---|---|
| val_loss | 3.608 |
| full_acc@0 | 0.799 |
| full_acc@1 | 0.517 |
| full_acc@2 | 0.306 |
| cond_acc@1 | 0.648 |
| cond_acc@2 | 0.594 |
A runtime accept_length benchmark (vLLM 0.20, num_speculative_tokens=3) on a
common held-out set is pending and will be added once measured.
Quick Start (vLLM >= 0.20.0)
vllm serve moonshotai/Kimi-K2.6 \
--tensor-parallel-size 8 \
--speculative-config '{"model": "k-l-lambda/kimi-k2.6-eagle3-mla", "method": "eagle3", "num_speculative_tokens": 3}' \
--trust-remote-code
- Downloads last month
- 17
Model tree for k-l-lambda/kimi-k2.6-eagle3-mla
Base model
moonshotai/Kimi-K2.6