kimi-k2.6-eagle3-mla (continual fine-tune)

Eagle3 MTP (multi-token prediction) draft model with MLA (Multi-head Latent Attention) for accelerating inference of Kimi-K2.6 via speculative decoding.

This checkpoint is a continual fine-tune of lightseekorg/kimi-k2.6-eagle3-mla, further trained on an in-house instruction distribution. The architecture, config, and tensor layout are identical to the base model, so it is a drop-in replacement in the same vLLM serving path.

Training Setup

  • Init: continual FT from lightseekorg/kimi-k2.6-eagle3-mla.
  • Framework: Camelot online speculative-decoding training: FSDP training runs concurrently with vLLM rollout, and hidden states are transferred across nodes over Mooncake (RDMA).
  • Topology: 1 datagen node (Kimi-K2.6, TP=8) -> Mooncake -> 1 trainer GPU.
  • Schedule: an initial 10k-step cosine phase (LR 2e-5), then a constant low-LR (2e-6) refinement phase continued from the best checkpoint. This published checkpoint is the best-val checkpoint of the refinement phase.
  • Data: online-generated hidden states from a ~100k-prompt instruction set (each sample consumed once).
  • Hyperparameters: sequence length 4096, 3 Eagle3 TTT (training-time test) steps, global batch size 1.
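
The two-phase schedule listed above can be sketched as follows. This is a minimal illustration, not the training code: the card gives only the peak LR, the phase length, and the refinement LR, so the cosine floor (`MIN_LR`) and the absence of warmup are assumptions, and the function names are hypothetical. The phases are kept as separate functions because the refinement phase restarts from the best checkpoint rather than from the end of the cosine phase.

```python
import math

PEAK_LR = 2e-5          # from the card: LR of the cosine phase
COSINE_STEPS = 10_000   # from the card: length of the cosine phase
REFINE_LR = 2e-6        # from the card: constant LR of the refinement phase
MIN_LR = 0.0            # assumption: the card does not state the cosine floor


def cosine_phase_lr(step: int) -> float:
    """LR at `step` of the cosine phase (no warmup modelled; an assumption)."""
    progress = min(max(step, 0), COSINE_STEPS) / COSINE_STEPS
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


def refinement_phase_lr(step: int) -> float:
    """LR during the refinement phase: constant, independent of `step`."""
    return REFINE_LR
```

Under these assumptions the refinement LR is one tenth of the cosine peak, so the second phase makes small adjustments around the checkpoint it starts from.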

Validation (training-time, teacher-forced)

Per-position draft accuracy on a fixed held-out val split, measured during training. acc@i is the accuracy at TTT position i (i = 0, 1, 2), reported in two forms: full_acc@i requires positions 0..i to all be correct, and cond_acc@i is the accuracy at position i conditioned on positions 0..i-1 being correct, so full_acc@i = full_acc@(i-1) × cond_acc@i. This is a training-time, teacher-forced metric on a small online-sampled val split: it indicates per-position draft quality, but it is not a runtime accept length and is not comparable across runs or splits.
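
To make the two definitions concrete, here is a small sketch that computes both metrics from a table of per-position hits. It is an illustration only: the function name and the input layout (`correct[n][i]`) are hypothetical, and the unit being averaged over (here called a sample) stands in for whatever unit the trainer actually averages over.

```python
def draft_accuracy(correct):
    """Compute (full_acc, cond_acc), each a list indexed by TTT position i.

    correct[n][i] is True when the draft prediction at TTT position i of
    sample n matches the target token.
    """
    num_samples = len(correct)
    num_positions = len(correct[0])
    full_acc, cond_acc = [], []
    # Samples whose positions 0..i-1 were all correct (all samples for i = 0).
    alive = list(range(num_samples))
    for i in range(num_positions):
        hits = [n for n in alive if correct[n][i]]
        # full_acc@i: fraction of ALL samples with positions 0..i correct.
        full_acc.append(len(hits) / num_samples)
        # cond_acc@i: fraction of the still-alive samples correct at i.
        cond_acc.append(len(hits) / len(alive) if alive else float("nan"))
        alive = hits
    return full_acc, cond_acc


toy = [
    [True, True, True],
    [True, True, False],
    [True, False, True],
    [False, True, True],
]
full_acc, cond_acc = draft_accuracy(toy)
```

Note that the last toy row is correct at positions 1 and 2 but contributes to neither metric there, because its position 0 was wrong; at position 0 the two metrics coincide since there is nothing to condition on.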

Best checkpoint of the refinement phase (this published checkpoint):

metric        value
val_loss      3.608
full_acc@0    0.799
full_acc@1    0.517
full_acc@2    0.306
cond_acc@1    0.648
cond_acc@2    0.594
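
As a sanity check on the table, the definitions imply full_acc@i = full_acc@(i-1) × cond_acc@i, since being correct at 0..i means being correct at 0..i-1 and then correct at i. The sketch below applies that identity to the reported values; the helper name is hypothetical, and the numbers are copied from the table, so small gaps from three-decimal rounding are expected.

```python
# Values copied from the table above.
reported_full = [0.799, 0.517, 0.306]   # full_acc@0, @1, @2
reported_cond = {1: 0.648, 2: 0.594}    # cond_acc@1, @2


def chain_rule_gaps(full, cond):
    """Absolute gap between full_acc@i and full_acc@(i-1) * cond_acc@i."""
    return {i: abs(full[i] - full[i - 1] * cond[i]) for i in cond}


gaps = chain_rule_gaps(reported_full, reported_cond)
for i, gap in sorted(gaps.items()):
    print(f"position {i}: gap = {gap:.4f}")
```

Both gaps come out at roughly 0.001, which is consistent with the reported values being rounded to three decimals.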

A runtime accept_length benchmark (vLLM 0.20, num_speculative_tokens=3) on a common held-out set is pending and will be added once measured.

Quick Start (vLLM >= 0.20.0)

vllm serve moonshotai/Kimi-K2.6 \
    --tensor-parallel-size 8 \
    --speculative-config '{"model": "k-l-lambda/kimi-k2.6-eagle3-mla", "method": "eagle3", "num_speculative_tokens": 3}' \
    --trust-remote-code
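
When the server is launched from a script, hand-quoting the JSON passed to --speculative-config is easy to get wrong. The sketch below builds the same command as above with `json.dumps` and `shlex.join` from the standard library; the helper name `build_serve_command` and its parameters are hypothetical, and it uses only the flags and values shown in this card.

```python
import json
import shlex


def build_serve_command(
    draft_model="k-l-lambda/kimi-k2.6-eagle3-mla",
    num_speculative_tokens=3,
    tensor_parallel_size=8,
):
    """Return the `vllm serve` argument list from the Quick Start."""
    speculative_config = {
        "model": draft_model,
        "method": "eagle3",
        "num_speculative_tokens": num_speculative_tokens,
    }
    return [
        "vllm", "serve", "moonshotai/Kimi-K2.6",
        "--tensor-parallel-size", str(tensor_parallel_size),
        # json.dumps guarantees valid JSON with double-quoted keys.
        "--speculative-config", json.dumps(speculative_config),
        "--trust-remote-code",
    ]


command = build_serve_command()
# shlex.join adds the shell quoting needed to paste this into a terminal.
print(shlex.join(command))
```

The returned list can also be passed directly to `subprocess.run(command)`, which avoids shell quoting altogether.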

Model size: 3B params (Safetensors). Tensor type: BF16.