line-kite committed on
Commit
391e2f8
·
verified ·
1 Parent(s): ec7d640

Update README.md

Files changed (1)
  1. README.md +114 -3
README.md CHANGED
@@ -1,11 +1,122 @@
1
  ---
2
  license: apache-2.0
3
  language:
4
- - en
5
- - zh
6
  base_model:
7
  - Qwen/Qwen3-4B
8
  ---
9
 
10
- # GoT-R1-4B
11
 
 
1
  ---
2
  license: apache-2.0
3
  language:
4
+ - en
5
+ tags:
6
+ - text-generation
7
+ - reinforcement-learning
8
+ - grpo
9
+ - graph-of-thought
10
+ - reasoning
11
+ - rlhf
12
  base_model:
13
  - Qwen/Qwen3-4B
14
+ metrics:
15
+ - accuracy
16
  ---
17
 
18
+ <h1 align="center"> GoT-R1: Internalizing Graph-of-Thought via Structural Reinforcement</h1>
19
+
20
+ <p align="center">
21
+ <b>High-Density Reasoning with Minimal Verbosity</b>
22
+ </p>
23
+
24
+ ## 📦 Model Collection
25
+
26
+ We release the GoT-R1 models across three parameter scales. You can access them here:
27
+
28
+ - [**GoT-R1-4B**](https://huggingface.co/MYTH-Lab/GoT-R1-4B)
29
+ - [**GoT-R1-8B**](https://huggingface.co/MYTH-Lab/GoT-R1-8B)
30
+ - [**GoT-R1-14B**](https://huggingface.co/MYTH-Lab/GoT-R1-14B)
31
+
32
+ ## 📖 Model Description
33
+
34
+ **GoT-R1** is a reasoning framework that changes how Large Language Models (LLMs) approach complex problem-solving. Developed jointly by Wuhan University and Shanghai Jiao Tong University, GoT-R1 replaces linear **Chain-of-Thought (CoT)** reasoning with an internalized **Graph-of-Thought (GoT)**.
35
+
36
+ Standard CoT models often fall into the "overthinking" trap: they generate redundant narrative filler and suffer from cascading errors. GoT-R1 instead constructs a dense, structured reasoning graph internally. Structural reinforcement trains each reasoning node to be an atomic logical primitive, which lets the model solve complex logic tasks with higher accuracy than the Qwen3 base models (see Evaluation Results) and a small token budget.
37
+
38
+ - **Developer:** MYTH-Lab (Wuhan University & Shanghai Jiao Tong University)
39
+ - **Model Type:** Causal Language Model fine-tuned with SFT and reinforcement learning (GRPO)
40
+ - **Architecture:** Transformer-based (4B, 8B, and 14B variants)
41
+ - **License:** Apache 2.0
42
+
43
+ ## 🏗️ Architecture & Methodology
44
+
45
+ ![GoT-R1 Architecture](got_r1_architecture.png)
46
+
47
+ 1. **High-Density Reasoning:** Decouples pure logic from conversational filler. The model learns to emit reasoning steps as nodes of a graph rather than as free-form paragraphs.
48
+ 2. **Elimination of Redundant Narration:** A verbosity penalty during RL training steers GoT-R1 away from the repetitive "Wait... let me think" loops common in vanilla reasoning models.
49
+ 3. **Automated Structural Synthesis:** Trained on high-fidelity logical skeletons extracted from teacher-model CoT traces, with no need for expensive manual graph labeling.
50
+ 4. **Extreme Token Efficiency:** Achieves state-of-the-art accuracy using only about **1.8%** of the token budget required by external search methods such as Tree-of-Thought (ToT) (0.6M vs. 33M tokens).
51
+
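The 1.8% figure in item 4 follows from the two token counts quoted there. A minimal check, assuming both counts are totals over the same evaluation set (the variable names are illustrative, not taken from the paper):

```python
# Token totals quoted in the list above (assumed to cover the same evaluation set).
got_r1_tokens = 0.6e6  # GoT-R1
tot_tokens = 33e6      # Tree-of-Thought (ToT) external search

ratio = got_r1_tokens / tot_tokens
print(f"GoT-R1 uses {ratio:.1%} of the ToT token budget")
print(f"ToT uses about {tot_tokens / got_r1_tokens:.0f}x more tokens")
```

Read the other way, the external search baseline spends roughly 55 times as many tokens.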
52
+ ## 📊 Evaluation Results
53
+
54
+ GoT-R1 outperforms its Qwen3 base model on all four benchmarks at every scale. The largest gains are on TruthfulQA, where the 4B model improves by about 18 points (66.71 to 84.70) and the 8B model by about 10 points (74.42 to 84.82), which suggests fewer logical inconsistencies and hallucinations.
55
+
56
+ | Model | GSM8K (ACC) | IFEval (I-Strict) | TruthfulQA | Winogrande |
57
+ | :--------------------------- | :---------- | :---------------- | :--------- | :--------- |
58
+ | Qwen3-4B | 93.78 | 87.04 | 66.71 | 76.01 |
59
+ | **GoT-R1-4B (Ours)** | **95.07** | **90.53** | **84.70** | **81.93** |
60
+ | Qwen3-8B | 94.62 | 90.46 | 74.42 | 80.58 |
61
+ | **GoT-R1-8B (Ours)** | **96.74** | **92.31** | **84.82** | **84.77** |
62
+ | Qwen3-14B | 96.59 | 91.26 | 77.72 | 86.19 |
63
+ | **GoT-R1-14B (Ours)** | **97.19** | **92.59** | **85.31** | **87.69** |
64
+
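The gains over the Qwen3 baselines can be recomputed directly from the table. The sketch below does so in both absolute points and relative percent; the scores are copied from the rows above, while the dictionary layout and the function name `gains` are illustrative choices, not part of any released code.

```python
# Scores copied from the table above, in column order.
BENCHMARKS = ("GSM8K", "IFEval", "TruthfulQA", "Winogrande")
SCORES = {
    "4B": {"Qwen3": (93.78, 87.04, 66.71, 76.01), "GoT-R1": (95.07, 90.53, 84.70, 81.93)},
    "8B": {"Qwen3": (94.62, 90.46, 74.42, 80.58), "GoT-R1": (96.74, 92.31, 84.82, 84.77)},
    "14B": {"Qwen3": (96.59, 91.26, 77.72, 86.19), "GoT-R1": (97.19, 92.59, 85.31, 87.69)},
}


def gains(scale):
    """Return {benchmark: (absolute gain in points, relative gain in percent)}."""
    base, ours = SCORES[scale]["Qwen3"], SCORES[scale]["GoT-R1"]
    return {
        name: (round(o - b, 2), round(100 * (o - b) / b, 1))
        for name, b, o in zip(BENCHMARKS, base, ours)
    }


for scale in SCORES:
    print(scale, gains(scale))
```

On TruthfulQA this gives a gain of 17.99 points at 4B and 10.40 points at 8B, so the absolute and relative improvements differ noticeably and are worth reporting separately.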
65
+ ## ⚙️ Training Procedure
66
+
67
+ The model is trained in two stages:
68
+
69
+ 1. **Stage 1: Supervised Fine-Tuning (SFT):** The base model is first fine-tuned to learn the GoT syntax and structural formatting, so that it represents logic as discrete nodes.
70
+ 2. **Stage 2: GRPO Evolution:** We apply Group Relative Policy Optimization (GRPO) to reinforce topological integrity. The reward function is defined as:
71
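+ \\(R_i = w_1 R_{\text{task}} + w_2 R_{\text{graph}} + w_3 R_{\text{fmt}} - w_4 P_{\text{len}}\\)
+ Here \\(R_i\\) is the reward for the \\(i\\)-th sampled response and \\(w_1, \dots, w_4\\) are weighting coefficients. The subscripts suggest task, graph-structure, and format rewards plus a length penalty.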
72
+
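The Stage 2 reward and the group-relative normalization that GRPO applies to it can be sketched as follows. This is a minimal illustration, not the authors' training code: the weight values and component scores are made up, reading the four terms as task, graph-structure, and format rewards plus a length penalty is an interpretation of the subscripts, and the advantage formula is the standard GRPO form (reward minus group mean, divided by group standard deviation), which the README does not confirm as the exact variant used.

```python
from statistics import mean, pstdev

# Hypothetical weights w1..w4; the actual values are not given in this README.
WEIGHTS = {"task": 1.0, "graph": 0.5, "fmt": 0.2, "len": 0.1}


def composite_reward(r_task, r_graph, r_fmt, p_len, w=WEIGHTS):
    """R_i = w1*R_task + w2*R_graph + w3*R_fmt - w4*P_len."""
    return w["task"] * r_task + w["graph"] * r_graph + w["fmt"] * r_fmt - w["len"] * p_len


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (R_i - mean) / (std + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, with made-up component scores.
group = [
    composite_reward(1.0, 0.8, 1.0, 0.2),
    composite_reward(1.0, 0.4, 1.0, 1.5),
    composite_reward(0.0, 0.6, 1.0, 0.3),
    composite_reward(0.0, 0.1, 0.0, 2.0),
]
advantages = group_relative_advantages(group)
print(advantages)
```

Because advantages are computed relative to the group, a correct but verbose completion (the second one) is ranked below a correct and concise one (the first), which is how the length penalty discourages verbosity without a separate value model.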
73
+ ## 💻 Quick Start
74
+
75
+ GoT-R1 can be loaded and run with the `transformers` library:
76
+
77
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "MYTH-Lab/GoT-R1-8B"  # Choose 4B, 8B, or 14B
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ # device_map="auto" requires the `accelerate` package.
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
+
+ prompt = "Carly collected 7 starfish with 5 arms each and one seastar with 14 arms. How many arms do the animals she collected have in total?"
+ messages = [
+     {"role": "user", "content": prompt}
+ ]
+
+ # Render the chat template to a prompt string that ends with the assistant turn marker.
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=4096,
+     do_sample=True,  # temperature only takes effect when sampling is enabled
+     temperature=0.9
+ )
+
+ # Decode only the newly generated tokens, not the echoed prompt.
+ new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+ print(tokenizer.decode(new_tokens, skip_special_tokens=True))
+ ```
105
+
106
+
107
+ ## 📚 Citation
108
+
109
+ If you find our work helpful, please cite our paper:
110
+
111
+ ```bibtex
+ @inproceedings{gotr1_2026,
+   title={GoT-R1: Internalizing Graph-of-Thought via Structural Reinforcement for High-Density Reasoning},
+   author={Li, Zuchao and Li, Qiwei and Yao, Yao and Zhao, Hai and Zhang, Lefei and Du, Bo},
+   booktitle={Findings of the Association for Computational Linguistics: ACL 2026},
+   year={2026},
+   note={To appear}
+ }
+ ```
120
+
121
+
122