Add link to the paper on 🤗 (#1), opened by nielsr (HF Staff)
README.md CHANGED
@@ -1,12 +1,13 @@
|
|
1 |
---
|
2 |
-
license: mit
|
3 |
base_model:
|
4 |
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
|
5 |
-
pipeline_tag: text-generation
|
6 |
-
library_name: transformers
|
7 |
datasets:
|
8 |
- Zigeng/CoT-Veirification-340k
|
9 |
---
|
|
|
10 |
<div align="center">
|
11 |
<h1>🔍 VeriThinker: Learning to Verify Makes Reasoning Model Efficient</h1>
|
12 |
</div>
|
@@ -46,6 +47,10 @@ datasets:
|
|
46 |
<td>📊 <strong>Data</strong></td>
|
47 |
<td><a href="https://huggingface.co/datasets/Zigeng/CoT-Veirification-340k">
|
48 |
CoT-Veirification-340k</a></td>
|
49 |
</tr>
|
50 |
</tbody>
|
51 |
</table>
|
@@ -115,7 +120,10 @@ model = AutoModelForCausalLM.from_pretrained(
|
|
115 |
)
|
116 |
|
117 |
# prepare the model input
|
118 |
-
prompt_part_1 = "## Instruction
|
119 |
|
120 |
|
121 |
prompt_part_2 = """## Question:
|
@@ -165,4 +173,50 @@ generated_ids = [
|
|
165 |
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
|
166 |
|
167 |
print(response)
|
168 |
```
|
|
|
1 |
---
|
|
|
2 |
base_model:
|
3 |
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
|
|
|
|
|
4 |
datasets:
|
5 |
- Zigeng/CoT-Veirification-340k
|
6 |
+
library_name: transformers
|
7 |
+
license: mit
|
8 |
+
pipeline_tag: text-generation
|
9 |
---
|
10 |
+
|
11 |
<div align="center">
|
12 |
<h1>🔍 VeriThinker: Learning to Verify Makes Reasoning Model Efficient</h1>
|
13 |
</div>
|
|
|
47 |
<td>📊 <strong>Data</strong></td>
|
48 |
<td><a href="https://huggingface.co/datasets/Zigeng/CoT-Veirification-340k">
|
49 |
CoT-Veirification-340k</a></td>
|
50 |
+
</tr>
|
51 |
+
<tr>
|
52 |
+
<td>📄 <strong>Paper (🤗)</strong></td>
|
53 |
+
<td><a href="https://huggingface.co/papers/2505.17941">Hugging Face Paper</a></td>
|
54 |
</tr>
|
55 |
</tbody>
|
56 |
</table>
|
|
|
120 |
)
|
121 |
|
122 |
# prepare the model input
|
123 |
+
prompt_part_1 = """## Instruction:
|
124 |
+
You will be provided with a question along with a proposed solution. Please carefully verify each step of the solution, tell me if every step is absolutely correct.
|
125 |
+
|
126 |
+
"""
|
127 |
|
128 |
|
129 |
prompt_part_2 = """## Question:
|
|
|
173 |
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
|
174 |
|
175 |
print(response)
|
176 |
+
```
|
177 |
+
|
178 |
+
## 🔥 Training
|
179 |
+
### 1. Training with LoRA:
|
180 |
+
We provide training scripts for our proposed supervised verification fine-tuning (SVFT) approach. Training uses LoRA, with the configuration specified in [config_lora_r1_7b.yaml](https://github.com/czg1225/VeriThinker/blob/main/config/config_lora_r1_7b.yaml).
|
181 |
+
```bash
|
182 |
+
deepspeed --include localhost:0,1,2,3,4,5,6,7 train_svft.py
|
183 |
+
```
|
184 |
+
|
185 |
+
### 2. LoRA Merge:
|
186 |
+
After training, merge the LoRA weights into the base model to obtain the reasoning model.
|
187 |
+
```bash
|
188 |
+
python merge_lora.py
|
189 |
+
```
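
As a rough illustration of what a LoRA merge does numerically (this is not the contents of `merge_lora.py`, which is not shown here), the sketch below folds a low-rank update into a dense weight as `W + (alpha / rank) * (B @ A)`. The helper names `matmul` and `merge_lora` and all shapes and values are toy assumptions. In practice, libraries such as PEFT perform this step for every adapted layer, for example via `merge_and_unload()`.

```python
# Minimal sketch of the arithmetic behind a LoRA merge.
# All names, shapes, and values are illustrative assumptions,
# not taken from the VeriThinker repository.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the merged dense weight."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]

# Toy example: a 2x3 weight with a rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]   # rank x in_features  (1 x 3)
B = [[0.5], [-0.5]]     # out_features x rank (2 x 1)

merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, so the merged model loads and runs like an ordinary dense checkpoint.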
|
190 |
+
|
191 |
+
## ⚡ Evaluation:
|
192 |
+
We provide evaluation scripts for three mathematical datasets: MATH500, AIME 2024, and AIME 2025. Our implementation uses the [vLLM](https://docs.vllm.ai/en/latest/) framework for efficient inference during evaluation.
|
193 |
+
|
194 |
+
### 1. Evaluation on MATH500 Dataset
|
195 |
+
```bash
|
196 |
+
CUDA_VISIBLE_DEVICES=0,1,2,3 python eval_math500.py
|
197 |
+
```
|
198 |
+
### 2. Evaluation on AIME 2024 Dataset
|
199 |
+
```bash
|
200 |
+
CUDA_VISIBLE_DEVICES=0,1,2,3 python eval_aime24.py
|
201 |
+
```
|
202 |
+
### 3. Evaluation on AIME 2025 Dataset
|
203 |
+
```bash
|
204 |
+
CUDA_VISIBLE_DEVICES=0,1,2,3 python eval_aime25.py
|
205 |
+
```
|
206 |
+
|
207 |
+
## 📖 Experimental Results
|
208 |
+
### CoT Compression Results:
|
209 |
+

|
210 |
+
|
211 |
+
### CoT Correctness Verification Results:
|
212 |
+

|
213 |
+
|
214 |
+
### Speculative Reasoning Results:
|
215 |
+
Speculative reasoning results on three reasoning models. With Qwen-2.5-Math-Instruct-7B as the draft model, most problems in MATH500 and GSM8K are solved by the short-CoT model alone; only a few (around 10%) require activating the long-CoT model for more complex solutions.
|
216 |
+

|
217 |
+

|
218 |
+
|
219 |
+
## Citation
|
220 |
+
If our research assists your work, please give us a star ⭐ or cite us using:
|
221 |
+
```
|
222 |
```
|