Zigeng/R1-VeriThinker-7B

@@ -1,12 +1,13 @@
 ---
-license: mit
 base_model:
 - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
-pipeline_tag: text-generation
-library_name: transformers
 datasets:
 - Zigeng/CoT-Veirification-340k
 ---
 <div align="center">
 <h1>🔍 VeriThinker: Learning to Verify Makes Reasoning Model Efficient</h1>
 </div>
@@ -46,6 +47,10 @@ datasets:
       <td>📊 <strong>Data</strong></td>
       <td><a href="https://huggingface.co/datasets/Zigeng/CoT-Veirification-340k">
 CoT-Veirification-340k</a></td>
     </tr>
   </tbody>
 </table>
@@ -115,7 +120,10 @@ model = AutoModelForCausalLM.from_pretrained(
 )
 # prepare the model input
-prompt_part_1 = "## Instruction:\nYou will be provided with a question along with a proposed solution. Please carefully verify each step of the solution, tell me if every step is absolutely correct.\n\n"
 prompt_part_2 = """## Question:
@@ -165,4 +173,50 @@ generated_ids = [
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 print(response)
 ```

 ---
 base_model:
 - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
 datasets:
 - Zigeng/CoT-Veirification-340k
+library_name: transformers
+license: mit
+pipeline_tag: text-generation
 ---
 <div align="center">
 <h1>🔍 VeriThinker: Learning to Verify Makes Reasoning Model Efficient</h1>
 </div>
       <td>📊 <strong>Data</strong></td>
       <td><a href="https://huggingface.co/datasets/Zigeng/CoT-Veirification-340k">
 CoT-Veirification-340k</a></td>
+    </tr>
+    <tr>
+      <td>📄 <strong>Paper (🤗)</strong></td>
+      <td><a href="https://huggingface.co/papers/2505.17941">Hugging Face Paper</a></td>
     </tr>
   </tbody>
 </table>
 )
 # prepare the model input
+prompt_part_1 = "## Instruction:\nYou will be provided with a question along with a proposed solution. Please carefully verify each step of the solution, tell me if every step is absolutely correct.\n\n"
 prompt_part_2 = """## Question:
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 print(response)
+```
+## 🔥 Training
+### 1. Training with LoRA:
+We provide training scripts for our proposed supervised verification fine-tuning approach. Training uses LoRA, with the configuration specified in [config_lora_r1_7b.yaml](https://github.com/czg1225/VeriThinker/blob/main/config/config_lora_r1_7b.yaml).
+```bash
+deepspeed --include localhost:0,1,2,3,4,5,6,7 train_svft.py
+```
+### 2. LoRA Merge:
+After training, merge the LoRA weights into the base model to obtain the final reasoning model.
+```bash
+python merge_lora.py
+```
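The contents of `merge_lora.py` are not shown here, so the following is only a minimal sketch of what a LoRA merge usually computes, not the repository's implementation. It applies the standard rule W' = W + (alpha / rank) · B·A to a toy matrix in plain Python. The function names `matmul` and `merge_lora` and the toy values are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the usual LoRA merge rule.

    weight: (out_features, in_features)
    lora_a: (rank, in_features)
    lora_b: (out_features, rank)
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update, same shape as weight
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example (hypothetical values): 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # (rank, in_features)
B = [[0.5], [1.0]]  # (out_features, rank)
merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)
```

Once merged, the adapter matrices are no longer needed at inference time, so the model loads and runs like an ordinary checkpoint. For real models, libraries such as PEFT provide this operation through `merge_and_unload()`.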
+## ⚡ Evaluation:
+We provide evaluation scripts for three mathematical datasets: MATH500, AIME 2024, and AIME 2025. Our implementation leverages the [vLLM](https://docs.vllm.ai/en/latest/) framework to ensure efficient inference during evaluation.
+### 1. Evaluation on MATH500 Dataset
+```bash
+CUDA_VISIBLE_DEVICES=0,1,2,3 python eval_math500.py
+```
+### 2. Evaluation on AIME 2024 Dataset
+```bash
+CUDA_VISIBLE_DEVICES=0,1,2,3 python eval_aime24.py
+```
+### 3. Evaluation on AIME 2025 Dataset
+```bash
+CUDA_VISIBLE_DEVICES=0,1,2,3 python eval_aime25.py
+```
+## 📖 Experimental Results
+### CoT Compression Results:
+![CoT Compression](assets/cot-compression.png)
+### CoT Correctness Verification Results:
+![CoT Correctness](assets/cot-correctness.png)
+### Speculative Reasoning Results:
+Speculative reasoning results on three reasoning models. With Qwen-2.5-Math-Instruct-7B as the draft model, most problems in MATH500 and GSM8K are solved by the short-CoT model alone; only about 10% require activating the long-CoT model for more complex solutions.
+![CoT Speculative1](assets/cot-spec1.png)
+![CoT Speculative2](assets/cot-spec2.png)
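The routing idea behind these results can be sketched roughly as follows. This is a hedged illustration only: `draft_solve`, `verify`, and `long_solve` are hypothetical callables standing in for the short-CoT draft model, a correctness verifier, and the long-CoT reasoning model, and the actual pipeline in the repository may differ.

```python
from typing import Callable, Tuple


def speculative_reason(
    question: str,
    draft_solve: Callable[[str], str],
    verify: Callable[[str, str], bool],
    long_solve: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the cheap draft model first; fall back to the long-CoT model
    only when the verifier rejects the draft solution."""
    draft = draft_solve(question)
    if verify(question, draft):
        return draft, "short"
    return long_solve(question), "long"


# Stub models for demonstration (hypothetical, not real model calls).
def toy_draft(q: str) -> str:
    return "4" if q == "2+2" else "unsure"


def toy_verify(q: str, solution: str) -> bool:
    return solution != "unsure"


def toy_long(q: str) -> str:
    return "long-CoT answer"


easy = speculative_reason("2+2", toy_draft, toy_verify, toy_long)
hard = speculative_reason("hard problem", toy_draft, toy_verify, toy_long)
print(easy, hard)
```

Under this scheme the expensive model is invoked only for rejected drafts, which is why a low rejection rate translates into a large reduction in average inference cost.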
+## Citation
+If our research assists your work, please give us a star ⭐ or cite us using:
+```
 ```