Improve model card: add library name and pipeline tag, link to code
This PR improves the model card by adding the library name and pipeline tag, ensuring the model can be found at https://huggingface.co/models?pipeline_tag=text-generation. It also adds a link to the GitHub repository.
README.md CHANGED
@@ -1,3 +1,18 @@
----
-license: mit
----
+---
+license: mit
+library_name: transformers
+pipeline_tag: text-generation
+---
+
+# R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning
+
+The model was presented in the paper [R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning](https://huggingface.co/papers/2505.21668).
+
+Our code is based on [Llama-factory](https://github.com/hiyouga/LLaMA-Factory)/[VeRL](https://github.com/volcengine/verl)/[Search-R1](https://github.com/PeterGriffinJin/Search-R1?tab=readme-ov-file) for SFT and RL training, and on [SymBench](https://github.com/yongchao98/CodeSteer-v1.0/tree/main)/[BIG-Bench-Hard](https://github.com/yongchao98/R1-Code-Interpreter/tree/main)/[reasoning-gym](https://github.com/open-thought/reasoning-gym) for the datasets and benchmarks of reasoning/planning tasks.
+
+## 📝 Introduction
+R1-Code-Interpreter is the first framework to train LLMs for step-by-step code reasoning using multi-turn supervised fine-tuning and reinforcement learning. By curating 144 diverse reasoning and planning tasks, we enable Qwen-2.5 models (3B/7B/14B) to autonomously decide when and how to invoke code. Our best model, R1-CI-14B, outperforms GPT-4o (text-only) and approaches GPT-4o with Code Interpreter, showing emergent self-checking behavior via code generation.
+
+[GitHub repository](https://github.com/yongchao98/R1-Code-Interpreter)
+
+Project page: https://huggingface.co/yongchao98
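Since the card now declares `library_name: transformers` and `pipeline_tag: text-generation`, a minimal usage sketch along the following lines could accompany it. The repo ID below is a hypothetical placeholder, not confirmed by this PR; substitute the actual checkpoint hosted under https://huggingface.co/yongchao98.

```python
# Minimal sketch: loading the model for text generation with transformers.
# NOTE: "yongchao98/R1-CI-14B" is a hypothetical repo ID used for illustration;
# replace it with the actual checkpoint under https://huggingface.co/yongchao98.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yongchao98/R1-CI-14B"  # hypothetical; see note above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Write Python code to determine whether 2027 is prime, then state the answer."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```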