Update README.md
README.md
CHANGED
@@ -30,6 +30,7 @@ OpenR1-Distill-7B replicates the reasoning capabilities of [deepseek-ai/DeepSeek
 
 - **Repository:** https://github.com/huggingface/open-r1
 - **Training logs:** https://wandb.ai/huggingface/open-r1/runs/199cum6l
+- **Evaluation logs:** https://huggingface.co/datasets/open-r1/details-open-r1_OpenR1-Distill-7B
 
 ## Usage
 
@@ -93,10 +94,10 @@ OpenR1-Distill-7B was trained using supervised fine-tuning (SFT) on the [Mixture
 
 The individual experiments correspond to the following:
 
-* exp1 - exp3
-* exp4 - exp6
-* exp7 - exp8
-* exp9 - exp10
+* **exp1 - exp3:** extending the model's base RoPE frequency from 10k to 100k, 300k, and 500k, respectively. We found no significant difference between the scaling factors and used 300k in all subsequent experiments.
+* **exp4 - exp6:** independently scaling the learning rate on the math and code mixtures from 1e-5 to 2e-5 and 4e-5, respectively.
+* **exp7 - exp8:** measuring the impact of sequence packing (exp7) versus no packing (exp8) on the math mixture.
+* **exp9 - exp10:** measuring the impact of training on all three mixtures (math, code, and science) versus training on math and code only.
 
 > [!NOTE]
 > We use LiveCodeBench v4 to accelerate evaluation during our ablations as it contains around half the problems of v5, yet is still representative of the full benchmark.
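The four ablation groups in the new list map onto small configuration changes. For exp1 - exp3, here is a minimal sketch of extending the RoPE base frequency with Hugging Face Transformers, assuming a Qwen2-style base config that exposes `rope_theta`; the checkpoint name is illustrative, since this diff does not name the base model:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Illustrative base checkpoint; not named in the diff above.
model_id = "Qwen/Qwen2.5-Math-7B-Instruct"

# Raise the RoPE base frequency from the 10k default to 300k, the value
# the ablations settled on, before the weights are loaded.
config = AutoConfig.from_pretrained(model_id)
config.rope_theta = 300_000.0

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
print(model.config.rope_theta)  # 300000.0
```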
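The learning-rate ablation in exp4 - exp6 changes a single hyperparameter per run. A sketch of the sweep with TRL's `SFTConfig`, the trainer configuration open-r1 builds on; the output paths are hypothetical:

```python
from trl import SFTConfig

# One config per run; only the peak learning rate differs across exp4 - exp6.
for lr in (1e-5, 2e-5, 4e-5):
    args = SFTConfig(
        output_dir=f"data/sft-lr-{lr:.0e}",  # hypothetical output path
        learning_rate=lr,
    )
    print(args.output_dir, args.learning_rate)
```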
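For exp7 - exp8, sequence packing is a boolean toggle in TRL's `SFTConfig`; the two runs are otherwise identical (output paths again hypothetical):

```python
from trl import SFTConfig

# exp7: pack multiple short samples into each fixed-length training sequence.
packed_args = SFTConfig(output_dir="data/exp7-packing", packing=True)

# exp8: pad each sample to the maximum length individually instead.
unpacked_args = SFTConfig(output_dir="data/exp8-no-packing", packing=False)
```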
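Finally, exp9 - exp10 differ only in which domain subsets are concatenated into the training set. A sketch with Hugging Face Datasets, assuming the mixture is published with per-domain subset names; `math`, `code`, and `science` are assumptions here:

```python
from datasets import concatenate_datasets, load_dataset

# Assumed subset names; drop "science" for the math-and-code-only arm.
subsets = ["math", "code", "science"]
parts = [
    load_dataset("open-r1/Mixture-of-Thoughts", name, split="train")
    for name in subsets
]
train_dataset = concatenate_datasets(parts).shuffle(seed=42)
print(len(train_dataset))
```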