lewtun (HF Staff) committed
Commit 6104419 · verified · 1 Parent(s): 1a5f577

Update README.md

Files changed (1): README.md +5 -4
README.md CHANGED

@@ -30,6 +30,7 @@ OpenR1-Distill-7B replicates the reasoning capabilities of [deepseek-ai/DeepSeek
 
 - **Repository:** https://github.com/huggingface/open-r1
 - **Training logs:** https://wandb.ai/huggingface/open-r1/runs/199cum6l
+- **Evaluation logs:** https://huggingface.co/datasets/open-r1/details-open-r1_OpenR1-Distill-7B
 
 ## Usage
 
@@ -93,10 +94,10 @@ OpenR1-Distill-7B was trained using supervised fine-tuning (SFT) on the [Mixture
 
 The individual experiments correspond to the following:
 
-* exp1 - exp3: extending the model's base RoPE frequency from 10k to 100k, 300k, and 500k respectively. We find there is no significant difference between the scaling factors, and choose 300k in all subsequent experiments.
-* exp4 - exp6: independently scaling the learning rate on the math and code mixtures from 1e-5 to 2e-5, and 4e-5 respectively.
-* exp7 - exp8: measuring the impact of sequence packing (exp7) versus no packing (exp8) on the math mixture.
-* exp9 - exp10: measuring the impact of training on all three mixtures (math, code, and science) versus training on math and code only.
+* **exp1 - exp3:** extending the model's base RoPE frequency from 10k to 100k, 300k, and 500k respectively. We find there is no significant difference between the scaling factors, and used 300k in all subsequent experiments.
+* **exp4 - exp6:** independently scaling the learning rate on the math and code mixtures from 1e-5 to 2e-5, and 4e-5 respectively.
+* **exp7 - exp8:** measuring the impact of sequence packing (exp7) versus no packing (exp8) on the math mixture.
+* **exp9 - exp10:** measuring the impact of training on all three mixtures (math, code, and science) versus training on math and code only.
 
 > [!NOTE]
 > We use LiveCodeBench v4 to accelerate evaluation during our ablations as it contains around half the problems of v5, yet is still representative of the full benchmark.
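
The exp1 - exp3 ablation varies the RoPE base frequency (10k → 100k/300k/500k). As a rough illustration of what that parameter controls, here is a minimal sketch of the standard rotary-embedding inverse-frequency computation; the function name and the head dimension of 128 are assumptions for illustration, not taken from the open-r1 training code.

```python
def rope_inv_freq(base: float, head_dim: int = 128) -> list[float]:
    """Inverse frequencies theta_i = base^(-2i/d) used by rotary embeddings.

    Note: head_dim=128 is an illustrative assumption, not the model's value.
    """
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

# A larger base slows the rotation of the high-index dimensions, which is
# what lets positions stay distinguishable over much longer contexts.
default = rope_inv_freq(10_000.0)    # original base frequency
extended = rope_inv_freq(300_000.0)  # value the ablations settled on
```

The lowest-index frequency is 1.0 for any base; it is the tail of the spectrum that the larger base stretches out.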
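
The exp7 - exp8 comparison measures sequence packing against no packing. The idea can be sketched as greedy packing: concatenate tokenized examples and slice the stream into fixed-length blocks so short examples share a block instead of each being padded. The `pack` helper below is illustrative only, not the trainer's actual implementation.

```python
def pack(token_seqs: list[list[int]], block_size: int) -> list[list[int]]:
    """Concatenate tokenized examples and slice into fixed-length blocks.

    Packing avoids padding waste: several short examples fill one block
    instead of each being padded out to the full sequence length.
    """
    flat = [tok for seq in token_seqs for tok in seq]
    n_blocks = len(flat) // block_size
    return [flat[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]

# Nine tokens packed into blocks of four: two full blocks, remainder dropped.
packed = pack([[1, 2, 3], [4, 5], [6, 7, 8, 9]], block_size=4)
```

The no-packing baseline (exp8) would instead pad each example individually, trading throughput for exact example boundaries.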