Update README.md
README.md
CHANGED
@@ -30,6 +30,7 @@ OpenR1-Distill-7B replicates the reasoning capabilities of [deepseek-ai/DeepSeek
 
 - **Repository:** https://github.com/huggingface/open-r1
 - **Training logs:** https://wandb.ai/huggingface/open-r1/runs/199cum6l
+- **Evaluation logs:** https://huggingface.co/datasets/open-r1/details-open-r1_OpenR1-Distill-7B
 
 ## Usage
 
@@ -93,10 +94,10 @@ OpenR1-Distill-7B was trained using supervised fine-tuning (SFT) on the [Mixture
 
 The individual experiments correspond to the following:
 
-* exp1 - exp3
-* exp4 - exp6
-* exp7 - exp8
-* exp9 - exp10
+* **exp1 - exp3:** extending the model's base RoPE frequency from 10k to 100k, 300k, and 500k, respectively. We found no significant difference between the scaling factors and used 300k in all subsequent experiments.
+* **exp4 - exp6:** independently scaling the learning rate on the math and code mixtures from 1e-5 to 2e-5 and 4e-5, respectively.
+* **exp7 - exp8:** measuring the impact of sequence packing (exp7) versus no packing (exp8) on the math mixture.
+* **exp9 - exp10:** measuring the impact of training on all three mixtures (math, code, and science) versus training on math and code only.
 
 > [!NOTE]
 > We use LiveCodeBench v4 to accelerate evaluation during our ablations as it contains around half the problems of v5, yet is still representative of the full benchmark.
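The four ablation groups in the new list map onto small configuration changes. For exp1 - exp3, here is a minimal sketch of extending the RoPE base frequency with Hugging Face Transformers, assuming a Qwen2-style base config that exposes `rope_theta`; the checkpoint name is illustrative, since this diff does not name the base model:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Illustrative base checkpoint; not named in the diff above.
model_id = "Qwen/Qwen2.5-Math-7B-Instruct"

# Raise the RoPE base frequency from the 10k default to 300k, the value
# the ablations settled on, before the weights are loaded.
config = AutoConfig.from_pretrained(model_id)
config.rope_theta = 300_000.0

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
print(model.config.rope_theta)  # 300000.0
```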
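The learning-rate ablation in exp4 - exp6 changes a single hyperparameter per run. A sketch of the sweep with TRL's `SFTConfig`, the trainer configuration open-r1 builds on; the output paths are hypothetical:

```python
from trl import SFTConfig

# One config per run; only the peak learning rate differs across exp4 - exp6.
for lr in (1e-5, 2e-5, 4e-5):
    args = SFTConfig(
        output_dir=f"data/sft-lr-{lr:.0e}",  # hypothetical output path
        learning_rate=lr,
    )
    print(args.output_dir, args.learning_rate)
```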
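For exp7 - exp8, sequence packing is a boolean toggle in TRL's `SFTConfig`; the two runs are otherwise identical (output paths again hypothetical):

```python
from trl import SFTConfig

# exp7: pack multiple short samples into each fixed-length training sequence.
packed_args = SFTConfig(output_dir="data/exp7-packing", packing=True)

# exp8: pad each sample to the maximum length individually instead.
unpacked_args = SFTConfig(output_dir="data/exp8-no-packing", packing=False)
```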
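Finally, exp9 - exp10 differ only in which domain subsets are concatenated into the training set. A sketch with Hugging Face Datasets, assuming the mixture is published with per-domain subset names; `math`, `code`, and `science` are assumptions here:

```python
from datasets import concatenate_datasets, load_dataset

# Assumed subset names; drop "science" for the math-and-code-only arm.
subsets = ["math", "code", "science"]
parts = [
    load_dataset("open-r1/Mixture-of-Thoughts", name, split="train")
    for name in subsets
]
train_dataset = concatenate_datasets(parts).shuffle(seed=42)
print(len(train_dataset))
```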