Fixed training image line
README.md +1 -1
images/training_info.png +2 -2
README.md CHANGED

@@ -54,7 +54,7 @@ It does not know general synonyms and it has poor textbook knowledge (e.g. it do
 For best results, input molecules as SMILES: if you input molecules with their common names, the model may reason using an incorrect SMILES, resulting in poor results.
 For example, we have observed that the model often confuses lysine and glutamic acid if you ask questions using their common names, but should correctly reason about their chemistry if you provide their structures as SMILES.
 
-## Training
+## Training details
 
 We first pre-trained Mistral-Small-24B-Instruct-2501 on mostly incorrect reasoning traces from DeepSeek R1 to elicit reasoning and adherence to the new tokens/templates. Next, we ran independent rounds of specialists, each trained with GRPO and verifiable rewards on one of the above tasks. We then aggregated and filtered the specialists' reasoning traces (correct answers with reasoning) to again fine-tune Mistral-Small-24B-Instruct-2501. Then we did GRPO over all tasks. This last model was then put through safety post-training.
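The SMILES-first guidance in the README lends itself to a quick illustration. This is a minimal sketch: the prompt wording is hypothetical (the model card does not prescribe a template), but the SMILES strings are the standard structures of the two amino acids the README says are often confused, with stereocenters omitted for brevity.

```python
# SMILES for the two amino acids the README notes are often confused by name.
# Stereocenters are omitted for brevity.
LYSINE = "NCCCCC(N)C(=O)O"            # H2N-(CH2)4-CH(NH2)-COOH
GLUTAMIC_ACID = "OC(=O)CCC(N)C(=O)O"  # HOOC-(CH2)2-CH(NH2)-COOH


def make_prompt(smiles: str, question: str) -> str:
    """Embed the structure directly so the model need not recall it from a name."""
    return f"Given the molecule with SMILES {smiles}, {question}"


print(make_prompt(LYSINE, "what is the net charge at physiological pH?"))
```

Passing the structure itself sidesteps the name-to-structure recall step where, per the README, the model is weakest.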
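The aggregate-and-filter step in the training paragraph can be sketched as follows. All record fields and task names here are hypothetical; the sketch only encodes the stated rule of keeping specialist traces whose final answers verified as correct before the second fine-tuning round.

```python
# Hypothetical trace records: the README only says traces were aggregated from
# GRPO-trained specialists and filtered to correct answers with reasoning.
specialist_traces = [
    {"task": "task_a", "reasoning": "...", "answer_correct": True},
    {"task": "task_a", "reasoning": "...", "answer_correct": False},
    {"task": "task_b", "reasoning": "...", "answer_correct": True},
]


def filter_for_sft(traces: list[dict]) -> list[dict]:
    """Keep only traces whose final answer passed the verifiable-reward check."""
    return [t for t in traces if t["answer_correct"]]


sft_data = filter_for_sft(specialist_traces)
print(len(sft_data))  # 2 of 3 traces survive filtering
```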
images/training_info.png CHANGED

Binary image tracked with Git LFS; pointer updated.