whitead committed
Commit a218829 · Parent: 69ce940

Fixed training image line
Files changed (2):
  1. README.md (+1 -1)
  2. images/training_info.png (+2 -2)
README.md CHANGED

@@ -54,7 +54,7 @@ It does not know general synonyms and it has poor textbook knowledge (e.g. it do
 For best results, input molecules as SMILES: if you input molecules with their common names, the model may reason using the incorrect SMILES, resulting in poor results.
 For example, we have observed that the model often confuses lysine and glutamic acid if you ask questions using their common names, but should correctly reason about their chemistry if you provide their structures as SMILES.
 
-## Training data and details
+## Training details
 
 We first pre-trained Mistral-Small-24B-Instruct-2501 on mostly incorrect reasoning traces from DeepSeek r1 to elicit reasoning and adherence to the new tokens/templates. Next, we trained independent specialist models with GRPO and verifiable rewards, each on one of the above tasks. We then aggregated and filtered reasoning traces (correct answers with reasoning) from the specialists to again fine-tune Mistral-Small-24B-Instruct-2501. Then, we did GRPO over all tasks. This last model was then put through safety post-training.
images/training_info.png CHANGED

Git LFS Details (old)
  • SHA256: 00385743792f71434cf262a549ce424507adee96096a8b60bbbdc358db702a43
  • Pointer size: 131 Bytes
  • Size of remote file: 587 kB

Git LFS Details (new)
  • SHA256: 394653640102293eb6a10e4aac2c14fa58aea61077f3203211cf0226b38d84fe
  • Pointer size: 131 Bytes
  • Size of remote file: 589 kB