Update README.md

README.md CHANGED
@@ -9,16 +9,7 @@ base_model:
 - Qwen/Qwen2.5-Math-7B
 ---
 
-
-license: apache-2.0
-library_name: transformers
-pipeline_tag: text-generation
-datasets:
-- Satori-reasoning/Satori_FT_data
-- Satori-reasoning/Satori_RL_data
-base_model:
-- Qwen/Qwen2.5-Math-7B
----
+
 **Satori-7B-Round2** is a 7B LLM built on the open-source model Qwen-2.5-Math-7B and trained on open-source data (OpenMathInstruct-2 and NuminaMath). **Satori-7B-Round2** is capable of autoregressive search, i.e., self-reflection and self-exploration without external guidance.
 This is achieved through our proposed Chain-of-Action-Thought (COAT) reasoning and a two-stage post-training paradigm.
 