---
library_name: transformers
base_model: Qwen/Qwen2.5-32B-Instruct
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---
<div align="center">
<img src="INF.jpg" width="300"/>

🤗 <a href="https://huggingface.co/infly" target="_blank">Hugging Face</a>
<br>
<a href="https://inftech-pi-zero.github.io/" target="_blank">GitHub</a>
<br>
<br>
<br>
</div>

<div align="center">
<h1>INF-o1-pi0: Initiating the Journey to the Infinity of LLM Reasoning</h1>
<p>INF AI specializes in foundational large language model technology and applications. We develop trustworthy vertical-domain models and AI-native solutions tailored to industry needs. Our team of expert AI scientists and industry leaders focuses on practical "gray-box" technologies, unlocking the productivity of large language models to drive innovation across sectors. Our mission in the INF-o1 project is to enhance the reasoning capabilities of LLMs across various industrial domains and to ensure a trustworthy reasoning process that serves industry needs.</p>
<p>INFLY TECH (Shanghai) Co., Ltd.</p>
<p>2024.12.31</p>
</div>

## Overview
We are pleased to share the initial checkpoint of our reasoning foundation large language model as an open-source resource. This checkpoint is intended to help evaluate our team's data production pipeline across various domains, including mathematics, programming, logic, safety, and others. Its goal is to provide a solid starting point for developing a robust policy for the subsequent reinforcement learning process.

We expect that applying our reinforcement learning algorithms, supported by our carefully designed infrastructure, will lead to meaningful improvements in the model's reasoning capabilities across various domains. At the heart of the project is our data production pipeline, which we believe plays a crucial role in enabling general reasoning capabilities. We also believe that the reasoning capability induced by this pipeline can address a range of real-world industrial scenarios with increasing precision and reliability.

Based on our observations during the production of pi0, we have identified quality and diversity as the critical factors for fostering high-quality, long chain-of-thought (CoT) reasoning capabilities. This insight aligns closely with conclusions drawn from the general alignment process of large language models. By carefully designing self-verification and backtracking mechanisms that ensure process correctness during data generation, we have developed datasets that effectively induce robust long-context reasoning across diverse domains. This approach outperforms state-of-the-art o1-like models with similar objectives, highlighting the potential of our data production pipeline for advancing reasoning capabilities.
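The pipeline itself is not released with this checkpoint, but the generate-verify-backtrack idea described above can be illustrated with a minimal control-loop sketch. Everything in the snippet below (`build_cot`, `propose_step`, `verify_step`, the `FINAL ANSWER` marker, and the retry budgets) is a hypothetical stand-in for illustration only, not the actual INF-o1 data production code.

```python
from typing import Callable, List, Optional

def build_cot(
    problem: str,
    propose_step: Callable[[str, List[str]], str],       # hypothetical sampler for the next reasoning step
    verify_step: Callable[[str, List[str], str], bool],  # hypothetical process-correctness checker
    max_steps: int = 32,
    max_retries: int = 4,
) -> Optional[List[str]]:
    """Assemble a verified chain of thought, backtracking when a step cannot be verified."""
    steps: List[str] = []
    retries = 0
    attempts = 0
    # Overall attempt budget so the sketch always terminates.
    while len(steps) < max_steps and attempts < max_steps * (max_retries + 1):
        attempts += 1
        candidate = propose_step(problem, steps)
        if verify_step(problem, steps, candidate):
            steps.append(candidate)          # keep only steps that pass verification
            retries = 0
            if candidate.startswith("FINAL ANSWER"):
                return steps                 # a complete, self-consistent trajectory
        else:
            retries += 1
            if retries > max_retries:
                if not steps:
                    return None              # unrecoverable: discard this problem
                steps.pop()                  # backtrack one step and continue from there
                retries = 0
    return None

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; a real pipeline would call an LLM and a verifier.
    trace = build_cot(
        "2 + 2 = ?",
        propose_step=lambda p, s: "FINAL ANSWER: 4" if s else "Add the two operands.",
        verify_step=lambda p, s, c: True,
    )
    print(trace)
```

In this sketch, only trajectories whose every step passes verification are returned, which mirrors the process-correctness requirement described above.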
## Experiments
### Math Benchmarks

| Model                | College Math | AMC23 | MATH  | Olympiad Bench | GaoKao 2023 En | AIME24 |
| -------------------- | ------------ | ----- | ----- | -------------- | -------------- | ------ |
| Qwen2.5-32B-Instruct | 45.71        | 72.5  | 82.82 | 46.81          | 68.83          | 23.33  |
| Qwen2.5-32B-QwQ      | 43.33        | 72.5  | 88.54 | 55.56          | 78.70          | 40.00  |
| INF-o1-pi0           | 47.27        | 85.0  | 88.60 | 56.00          | 77.14          | 40.00  |
### Logical Benchmark
| Model                | LSAT |
| -------------------- | :--: |
| Qwen2.5-32B-Instruct | 33.7 |
| Qwen2.5-32B-QwQ      | 67.0 |
| INF-o1-pi0           | 71.8 |
### Safety Benchmarks
| Model                | AIR-BENCH 2024 | AIR-BENCH 2024 (CRF) |
| -------------------- | :------------: | :------------------: |
| Qwen2.5-32B-Instruct | 54.29          | 53.83                |
| Qwen2.5-32B-QwQ      | 52.61          | 53.42                |
| o1-preview           | 73.25          | 70.72                |
| INF-o1-pi0           | 77.25          | 74.49                |
### SQL Benchmarks
| Model                | BIRD | Spider |
| -------------------- | :--: | :----: |
| Qwen2.5-32B-Instruct | 50.2 | 77.8   |
| Qwen2.5-32B-QwQ      | 43.7 | 69.9   |
| o1-preview           | 48.9 | 70.6   |
| INF-o1-pi0           | 55.3 | 79.7   |
## Quick Start
We provide an example of how to use inf-o1-pi0 below.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "infly/inf-o1-pi0"

# Load the model weights and tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."

# Build the chat with the step-by-step reasoning system prompt.
messages = [
    {"role": "system", "content": "You are an advanced AI language model specializing in solving math and programming problems step by step. Carefully analyze each part of the problem, verify the accuracy of your reasoning with relevant facts and data, and provide clear, logical solutions. Reflect on and review your approach throughout the problem-solving process to ensure precision and thoroughness. Always think through the problem step by step and provide your answers accordingly."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and tokenize it.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response and strip the prompt tokens from the output.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
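Because the model is tuned for long chain-of-thought outputs, a 512-token budget can truncate the reasoning trace. The variation below (reusing `model`, `tokenizer`, and `model_inputs` from the snippet above) streams tokens as they are generated and raises the budget; the 4096-token limit and the streaming setup are illustrative choices rather than settings prescribed by this model card.

```python
from transformers import TextStreamer

# Stream the reasoning trace to stdout as it is generated (illustrative settings).
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    **model_inputs,
    max_new_tokens=4096,  # larger budget so long chains of thought are not cut off
    streamer=streamer,
)
```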
## Future Plan
Our pi0 serves as the foundation for ensuring that our data generation pipeline effectively leverages the long reasoning capabilities of large language models. Looking ahead, we plan to use pi0 as the initial policy checkpoint for reinforcement learning training. Through this process, we aim to significantly enhance the generalization of its reasoning capabilities, particularly for tasks in the financial and medical domains, which are critical for both academic research and industrial applications.
## Contributors
### Supervisors
Wei Chu • Yinghui Xu • Yuan Qi
### INF-o1 Team
**Listed in Alphabetical Order**

Chao Qu - Team Leader • Chao Wang - Infrastructure • Cheng Peng - Data Pipeline (Logical) • Dakuan Lu - Data Pipeline (Science) • Haozhe Wang - Data Pipeline (Math) & RL • Hongqing Hu - Infrastructure • Jianming Feng - Data Pipeline (Safety) • Jiaran Hao - Data Pipeline (SQL) & Infrastructure • Kelang Tian - Infrastructure • Minghao Yang - Data Pipeline (Math) • Quanbin Wang - Data Pipeline (Safety) • J.K. Liu - Data Pipeline (SQL) • Tianchu Yao - Data Pipeline & Alignment • Weidi Xu - Data Pipeline (Logical) • Xiaoyu Tan - Data Pipeline & Alignment • Yihan Songliu - Infrastructure
## License Agreement
inf-o1-pi0 supports commercial applications under a permissive [License](https://huggingface.co/infly/inf-o1-pi0/blob/main/LICENSE).
## Contact
Chao Qu: quchao_tequila@inftech.ai

Xiaoyu Tan: yulin.txy@inftech.ai
## Citation
If you find our work helpful, please feel free to cite us.
```bibtex
@misc{inftech_pi_zero2024,
  author = {INF-o1 Team},
  title  = {INF-o1 (\(\pi_0\)): Initiating the Journey to the Infinity of LLM Reasoning},
  year   = {2024},
  url    = {https://inftech-pi-zero.github.io/},
  note   = {Accessed: 2024-12-31}
}
```