PEFT
Safetensors
91veMe4Plus committed
Commit 54e2ac0 · verified · 1 Parent(s): 711f845

Add files using upload-large-folder tool

Files changed (50)
  1. .DS_Store +0 -0
  2. README.md +202 -3
  3. adapter_config.json +39 -0
  4. added_tokens.json +35 -0
  5. checkpoint-1400/README.md +202 -0
  6. checkpoint-1400/adapter_config.json +39 -0
  7. checkpoint-1400/adapter_model.safetensors +3 -0
  8. checkpoint-1400/added_tokens.json +35 -0
  9. checkpoint-1400/merges.txt +0 -0
  10. checkpoint-1400/rng_state.pth +3 -0
  11. checkpoint-1400/scaler.pt +3 -0
  12. checkpoint-1400/scheduler.pt +3 -0
  13. checkpoint-1400/special_tokens_map.json +86 -0
  14. checkpoint-1400/tokenizer.json +0 -0
  15. checkpoint-1400/tokenizer_config.json +502 -0
  16. checkpoint-1400/trainer_state.json +1350 -0
  17. checkpoint-1400/training_args.bin +3 -0
  18. checkpoint-1400/vocab.json +0 -0
  19. checkpoint-1600/README.md +202 -0
  20. checkpoint-1600/adapter_config.json +39 -0
  21. checkpoint-1600/added_tokens.json +35 -0
  22. checkpoint-1600/merges.txt +0 -0
  23. checkpoint-1600/rng_state.pth +3 -0
  24. checkpoint-1600/scheduler.pt +3 -0
  25. checkpoint-1600/special_tokens_map.json +86 -0
  26. checkpoint-1600/tokenizer.json +0 -0
  27. checkpoint-1600/tokenizer_config.json +502 -0
  28. checkpoint-1600/trainer_state.json +1538 -0
  29. checkpoint-1600/training_args.bin +3 -0
  30. checkpoint-1600/vocab.json +0 -0
  31. checkpoint-1686/README.md +202 -0
  32. checkpoint-1686/adapter_config.json +39 -0
  33. checkpoint-1686/adapter_model.safetensors +3 -0
  34. checkpoint-1686/added_tokens.json +35 -0
  35. checkpoint-1686/merges.txt +0 -0
  36. checkpoint-1686/rng_state.pth +3 -0
  37. checkpoint-1686/scaler.pt +3 -0
  38. checkpoint-1686/scheduler.pt +3 -0
  39. checkpoint-1686/special_tokens_map.json +86 -0
  40. checkpoint-1686/tokenizer.json +0 -0
  41. checkpoint-1686/tokenizer_config.json +502 -0
  42. checkpoint-1686/trainer_state.json +1610 -0
  43. checkpoint-1686/training_args.bin +3 -0
  44. checkpoint-1686/vocab.json +0 -0
  45. merges.txt +0 -0
  46. special_tokens_map.json +86 -0
  47. tokenizer.json +0 -0
  48. tokenizer_config.json +502 -0
  49. training_args.bin +3 -0
  50. vocab.json +0 -0
.DS_Store ADDED
Binary file (6.15 kB).
 
README.md CHANGED
@@ -1,3 +1,202 @@
- ---
- license: mit
- ---
+ ---
+ base_model: naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.15.2
adapter_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.1,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "o_proj",
+ "up_proj",
+ "k_proj",
+ "v_proj",
+ "gate_proj",
+ "down_proj",
+ "q_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_rslora": false
+ }
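Since `use_rslora` is false, PEFT applies the standard LoRA scaling `lora_alpha / r` to each adapter's output. A minimal stdlib-only sketch of what this config implies, with the key fields inlined from the JSON above rather than read from the repo:

```python
import json

# Key fields copied from adapter_config.json, inlined for illustration.
adapter_config = json.loads("""
{
  "lora_alpha": 32,
  "r": 16,
  "lora_dropout": 0.1,
  "target_modules": ["o_proj", "up_proj", "k_proj", "v_proj",
                     "gate_proj", "down_proj", "q_proj"],
  "task_type": "CAUSAL_LM"
}
""")

# Standard (non-rsLoRA) scaling: the adapter delta is multiplied by alpha / r.
scaling = adapter_config["lora_alpha"] / adapter_config["r"]
print(scaling)                                # 2.0
print(len(adapter_config["target_modules"]))  # 7
```

With r=16 and alpha=32, LoRA updates on all seven attention and MLP projections are scaled by 2.0, a common default pairing.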
added_tokens.json ADDED
@@ -0,0 +1,35 @@
+ {
+ "<EMAIL>": 110521,
+ "<KEY>": 110522,
+ "<NAME>": 110520,
+ "<PASSWORD>": 110523,
+ "<code_to_intermediate>": 110502,
+ "<empty_output>": 110501,
+ "<file_sep>": 110492,
+ "<intermediate_to_code>": 110503,
+ "<issue_closed>": 110495,
+ "<issue_comment>": 110494,
+ "<issue_start>": 110493,
+ "<jupyter_code>": 110498,
+ "<jupyter_output>": 110499,
+ "<jupyter_script>": 110500,
+ "<jupyter_start>": 110496,
+ "<jupyter_text>": 110497,
+ "<pr>": 110504,
+ "<pr_base>": 110507,
+ "<pr_base_code>": 110509,
+ "<pr_comment>": 110512,
+ "<pr_diff>": 110510,
+ "<pr_diff_hunk>": 110511,
+ "<pr_diff_hunk_comment_line>": 110519,
+ "<pr_event_id>": 110513,
+ "<pr_file>": 110508,
+ "<pr_in_reply_to_comment_id>": 110518,
+ "<pr_in_reply_to_review_id>": 110517,
+ "<pr_is_merged>": 110506,
+ "<pr_review>": 110514,
+ "<pr_review_comment>": 110516,
+ "<pr_review_state>": 110515,
+ "<pr_status>": 110505,
+ "<repo_name>": 110491
+ }
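The 33 added tokens occupy a single contiguous ID block (110491–110523) directly above the base vocabulary, which matters when resizing embeddings. A quick stdlib check, with the mapping inlined from the JSON above:

```python
# Mapping copied from added_tokens.json, inlined for illustration.
added_tokens = {
    "<EMAIL>": 110521, "<KEY>": 110522, "<NAME>": 110520, "<PASSWORD>": 110523,
    "<code_to_intermediate>": 110502, "<empty_output>": 110501, "<file_sep>": 110492,
    "<intermediate_to_code>": 110503, "<issue_closed>": 110495, "<issue_comment>": 110494,
    "<issue_start>": 110493, "<jupyter_code>": 110498, "<jupyter_output>": 110499,
    "<jupyter_script>": 110500, "<jupyter_start>": 110496, "<jupyter_text>": 110497,
    "<pr>": 110504, "<pr_base>": 110507, "<pr_base_code>": 110509, "<pr_comment>": 110512,
    "<pr_diff>": 110510, "<pr_diff_hunk>": 110511, "<pr_diff_hunk_comment_line>": 110519,
    "<pr_event_id>": 110513, "<pr_file>": 110508, "<pr_in_reply_to_comment_id>": 110518,
    "<pr_in_reply_to_review_id>": 110517, "<pr_is_merged>": 110506, "<pr_review>": 110514,
    "<pr_review_comment>": 110516, "<pr_review_state>": 110515, "<pr_status>": 110505,
    "<repo_name>": 110491,
}

ids = sorted(added_tokens.values())
# Contiguous range 110491..110523, no gaps and no duplicates.
assert ids == list(range(110491, 110524))
print(len(ids))  # 33
```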
checkpoint-1400/README.md ADDED
@@ -0,0 +1,202 @@
(content identical to the README.md shown above)
checkpoint-1400/adapter_config.json ADDED
@@ -0,0 +1,39 @@
(content identical to adapter_config.json shown above)
checkpoint-1400/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:34ba3e6c3329e613b854e2dbdde707981eb68b36d180aae11a503d5a46b5f9e1
+ size 39366152
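The safetensors entry above is a Git LFS pointer file, not the weights themselves: each line is a `key value` pair naming the spec version, the SHA-256 of the real blob, and its byte size. Parsing it needs only the stdlib (pointer text inlined from the diff above):

```python
# Git LFS pointer file, copied verbatim from the diff above.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:34ba3e6c3329e613b854e2dbdde707981eb68b36d180aae11a503d5a46b5f9e1
size 39366152
"""

# Each pointer line is "key value"; split on the first space only,
# since the value (e.g. the spec URL) may itself contain no spaces but
# splitting once is the format's rule.
pointer = dict(line.split(" ", 1) for line in pointer_text.splitlines())

algo, digest = pointer["oid"].split(":", 1)
print(algo)                  # sha256
print(int(pointer["size"]))  # 39366152 (~39 MB adapter checkpoint)
```

After `git lfs pull`, the downloaded file's SHA-256 should match `digest`; the pointer is all that lives in the Git history.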
checkpoint-1400/added_tokens.json ADDED
@@ -0,0 +1,35 @@
(content identical to added_tokens.json shown above)
checkpoint-1400/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1400/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:234ba2ba854e17783128a343692bbffa9b7636ddfe7803957b3de9cafbbfa181
+ size 14244
checkpoint-1400/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:baba31a5e5063037a5c811de9cb04bc62c6c5f0f5fe6720b7d681afe6500d4c1
+ size 988
checkpoint-1400/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2eb86e2f9cdb17a8fd263531ec54f01fc23332497a78f61bccb34891c9fcdaf1
+ size 1064
checkpoint-1400/special_tokens_map.json ADDED
@@ -0,0 +1,86 @@
+ {
+ "additional_special_tokens": [
+ "<|endoftext|>",
+ "<|fim_prefix|>",
+ "<|fim_middle|>",
+ "<|fim_suffix|>",
+ "<|endofprompt|>",
+ "<|_unuse_missing_100256|>",
+ "<|_unuse_missing_100261|>",
+ "<|_unuse_missing_100262|>",
+ "<|_unuse_missing_100263|>",
+ "<|_unuse_missing_100264|>",
+ "<|_unuse_missing_100265|>",
+ "<|_unuse_missing_100266|>",
+ "<|_unuse_missing_100267|>",
+ "<|_unuse_missing_100268|>",
+ "<|_unuse_missing_100269|>",
+ "<|_unuse_missing_100270|>",
+ "<|_unuse_missing_100271|>",
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|stop|>",
+ "<|endofturn|>",
+ "<repo_name>",
+ "<file_sep>",
+ "<issue_start>",
+ "<issue_comment>",
+ "<issue_closed>",
+ "<jupyter_start>",
+ "<jupyter_text>",
+ "<jupyter_code>",
+ "<jupyter_output>",
+ "<jupyter_script>",
+ "<empty_output>",
+ "<code_to_intermediate>",
+ "<intermediate_to_code>",
+ "<pr>",
+ "<pr_status>",
+ "<pr_is_merged>",
+ "<pr_base>",
+ "<pr_file>",
+ "<pr_base_code>",
+ "<pr_diff>",
+ "<pr_diff_hunk>",
+ "<pr_comment>",
+ "<pr_event_id>",
+ "<pr_review>",
+ "<pr_review_state>",
+ "<pr_review_comment>",
+ "<pr_in_reply_to_review_id>",
+ "<pr_in_reply_to_comment_id>",
+ "<pr_diff_hunk_comment_line>",
+ "<NAME>",
+ "<EMAIL>",
+ "<KEY>",
+ "<PASSWORD>"
+ ],
+ "bos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|endofturn|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-1400/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1400/tokenizer_config.json ADDED
@@ -0,0 +1,502 @@
+ {
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "100256": {
+ "content": "<|_unuse_missing_100256|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100257": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100258": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100259": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100260": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100261": {
+ "content": "<|_unuse_missing_100261|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100262": {
+ "content": "<|_unuse_missing_100262|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100263": {
+ "content": "<|_unuse_missing_100263|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100264": {
+ "content": "<|_unuse_missing_100264|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100265": {
+ "content": "<|_unuse_missing_100265|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100266": {
+ "content": "<|_unuse_missing_100266|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100267": {
+ "content": "<|_unuse_missing_100267|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100268": {
+ "content": "<|_unuse_missing_100268|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100269": {
+ "content": "<|_unuse_missing_100269|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100270": {
+ "content": "<|_unuse_missing_100270|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100271": {
+ "content": "<|_unuse_missing_100271|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100272": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100273": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100274": {
+ "content": "<|stop|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100275": {
+ "content": "<|endofturn|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100276": {
+ "content": "<|endofprompt|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110491": {
+ "content": "<repo_name>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110492": {
+ "content": "<file_sep>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110493": {
+ "content": "<issue_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110494": {
+ "content": "<issue_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110495": {
+ "content": "<issue_closed>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110496": {
+ "content": "<jupyter_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110497": {
+ "content": "<jupyter_text>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110498": {
+ "content": "<jupyter_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110499": {
+ "content": "<jupyter_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110500": {
+ "content": "<jupyter_script>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110501": {
+ "content": "<empty_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110502": {
+ "content": "<code_to_intermediate>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110503": {
+ "content": "<intermediate_to_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110504": {
+ "content": "<pr>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110505": {
+ "content": "<pr_status>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110506": {
+ "content": "<pr_is_merged>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110507": {
+ "content": "<pr_base>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110508": {
+ "content": "<pr_file>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110509": {
+ "content": "<pr_base_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110510": {
+ "content": "<pr_diff>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110511": {
+ "content": "<pr_diff_hunk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110512": {
+ "content": "<pr_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110513": {
+ "content": "<pr_event_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110514": {
+ "content": "<pr_review>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110515": {
+ "content": "<pr_review_state>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110516": {
+ "content": "<pr_review_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110517": {
+ "content": "<pr_in_reply_to_review_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110518": {
+ "content": "<pr_in_reply_to_comment_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110519": {
+ "content": "<pr_diff_hunk_comment_line>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110520": {
+ "content": "<NAME>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110521": {
+ "content": "<EMAIL>",
+ "lstrip": false,
+ "normalized": false,
416
+ "rstrip": false,
417
+ "single_word": false,
418
+ "special": true
419
+ },
420
+ "110522": {
421
+ "content": "<KEY>",
422
+ "lstrip": false,
423
+ "normalized": false,
424
+ "rstrip": false,
425
+ "single_word": false,
426
+ "special": true
427
+ },
428
+ "110523": {
429
+ "content": "<PASSWORD>",
430
+ "lstrip": false,
431
+ "normalized": false,
432
+ "rstrip": false,
433
+ "single_word": false,
434
+ "special": true
435
+ }
436
+ },
437
+ "additional_special_tokens": [
438
+ "<|endoftext|>",
439
+ "<|fim_prefix|>",
440
+ "<|fim_middle|>",
441
+ "<|fim_suffix|>",
442
+ "<|endofprompt|>",
443
+ "<|_unuse_missing_100256|>",
444
+ "<|_unuse_missing_100261|>",
445
+ "<|_unuse_missing_100262|>",
446
+ "<|_unuse_missing_100263|>",
447
+ "<|_unuse_missing_100264|>",
448
+ "<|_unuse_missing_100265|>",
449
+ "<|_unuse_missing_100266|>",
450
+ "<|_unuse_missing_100267|>",
451
+ "<|_unuse_missing_100268|>",
452
+ "<|_unuse_missing_100269|>",
453
+ "<|_unuse_missing_100270|>",
454
+ "<|_unuse_missing_100271|>",
455
+ "<|im_start|>",
456
+ "<|im_end|>",
457
+ "<|stop|>",
458
+ "<|endofturn|>",
459
+ "<repo_name>",
460
+ "<file_sep>",
461
+ "<issue_start>",
462
+ "<issue_comment>",
463
+ "<issue_closed>",
464
+ "<jupyter_start>",
465
+ "<jupyter_text>",
466
+ "<jupyter_code>",
467
+ "<jupyter_output>",
468
+ "<jupyter_script>",
469
+ "<empty_output>",
470
+ "<code_to_intermediate>",
471
+ "<intermediate_to_code>",
472
+ "<pr>",
473
+ "<pr_status>",
474
+ "<pr_is_merged>",
475
+ "<pr_base>",
476
+ "<pr_file>",
477
+ "<pr_base_code>",
478
+ "<pr_diff>",
479
+ "<pr_diff_hunk>",
480
+ "<pr_comment>",
481
+ "<pr_event_id>",
482
+ "<pr_review>",
483
+ "<pr_review_state>",
484
+ "<pr_review_comment>",
485
+ "<pr_in_reply_to_review_id>",
486
+ "<pr_in_reply_to_comment_id>",
487
+ "<pr_diff_hunk_comment_line>",
488
+ "<NAME>",
489
+ "<EMAIL>",
490
+ "<KEY>",
491
+ "<PASSWORD>"
492
+ ],
493
+ "bos_token": "<|endoftext|>",
494
+ "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
495
+ "clean_up_tokenization_spaces": true,
496
+ "eos_token": "<|endofturn|>",
497
+ "extra_special_tokens": {},
498
+ "model_max_length": 1000000000000000019884624838656,
499
+ "pad_token": "<|endoftext|>",
500
+ "tokenizer_class": "GPT2Tokenizer",
501
+ "unk_token": "<|endoftext|>"
502
+ }
checkpoint-1400/trainer_state.json ADDED
@@ -0,0 +1,1350 @@
+ {
+ "best_global_step": 1400,
+ "best_metric": 1.922593355178833,
+ "best_model_checkpoint": "./hyperclova-deobfuscation-lora/checkpoint-1400",
+ "epoch": 2.487111111111111,
+ "eval_steps": 200,
+ "global_step": 1400,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.017777777777777778,
+ "grad_norm": 3.3687641620635986,
+ "learning_rate": 1.8e-05,
+ "loss": 4.1361,
+ "mean_token_accuracy": 0.3493226237595081,
+ "num_tokens": 22106.0,
+ "step": 10
+ },
+ {
+ "epoch": 0.035555555555555556,
+ "grad_norm": 2.5920090675354004,
+ "learning_rate": 3.8e-05,
+ "loss": 3.7165,
+ "mean_token_accuracy": 0.4088538818061352,
+ "num_tokens": 44943.0,
+ "step": 20
+ },
+ {
+ "epoch": 0.05333333333333334,
+ "grad_norm": 2.5703377723693848,
+ "learning_rate": 5.8e-05,
+ "loss": 3.3356,
+ "mean_token_accuracy": 0.4755532510578632,
+ "num_tokens": 67397.0,
+ "step": 30
+ },
+ {
+ "epoch": 0.07111111111111111,
+ "grad_norm": 1.698912262916565,
+ "learning_rate": 7.800000000000001e-05,
+ "loss": 2.9874,
+ "mean_token_accuracy": 0.508383595943451,
+ "num_tokens": 89803.0,
+ "step": 40
+ },
+ {
+ "epoch": 0.08888888888888889,
+ "grad_norm": 1.4602556228637695,
+ "learning_rate": 9.8e-05,
+ "loss": 2.7854,
+ "mean_token_accuracy": 0.5358646497130394,
+ "num_tokens": 112364.0,
+ "step": 50
+ },
+ {
+ "epoch": 0.10666666666666667,
+ "grad_norm": 1.5916705131530762,
+ "learning_rate": 0.000118,
+ "loss": 2.6546,
+ "mean_token_accuracy": 0.5485944993793964,
+ "num_tokens": 134028.0,
+ "step": 60
+ },
+ {
+ "epoch": 0.12444444444444444,
+ "grad_norm": 1.6815338134765625,
+ "learning_rate": 0.000138,
+ "loss": 2.606,
+ "mean_token_accuracy": 0.5535938143730164,
+ "num_tokens": 156703.0,
+ "step": 70
+ },
+ {
+ "epoch": 0.14222222222222222,
+ "grad_norm": 1.8009140491485596,
+ "learning_rate": 0.00015800000000000002,
+ "loss": 2.5307,
+ "mean_token_accuracy": 0.5640750013291835,
+ "num_tokens": 178986.0,
+ "step": 80
+ },
+ {
+ "epoch": 0.16,
+ "grad_norm": 1.4582855701446533,
+ "learning_rate": 0.00017800000000000002,
+ "loss": 2.5633,
+ "mean_token_accuracy": 0.5567230455577373,
+ "num_tokens": 201989.0,
+ "step": 90
+ },
+ {
+ "epoch": 0.17777777777777778,
+ "grad_norm": 1.663874626159668,
+ "learning_rate": 0.00019800000000000002,
+ "loss": 2.4672,
+ "mean_token_accuracy": 0.5688358306884765,
+ "num_tokens": 223936.0,
+ "step": 100
+ },
+ {
+ "epoch": 0.19555555555555557,
+ "grad_norm": 1.6701704263687134,
+ "learning_rate": 0.00019886506935687262,
+ "loss": 2.4388,
+ "mean_token_accuracy": 0.5760447531938553,
+ "num_tokens": 246101.0,
+ "step": 110
+ },
+ {
+ "epoch": 0.21333333333333335,
+ "grad_norm": 1.5731302499771118,
+ "learning_rate": 0.00019760403530895334,
+ "loss": 2.4377,
+ "mean_token_accuracy": 0.5711787067353725,
+ "num_tokens": 269187.0,
+ "step": 120
+ },
+ {
+ "epoch": 0.2311111111111111,
+ "grad_norm": 1.4479353427886963,
+ "learning_rate": 0.00019634300126103406,
+ "loss": 2.3596,
+ "mean_token_accuracy": 0.5830569051206111,
+ "num_tokens": 291454.0,
+ "step": 130
+ },
+ {
+ "epoch": 0.24888888888888888,
+ "grad_norm": 1.3653457164764404,
+ "learning_rate": 0.00019508196721311475,
+ "loss": 2.3648,
+ "mean_token_accuracy": 0.5807973451912403,
+ "num_tokens": 314204.0,
+ "step": 140
+ },
+ {
+ "epoch": 0.26666666666666666,
+ "grad_norm": 1.4210327863693237,
+ "learning_rate": 0.00019382093316519546,
+ "loss": 2.3186,
+ "mean_token_accuracy": 0.5878118917346,
+ "num_tokens": 337167.0,
+ "step": 150
+ },
+ {
+ "epoch": 0.28444444444444444,
+ "grad_norm": 1.532408356666565,
+ "learning_rate": 0.00019255989911727615,
+ "loss": 2.3637,
+ "mean_token_accuracy": 0.5761628717184066,
+ "num_tokens": 360272.0,
+ "step": 160
+ },
+ {
+ "epoch": 0.3022222222222222,
+ "grad_norm": 1.4010679721832275,
+ "learning_rate": 0.00019129886506935687,
+ "loss": 2.2701,
+ "mean_token_accuracy": 0.598077318072319,
+ "num_tokens": 382779.0,
+ "step": 170
+ },
+ {
+ "epoch": 0.32,
+ "grad_norm": 1.5830323696136475,
+ "learning_rate": 0.0001900378310214376,
+ "loss": 2.2861,
+ "mean_token_accuracy": 0.5928302705287933,
+ "num_tokens": 405438.0,
+ "step": 180
+ },
+ {
+ "epoch": 0.3377777777777778,
+ "grad_norm": 1.4623483419418335,
+ "learning_rate": 0.00018877679697351828,
+ "loss": 2.3192,
+ "mean_token_accuracy": 0.5854370579123497,
+ "num_tokens": 428660.0,
+ "step": 190
+ },
+ {
+ "epoch": 0.35555555555555557,
+ "grad_norm": 1.4850527048110962,
+ "learning_rate": 0.000187515762925599,
+ "loss": 2.256,
+ "step": 200
+ },
+ {
+ "epoch": 0.35555555555555557,
+ "eval_loss": 2.254753351211548,
+ "eval_mean_token_accuracy": 0.5965457199811935,
+ "eval_num_tokens": 450808.0,
+ "eval_runtime": 30.9386,
+ "eval_samples_per_second": 32.322,
+ "eval_steps_per_second": 8.081,
+ "step": 200
+ },
+ {
+ "epoch": 0.37333333333333335,
+ "grad_norm": 1.4195237159729004,
+ "learning_rate": 0.00018625472887767968,
+ "loss": 2.2607,
+ "mean_token_accuracy": 0.594056948274374,
+ "num_tokens": 473434.0,
+ "step": 210
+ },
+ {
+ "epoch": 0.39111111111111113,
+ "grad_norm": 1.3114796876907349,
+ "learning_rate": 0.0001849936948297604,
+ "loss": 2.2947,
+ "mean_token_accuracy": 0.5898103177547455,
+ "num_tokens": 496482.0,
+ "step": 220
+ },
+ {
+ "epoch": 0.4088888888888889,
+ "grad_norm": 1.4004285335540771,
+ "learning_rate": 0.00018373266078184112,
+ "loss": 2.2542,
+ "mean_token_accuracy": 0.5970372915267944,
+ "num_tokens": 519379.0,
+ "step": 230
+ },
+ {
+ "epoch": 0.4266666666666667,
+ "grad_norm": 1.3860116004943848,
+ "learning_rate": 0.0001824716267339218,
+ "loss": 2.2636,
+ "mean_token_accuracy": 0.59425338357687,
+ "num_tokens": 542631.0,
+ "step": 240
+ },
+ {
+ "epoch": 0.4444444444444444,
+ "grad_norm": 1.3675146102905273,
+ "learning_rate": 0.00018121059268600253,
+ "loss": 2.2412,
+ "mean_token_accuracy": 0.5928545072674751,
+ "num_tokens": 565400.0,
+ "step": 250
+ },
+ {
+ "epoch": 0.4622222222222222,
+ "grad_norm": 1.4246889352798462,
+ "learning_rate": 0.00017994955863808322,
+ "loss": 2.1577,
+ "mean_token_accuracy": 0.6061514511704444,
+ "num_tokens": 588003.0,
+ "step": 260
+ },
+ {
+ "epoch": 0.48,
+ "grad_norm": 1.4046531915664673,
+ "learning_rate": 0.00017868852459016393,
+ "loss": 2.1862,
+ "mean_token_accuracy": 0.6008762732148171,
+ "num_tokens": 610974.0,
+ "step": 270
+ },
+ {
+ "epoch": 0.49777777777777776,
+ "grad_norm": 1.4038338661193848,
+ "learning_rate": 0.00017742749054224465,
+ "loss": 2.2219,
+ "mean_token_accuracy": 0.5970636487007142,
+ "num_tokens": 634093.0,
+ "step": 280
+ },
+ {
+ "epoch": 0.5155555555555555,
+ "grad_norm": 1.3291988372802734,
+ "learning_rate": 0.00017616645649432534,
+ "loss": 2.131,
+ "mean_token_accuracy": 0.6172704175114632,
+ "num_tokens": 656188.0,
+ "step": 290
+ },
+ {
+ "epoch": 0.5333333333333333,
+ "grad_norm": 1.444318413734436,
+ "learning_rate": 0.00017490542244640606,
+ "loss": 2.1691,
+ "mean_token_accuracy": 0.6066021353006363,
+ "num_tokens": 678769.0,
+ "step": 300
+ },
+ {
+ "epoch": 0.5511111111111111,
+ "grad_norm": 1.3459752798080444,
+ "learning_rate": 0.00017364438839848675,
+ "loss": 2.1413,
+ "mean_token_accuracy": 0.6139265760779381,
+ "num_tokens": 701734.0,
+ "step": 310
+ },
+ {
+ "epoch": 0.5688888888888889,
+ "grad_norm": 1.3597490787506104,
+ "learning_rate": 0.00017238335435056746,
+ "loss": 2.1271,
+ "mean_token_accuracy": 0.6106095433235168,
+ "num_tokens": 724815.0,
+ "step": 320
+ },
+ {
+ "epoch": 0.5866666666666667,
+ "grad_norm": 1.4757016897201538,
+ "learning_rate": 0.00017112232030264818,
+ "loss": 2.133,
+ "mean_token_accuracy": 0.6147415205836296,
+ "num_tokens": 746903.0,
+ "step": 330
+ },
+ {
+ "epoch": 0.6044444444444445,
+ "grad_norm": 1.4856476783752441,
+ "learning_rate": 0.00016986128625472887,
+ "loss": 2.1201,
+ "mean_token_accuracy": 0.6161383926868439,
+ "num_tokens": 768982.0,
+ "step": 340
+ },
+ {
+ "epoch": 0.6222222222222222,
+ "grad_norm": 1.2596303224563599,
+ "learning_rate": 0.0001686002522068096,
+ "loss": 2.1392,
+ "mean_token_accuracy": 0.6150005847215653,
+ "num_tokens": 791061.0,
+ "step": 350
+ },
+ {
+ "epoch": 0.64,
+ "grad_norm": 1.3324636220932007,
+ "learning_rate": 0.00016733921815889028,
+ "loss": 2.1201,
+ "mean_token_accuracy": 0.6171063780784607,
+ "num_tokens": 813112.0,
+ "step": 360
+ },
+ {
+ "epoch": 0.6577777777777778,
+ "grad_norm": 1.419053316116333,
+ "learning_rate": 0.000166078184110971,
+ "loss": 2.1237,
+ "mean_token_accuracy": 0.6111394688487053,
+ "num_tokens": 835469.0,
+ "step": 370
+ },
+ {
+ "epoch": 0.6755555555555556,
+ "grad_norm": 1.4507274627685547,
+ "learning_rate": 0.0001648171500630517,
+ "loss": 2.1387,
+ "mean_token_accuracy": 0.604290933907032,
+ "num_tokens": 857795.0,
+ "step": 380
+ },
+ {
+ "epoch": 0.6933333333333334,
+ "grad_norm": 1.284505844116211,
+ "learning_rate": 0.0001635561160151324,
+ "loss": 2.1,
+ "mean_token_accuracy": 0.6181465938687325,
+ "num_tokens": 879659.0,
+ "step": 390
+ },
+ {
+ "epoch": 0.7111111111111111,
+ "grad_norm": 1.5179046392440796,
+ "learning_rate": 0.00016229508196721312,
+ "loss": 2.0813,
+ "step": 400
+ },
+ {
+ "epoch": 0.7111111111111111,
+ "eval_loss": 2.0953471660614014,
+ "eval_mean_token_accuracy": 0.618859866142273,
+ "eval_num_tokens": 902240.0,
+ "eval_runtime": 30.5153,
+ "eval_samples_per_second": 32.77,
+ "eval_steps_per_second": 8.193,
+ "step": 400
+ },
+ {
+ "epoch": 0.7288888888888889,
+ "grad_norm": 1.3377336263656616,
+ "learning_rate": 0.0001610340479192938,
+ "loss": 2.1049,
+ "mean_token_accuracy": 0.6189975582063199,
+ "num_tokens": 925091.0,
+ "step": 410
+ },
+ {
+ "epoch": 0.7466666666666667,
+ "grad_norm": 1.406614065170288,
+ "learning_rate": 0.00015977301387137452,
+ "loss": 2.1343,
+ "mean_token_accuracy": 0.6101128354668617,
+ "num_tokens": 948151.0,
+ "step": 420
+ },
+ {
+ "epoch": 0.7644444444444445,
+ "grad_norm": 1.3494964838027954,
+ "learning_rate": 0.00015851197982345524,
+ "loss": 2.0506,
+ "mean_token_accuracy": 0.6257941454648972,
+ "num_tokens": 970339.0,
+ "step": 430
+ },
+ {
+ "epoch": 0.7822222222222223,
+ "grad_norm": 1.3070355653762817,
+ "learning_rate": 0.00015725094577553593,
+ "loss": 2.0955,
+ "mean_token_accuracy": 0.6162661850452423,
+ "num_tokens": 993552.0,
+ "step": 440
+ },
+ {
+ "epoch": 0.8,
+ "grad_norm": 1.3954617977142334,
+ "learning_rate": 0.00015598991172761665,
+ "loss": 2.1119,
+ "mean_token_accuracy": 0.6154530435800553,
+ "num_tokens": 1015564.0,
+ "step": 450
+ },
+ {
+ "epoch": 0.8177777777777778,
+ "grad_norm": 1.4015129804611206,
+ "learning_rate": 0.00015472887767969734,
+ "loss": 2.0153,
+ "mean_token_accuracy": 0.6296211943030358,
+ "num_tokens": 1037721.0,
+ "step": 460
+ },
+ {
+ "epoch": 0.8355555555555556,
+ "grad_norm": 1.41290283203125,
+ "learning_rate": 0.00015346784363177806,
+ "loss": 2.0914,
+ "mean_token_accuracy": 0.6156619966030121,
+ "num_tokens": 1060627.0,
+ "step": 470
+ },
+ {
+ "epoch": 0.8533333333333334,
+ "grad_norm": 1.3715571165084839,
+ "learning_rate": 0.00015220680958385877,
+ "loss": 2.0674,
+ "mean_token_accuracy": 0.6202241629362106,
+ "num_tokens": 1082672.0,
+ "step": 480
+ },
+ {
+ "epoch": 0.8711111111111111,
+ "grad_norm": 1.3797943592071533,
+ "learning_rate": 0.00015094577553593946,
+ "loss": 2.0677,
+ "mean_token_accuracy": 0.6200241416692733,
+ "num_tokens": 1104857.0,
+ "step": 490
+ },
+ {
+ "epoch": 0.8888888888888888,
+ "grad_norm": 1.3080323934555054,
+ "learning_rate": 0.00014968474148802018,
+ "loss": 2.068,
+ "mean_token_accuracy": 0.618759186565876,
+ "num_tokens": 1127612.0,
+ "step": 500
+ },
+ {
+ "epoch": 0.9066666666666666,
+ "grad_norm": 1.4698944091796875,
+ "learning_rate": 0.0001484237074401009,
+ "loss": 2.0736,
+ "mean_token_accuracy": 0.6208444744348526,
+ "num_tokens": 1150411.0,
+ "step": 510
+ },
+ {
+ "epoch": 0.9244444444444444,
+ "grad_norm": 1.3741239309310913,
+ "learning_rate": 0.0001471626733921816,
+ "loss": 2.0887,
+ "mean_token_accuracy": 0.6161769673228263,
+ "num_tokens": 1172683.0,
+ "step": 520
+ },
+ {
+ "epoch": 0.9422222222222222,
+ "grad_norm": 1.3237783908843994,
+ "learning_rate": 0.0001459016393442623,
+ "loss": 1.9793,
+ "mean_token_accuracy": 0.6360917523503303,
+ "num_tokens": 1194160.0,
+ "step": 530
+ },
+ {
+ "epoch": 0.96,
+ "grad_norm": 1.3243825435638428,
+ "learning_rate": 0.000144640605296343,
+ "loss": 2.0095,
+ "mean_token_accuracy": 0.6338530048727989,
+ "num_tokens": 1215760.0,
+ "step": 540
+ },
+ {
+ "epoch": 0.9777777777777777,
+ "grad_norm": 1.3875395059585571,
+ "learning_rate": 0.0001433795712484237,
+ "loss": 2.0715,
+ "mean_token_accuracy": 0.6245882242918015,
+ "num_tokens": 1238191.0,
+ "step": 550
+ },
+ {
+ "epoch": 0.9955555555555555,
+ "grad_norm": 1.390081524848938,
+ "learning_rate": 0.00014211853720050443,
+ "loss": 2.0421,
+ "mean_token_accuracy": 0.6229756608605385,
+ "num_tokens": 1260429.0,
+ "step": 560
+ },
+ {
+ "epoch": 1.0124444444444445,
+ "grad_norm": 1.2626862525939941,
+ "learning_rate": 0.00014085750315258512,
+ "loss": 1.9614,
+ "mean_token_accuracy": 0.6359066555374547,
+ "num_tokens": 1281232.0,
+ "step": 570
+ },
+ {
+ "epoch": 1.0302222222222222,
+ "grad_norm": 1.3941477537155151,
+ "learning_rate": 0.00013959646910466583,
+ "loss": 1.8782,
+ "mean_token_accuracy": 0.6482988312840462,
+ "num_tokens": 1304130.0,
+ "step": 580
+ },
+ {
+ "epoch": 1.048,
+ "grad_norm": 1.4020227193832397,
+ "learning_rate": 0.00013833543505674652,
+ "loss": 1.8602,
+ "mean_token_accuracy": 0.65641980022192,
+ "num_tokens": 1326753.0,
+ "step": 590
+ },
+ {
+ "epoch": 1.0657777777777777,
+ "grad_norm": 1.285709023475647,
+ "learning_rate": 0.00013707440100882724,
+ "loss": 1.8661,
+ "step": 600
+ },
+ {
+ "epoch": 1.0657777777777777,
+ "eval_loss": 2.018383264541626,
+ "eval_mean_token_accuracy": 0.6295498251914978,
+ "eval_num_tokens": 1348985.0,
+ "eval_runtime": 30.5245,
+ "eval_samples_per_second": 32.761,
+ "eval_steps_per_second": 8.19,
+ "step": 600
+ },
+ {
+ "epoch": 1.0835555555555556,
+ "grad_norm": 1.2745097875595093,
+ "learning_rate": 0.00013581336696090796,
+ "loss": 1.8705,
+ "mean_token_accuracy": 0.650456714630127,
+ "num_tokens": 1371318.0,
+ "step": 610
+ },
+ {
+ "epoch": 1.1013333333333333,
+ "grad_norm": 1.3518744707107544,
+ "learning_rate": 0.00013455233291298865,
+ "loss": 1.9056,
+ "mean_token_accuracy": 0.6455502569675445,
+ "num_tokens": 1393816.0,
+ "step": 620
+ },
+ {
+ "epoch": 1.1191111111111112,
+ "grad_norm": 1.4413272142410278,
+ "learning_rate": 0.00013329129886506937,
+ "loss": 1.8994,
+ "mean_token_accuracy": 0.6459770023822784,
+ "num_tokens": 1416529.0,
+ "step": 630
+ },
+ {
+ "epoch": 1.1368888888888888,
+ "grad_norm": 1.3811439275741577,
+ "learning_rate": 0.00013203026481715006,
+ "loss": 1.9138,
+ "mean_token_accuracy": 0.6459063500165939,
+ "num_tokens": 1438970.0,
+ "step": 640
+ },
+ {
+ "epoch": 1.1546666666666667,
+ "grad_norm": 1.3642174005508423,
+ "learning_rate": 0.00013076923076923077,
+ "loss": 1.8892,
+ "mean_token_accuracy": 0.6444340243935585,
+ "num_tokens": 1461324.0,
+ "step": 650
+ },
+ {
+ "epoch": 1.1724444444444444,
+ "grad_norm": 1.4544634819030762,
+ "learning_rate": 0.0001295081967213115,
+ "loss": 1.9248,
+ "mean_token_accuracy": 0.6391839399933815,
+ "num_tokens": 1484246.0,
+ "step": 660
+ },
+ {
+ "epoch": 1.1902222222222223,
+ "grad_norm": 1.3715091943740845,
+ "learning_rate": 0.00012824716267339218,
+ "loss": 1.9105,
+ "mean_token_accuracy": 0.6393024668097496,
+ "num_tokens": 1507305.0,
+ "step": 670
+ },
+ {
+ "epoch": 1.208,
+ "grad_norm": 1.3897929191589355,
+ "learning_rate": 0.0001269861286254729,
+ "loss": 1.8714,
+ "mean_token_accuracy": 0.6521286174654961,
+ "num_tokens": 1529082.0,
+ "step": 680
+ },
+ {
+ "epoch": 1.2257777777777779,
+ "grad_norm": 1.3576809167861938,
+ "learning_rate": 0.00012572509457755359,
+ "loss": 1.8677,
+ "mean_token_accuracy": 0.6498221024870873,
+ "num_tokens": 1551159.0,
+ "step": 690
+ },
+ {
+ "epoch": 1.2435555555555555,
+ "grad_norm": 1.3156862258911133,
+ "learning_rate": 0.0001244640605296343,
+ "loss": 1.8996,
+ "mean_token_accuracy": 0.6485181763768196,
+ "num_tokens": 1573348.0,
+ "step": 700
+ },
+ {
+ "epoch": 1.2613333333333334,
+ "grad_norm": 1.4738845825195312,
+ "learning_rate": 0.00012320302648171502,
+ "loss": 1.8953,
+ "mean_token_accuracy": 0.6465991452336312,
+ "num_tokens": 1595989.0,
+ "step": 710
+ },
+ {
+ "epoch": 1.279111111111111,
+ "grad_norm": 1.5254158973693848,
+ "learning_rate": 0.00012194199243379571,
+ "loss": 1.9236,
+ "mean_token_accuracy": 0.6474427729845047,
+ "num_tokens": 1617895.0,
+ "step": 720
+ },
+ {
+ "epoch": 1.2968888888888888,
+ "grad_norm": 1.4867346286773682,
+ "learning_rate": 0.00012068095838587643,
+ "loss": 1.8766,
+ "mean_token_accuracy": 0.6491386488080024,
+ "num_tokens": 1640415.0,
+ "step": 730
+ },
+ {
+ "epoch": 1.3146666666666667,
+ "grad_norm": 1.3776379823684692,
+ "learning_rate": 0.00011941992433795712,
+ "loss": 1.8644,
+ "mean_token_accuracy": 0.6499749347567558,
+ "num_tokens": 1662713.0,
+ "step": 740
+ },
+ {
+ "epoch": 1.3324444444444445,
+ "grad_norm": 1.420027256011963,
+ "learning_rate": 0.00011815889029003783,
+ "loss": 1.8874,
+ "mean_token_accuracy": 0.648992708325386,
+ "num_tokens": 1684783.0,
+ "step": 750
+ },
+ {
+ "epoch": 1.3502222222222222,
+ "grad_norm": 1.356441855430603,
+ "learning_rate": 0.00011689785624211855,
+ "loss": 1.8937,
+ "mean_token_accuracy": 0.6503370434045792,
+ "num_tokens": 1706623.0,
+ "step": 760
+ },
+ {
+ "epoch": 1.3679999999999999,
+ "grad_norm": 1.4901665449142456,
+ "learning_rate": 0.00011563682219419924,
+ "loss": 1.9094,
+ "mean_token_accuracy": 0.6417872324585915,
+ "num_tokens": 1729494.0,
+ "step": 770
+ },
+ {
+ "epoch": 1.3857777777777778,
+ "grad_norm": 1.3679572343826294,
+ "learning_rate": 0.00011437578814627996,
+ "loss": 1.8841,
+ "mean_token_accuracy": 0.6478032737970352,
+ "num_tokens": 1752045.0,
+ "step": 780
+ },
+ {
+ "epoch": 1.4035555555555557,
+ "grad_norm": 1.3518086671829224,
+ "learning_rate": 0.00011311475409836065,
+ "loss": 1.9021,
+ "mean_token_accuracy": 0.6460829824209213,
+ "num_tokens": 1775601.0,
+ "step": 790
+ },
+ {
+ "epoch": 1.4213333333333333,
+ "grad_norm": 1.400870442390442,
+ "learning_rate": 0.00011185372005044137,
+ "loss": 1.8566,
+ "step": 800
+ },
+ {
+ "epoch": 1.4213333333333333,
+ "eval_loss": 1.9762645959854126,
+ "eval_mean_token_accuracy": 0.6358254022598266,
+ "eval_num_tokens": 1798633.0,
+ "eval_runtime": 30.7115,
+ "eval_samples_per_second": 32.561,
+ "eval_steps_per_second": 8.14,
+ "step": 800
+ },
+ {
+ "epoch": 1.439111111111111,
+ "grad_norm": 1.4487619400024414,
+ "learning_rate": 0.00011059268600252208,
+ "loss": 1.8417,
+ "mean_token_accuracy": 0.6496566243469715,
+ "num_tokens": 1820656.0,
+ "step": 810
+ },
+ {
+ "epoch": 1.456888888888889,
+ "grad_norm": 1.4507944583892822,
+ "learning_rate": 0.00010933165195460277,
+ "loss": 1.8829,
+ "mean_token_accuracy": 0.647446171939373,
+ "num_tokens": 1842871.0,
+ "step": 820
+ },
+ {
+ "epoch": 1.4746666666666668,
+ "grad_norm": 1.3563170433044434,
+ "learning_rate": 0.00010807061790668349,
+ "loss": 1.8508,
+ "mean_token_accuracy": 0.6544376760721207,
+ "num_tokens": 1865652.0,
+ "step": 830
+ },
+ {
+ "epoch": 1.4924444444444445,
+ "grad_norm": 1.366861343383789,
+ "learning_rate": 0.00010680958385876418,
+ "loss": 1.8756,
+ "mean_token_accuracy": 0.648781743645668,
+ "num_tokens": 1888455.0,
+ "step": 840
+ },
+ {
+ "epoch": 1.5102222222222221,
+ "grad_norm": 1.5031019449234009,
+ "learning_rate": 0.0001055485498108449,
+ "loss": 1.8461,
+ "mean_token_accuracy": 0.6583632439374923,
+ "num_tokens": 1910676.0,
+ "step": 850
+ },
+ {
+ "epoch": 1.528,
+ "grad_norm": 1.5248113870620728,
+ "learning_rate": 0.00010428751576292561,
+ "loss": 1.8857,
+ "mean_token_accuracy": 0.6470584884285927,
+ "num_tokens": 1933253.0,
+ "step": 860
+ },
+ {
+ "epoch": 1.545777777777778,
+ "grad_norm": 1.4354236125946045,
+ "learning_rate": 0.0001030264817150063,
+ "loss": 1.892,
+ "mean_token_accuracy": 0.6445729210972786,
+ "num_tokens": 1955820.0,
+ "step": 870
+ },
+ {
+ "epoch": 1.5635555555555556,
+ "grad_norm": 1.4288746118545532,
+ "learning_rate": 0.00010176544766708702,
+ "loss": 1.878,
+ "mean_token_accuracy": 0.6476826578378677,
+ "num_tokens": 1978120.0,
+ "step": 880
+ },
+ {
+ "epoch": 1.5813333333333333,
+ "grad_norm": 1.433902382850647,
+ "learning_rate": 0.00010050441361916771,
+ "loss": 1.8199,
+ "mean_token_accuracy": 0.6561270505189896,
+ "num_tokens": 2000508.0,
+ "step": 890
+ },
+ {
+ "epoch": 1.5991111111111111,
+ "grad_norm": 1.332987904548645,
+ "learning_rate": 9.924337957124843e-05,
+ "loss": 1.8555,
+ "mean_token_accuracy": 0.6499203637242317,
+ "num_tokens": 2023356.0,
+ "step": 900
+ },
+ {
+ "epoch": 1.616888888888889,
+ "grad_norm": 1.3830794095993042,
+ "learning_rate": 9.798234552332913e-05,
+ "loss": 1.8108,
+ "mean_token_accuracy": 0.6618377715349197,
+ "num_tokens": 2045965.0,
+ "step": 910
+ },
+ {
+ "epoch": 1.6346666666666667,
+ "grad_norm": 1.3988080024719238,
+ "learning_rate": 9.672131147540983e-05,
+ "loss": 1.8791,
+ "mean_token_accuracy": 0.6456288158893585,
+ "num_tokens": 2069108.0,
+ "step": 920
+ },
+ {
+ "epoch": 1.6524444444444444,
+ "grad_norm": 1.398549199104309,
+ "learning_rate": 9.546027742749055e-05,
+ "loss": 1.8885,
+ "mean_token_accuracy": 0.6464410901069642,
+ "num_tokens": 2091755.0,
+ "step": 930
+ },
+ {
+ "epoch": 1.6702222222222223,
+ "grad_norm": 1.5381189584732056,
+ "learning_rate": 9.419924337957125e-05,
+ "loss": 1.853,
+ "mean_token_accuracy": 0.6534279838204384,
+ "num_tokens": 2114496.0,
+ "step": 940
+ },
+ {
+ "epoch": 1.688,
+ "grad_norm": 1.4101791381835938,
+ "learning_rate": 9.293820933165196e-05,
+ "loss": 1.8696,
+ "mean_token_accuracy": 0.6475102975964546,
+ "num_tokens": 2137041.0,
+ "step": 950
+ },
+ {
+ "epoch": 1.7057777777777776,
+ "grad_norm": 1.496955156326294,
+ "learning_rate": 9.167717528373266e-05,
+ "loss": 1.8752,
+ "mean_token_accuracy": 0.6469297721982002,
+ "num_tokens": 2159711.0,
+ "step": 960
+ },
+ {
+ "epoch": 1.7235555555555555,
+ "grad_norm": 1.4269644021987915,
+ "learning_rate": 9.041614123581336e-05,
+ "loss": 1.8773,
+ "mean_token_accuracy": 0.6508476585149765,
+ "num_tokens": 2181675.0,
+ "step": 970
+ },
+ {
+ "epoch": 1.7413333333333334,
+ "grad_norm": 1.4438400268554688,
+ "learning_rate": 8.915510718789408e-05,
+ "loss": 1.8433,
+ "mean_token_accuracy": 0.656335887312889,
+ "num_tokens": 2204671.0,
+ "step": 980
+ },
+ {
+ "epoch": 1.759111111111111,
+ "grad_norm": 1.3846147060394287,
+ "learning_rate": 8.789407313997479e-05,
+ "loss": 1.8649,
+ "mean_token_accuracy": 0.6451441869139671,
+ "num_tokens": 2227866.0,
+ "step": 990
+ },
+ {
+ "epoch": 1.7768888888888887,
+ "grad_norm": 1.5432794094085693,
+ "learning_rate": 8.663303909205549e-05,
+ "loss": 1.8435,
+ "step": 1000
+ },
+ {
+ "epoch": 1.7768888888888887,
+ "eval_loss": 1.9385051727294922,
+ "eval_mean_token_accuracy": 0.6399562013149261
946
+ "eval_num_tokens": 2250379.0,
947
+ "eval_runtime": 30.7428,
948
+ "eval_samples_per_second": 32.528,
949
+ "eval_steps_per_second": 8.132,
950
+ "step": 1000
951
+ },
952
+ {
953
+ "epoch": 1.7946666666666666,
954
+ "grad_norm": 1.4225345849990845,
955
+ "learning_rate": 8.537200504413619e-05,
956
+ "loss": 1.8767,
957
+ "mean_token_accuracy": 0.6512193940579891,
958
+ "num_tokens": 2272999.0,
959
+ "step": 1010
960
+ },
961
+ {
962
+ "epoch": 1.8124444444444445,
963
+ "grad_norm": 1.3732675313949585,
964
+ "learning_rate": 8.41109709962169e-05,
965
+ "loss": 1.845,
966
+ "mean_token_accuracy": 0.6550421059131623,
967
+ "num_tokens": 2295110.0,
968
+ "step": 1020
969
+ },
970
+ {
971
+ "epoch": 1.8302222222222222,
972
+ "grad_norm": 1.3867266178131104,
973
+ "learning_rate": 8.284993694829761e-05,
974
+ "loss": 1.8236,
975
+ "mean_token_accuracy": 0.6565809994935989,
976
+ "num_tokens": 2317432.0,
977
+ "step": 1030
978
+ },
979
+ {
980
+ "epoch": 1.8479999999999999,
981
+ "grad_norm": 1.3360997438430786,
982
+ "learning_rate": 8.158890290037832e-05,
983
+ "loss": 1.8642,
984
+ "mean_token_accuracy": 0.6463637053966522,
985
+ "num_tokens": 2340305.0,
986
+ "step": 1040
987
+ },
988
+ {
989
+ "epoch": 1.8657777777777778,
990
+ "grad_norm": 1.4467201232910156,
991
+ "learning_rate": 8.032786885245902e-05,
992
+ "loss": 1.8661,
993
+ "mean_token_accuracy": 0.653101560473442,
994
+ "num_tokens": 2362746.0,
995
+ "step": 1050
996
+ },
997
+ {
998
+ "epoch": 1.8835555555555556,
999
+ "grad_norm": 1.3943202495574951,
1000
+ "learning_rate": 7.906683480453972e-05,
1001
+ "loss": 1.8399,
1002
+ "mean_token_accuracy": 0.6510771587491035,
1003
+ "num_tokens": 2385647.0,
1004
+ "step": 1060
1005
+ },
1006
+ {
1007
+ "epoch": 1.9013333333333333,
1008
+ "grad_norm": 1.4589335918426514,
1009
+ "learning_rate": 7.780580075662043e-05,
1010
+ "loss": 1.8522,
1011
+ "mean_token_accuracy": 0.6478627189993859,
1012
+ "num_tokens": 2408732.0,
1013
+ "step": 1070
1014
+ },
1015
+ {
1016
+ "epoch": 1.919111111111111,
1017
+ "grad_norm": 1.5676307678222656,
1018
+ "learning_rate": 7.654476670870114e-05,
1019
+ "loss": 1.8316,
1020
+ "mean_token_accuracy": 0.656819324195385,
1021
+ "num_tokens": 2431137.0,
1022
+ "step": 1080
1023
+ },
1024
+ {
1025
+ "epoch": 1.9368888888888889,
1026
+ "grad_norm": 1.3882263898849487,
1027
+ "learning_rate": 7.528373266078185e-05,
1028
+ "loss": 1.7965,
1029
+ "mean_token_accuracy": 0.6616187065839767,
1030
+ "num_tokens": 2453517.0,
1031
+ "step": 1090
1032
+ },
1033
+ {
1034
+ "epoch": 1.9546666666666668,
1035
+ "grad_norm": 1.5195387601852417,
1036
+ "learning_rate": 7.402269861286255e-05,
1037
+ "loss": 1.8346,
1038
+ "mean_token_accuracy": 0.6537548035383225,
1039
+ "num_tokens": 2475737.0,
1040
+ "step": 1100
1041
+ },
1042
+ {
1043
+ "epoch": 1.9724444444444444,
1044
+ "grad_norm": 1.3485065698623657,
1045
+ "learning_rate": 7.276166456494325e-05,
1046
+ "loss": 1.835,
1047
+ "mean_token_accuracy": 0.6512499779462815,
1048
+ "num_tokens": 2497880.0,
1049
+ "step": 1110
1050
+ },
1051
+ {
1052
+ "epoch": 1.9902222222222221,
1053
+ "grad_norm": 1.5932726860046387,
1054
+ "learning_rate": 7.150063051702396e-05,
1055
+ "loss": 1.8432,
1056
+ "mean_token_accuracy": 0.6523947417736053,
1057
+ "num_tokens": 2519974.0,
1058
+ "step": 1120
1059
+ },
1060
+ {
1061
+ "epoch": 2.007111111111111,
1062
+ "grad_norm": 1.3682020902633667,
1063
+ "learning_rate": 7.023959646910467e-05,
1064
+ "loss": 1.7383,
1065
+ "mean_token_accuracy": 0.6761166123967421,
1066
+ "num_tokens": 2540296.0,
1067
+ "step": 1130
1068
+ },
1069
+ {
1070
+ "epoch": 2.024888888888889,
1071
+ "grad_norm": 1.4417686462402344,
1072
+ "learning_rate": 6.897856242118538e-05,
1073
+ "loss": 1.697,
1074
+ "mean_token_accuracy": 0.6766574695706368,
1075
+ "num_tokens": 2561919.0,
1076
+ "step": 1140
1077
+ },
1078
+ {
1079
+ "epoch": 2.042666666666667,
1080
+ "grad_norm": 1.375542163848877,
1081
+ "learning_rate": 6.771752837326608e-05,
1082
+ "loss": 1.7361,
1083
+ "mean_token_accuracy": 0.6702861517667771,
1084
+ "num_tokens": 2584915.0,
1085
+ "step": 1150
1086
+ },
1087
+ {
1088
+ "epoch": 2.0604444444444443,
1089
+ "grad_norm": 1.4783133268356323,
1090
+ "learning_rate": 6.645649432534678e-05,
1091
+ "loss": 1.6927,
1092
+ "mean_token_accuracy": 0.6790769457817077,
1093
+ "num_tokens": 2606857.0,
1094
+ "step": 1160
1095
+ },
1096
+ {
1097
+ "epoch": 2.078222222222222,
1098
+ "grad_norm": 1.5346624851226807,
1099
+ "learning_rate": 6.519546027742749e-05,
1100
+ "loss": 1.6938,
1101
+ "mean_token_accuracy": 0.6730351656675339,
1102
+ "num_tokens": 2629737.0,
1103
+ "step": 1170
1104
+ },
1105
+ {
1106
+ "epoch": 2.096,
1107
+ "grad_norm": 1.430298089981079,
1108
+ "learning_rate": 6.39344262295082e-05,
1109
+ "loss": 1.6476,
1110
+ "mean_token_accuracy": 0.6835471093654633,
1111
+ "num_tokens": 2651936.0,
1112
+ "step": 1180
1113
+ },
1114
+ {
1115
+ "epoch": 2.113777777777778,
1116
+ "grad_norm": 1.4968252182006836,
1117
+ "learning_rate": 6.267339218158891e-05,
1118
+ "loss": 1.7242,
1119
+ "mean_token_accuracy": 0.6697919353842735,
1120
+ "num_tokens": 2675241.0,
1121
+ "step": 1190
1122
+ },
1123
+ {
1124
+ "epoch": 2.1315555555555554,
1125
+ "grad_norm": 1.3892192840576172,
1126
+ "learning_rate": 6.141235813366961e-05,
1127
+ "loss": 1.6916,
1128
+ "step": 1200
1129
+ },
1130
+ {
1131
+ "epoch": 2.1315555555555554,
1132
+ "eval_loss": 1.934017539024353,
1133
+ "eval_mean_token_accuracy": 0.6421641361713409,
1134
+ "eval_num_tokens": 2698219.0,
1135
+ "eval_runtime": 30.3979,
1136
+ "eval_samples_per_second": 32.897,
1137
+ "eval_steps_per_second": 8.224,
1138
+ "step": 1200
1139
+ },
1140
+ {
1141
+ "epoch": 2.1493333333333333,
1142
+ "grad_norm": 1.4893920421600342,
1143
+ "learning_rate": 6.0151324085750316e-05,
1144
+ "loss": 1.7047,
1145
+ "mean_token_accuracy": 0.6755686655640603,
1146
+ "num_tokens": 2721580.0,
1147
+ "step": 1210
1148
+ },
1149
+ {
1150
+ "epoch": 2.167111111111111,
1151
+ "grad_norm": 1.50564444065094,
1152
+ "learning_rate": 5.889029003783102e-05,
1153
+ "loss": 1.7058,
1154
+ "mean_token_accuracy": 0.6746152400970459,
1155
+ "num_tokens": 2744110.0,
1156
+ "step": 1220
1157
+ },
1158
+ {
1159
+ "epoch": 2.1848888888888887,
1160
+ "grad_norm": 1.461367130279541,
1161
+ "learning_rate": 5.7629255989911736e-05,
1162
+ "loss": 1.684,
1163
+ "mean_token_accuracy": 0.6813082948327065,
1164
+ "num_tokens": 2765844.0,
1165
+ "step": 1230
1166
+ },
1167
+ {
1168
+ "epoch": 2.2026666666666666,
1169
+ "grad_norm": 1.553553819656372,
1170
+ "learning_rate": 5.636822194199244e-05,
1171
+ "loss": 1.6848,
1172
+ "mean_token_accuracy": 0.677204079926014,
1173
+ "num_tokens": 2788070.0,
1174
+ "step": 1240
1175
+ },
1176
+ {
1177
+ "epoch": 2.2204444444444444,
1178
+ "grad_norm": 1.4453001022338867,
1179
+ "learning_rate": 5.510718789407314e-05,
1180
+ "loss": 1.7182,
1181
+ "mean_token_accuracy": 0.6765570789575577,
1182
+ "num_tokens": 2810964.0,
1183
+ "step": 1250
1184
+ },
1185
+ {
1186
+ "epoch": 2.2382222222222223,
1187
+ "grad_norm": 1.5605733394622803,
1188
+ "learning_rate": 5.384615384615385e-05,
1189
+ "loss": 1.6772,
1190
+ "mean_token_accuracy": 0.678339496254921,
1191
+ "num_tokens": 2833176.0,
1192
+ "step": 1260
1193
+ },
1194
+ {
1195
+ "epoch": 2.2560000000000002,
1196
+ "grad_norm": 1.514710783958435,
1197
+ "learning_rate": 5.258511979823455e-05,
1198
+ "loss": 1.7192,
1199
+ "mean_token_accuracy": 0.6710417225956917,
1200
+ "num_tokens": 2855440.0,
1201
+ "step": 1270
1202
+ },
1203
+ {
1204
+ "epoch": 2.2737777777777777,
1205
+ "grad_norm": 1.510834813117981,
1206
+ "learning_rate": 5.132408575031527e-05,
1207
+ "loss": 1.6599,
1208
+ "mean_token_accuracy": 0.6822926640510559,
1209
+ "num_tokens": 2877713.0,
1210
+ "step": 1280
1211
+ },
1212
+ {
1213
+ "epoch": 2.2915555555555556,
1214
+ "grad_norm": 1.3550519943237305,
1215
+ "learning_rate": 5.006305170239597e-05,
1216
+ "loss": 1.7072,
1217
+ "mean_token_accuracy": 0.6754195600748062,
1218
+ "num_tokens": 2899934.0,
1219
+ "step": 1290
1220
+ },
1221
+ {
1222
+ "epoch": 2.3093333333333335,
1223
+ "grad_norm": 1.5602107048034668,
1224
+ "learning_rate": 4.8802017654476674e-05,
1225
+ "loss": 1.7111,
1226
+ "mean_token_accuracy": 0.6702851369976998,
1227
+ "num_tokens": 2923244.0,
1228
+ "step": 1300
1229
+ },
1230
+ {
1231
+ "epoch": 2.327111111111111,
1232
+ "grad_norm": 1.5889501571655273,
1233
+ "learning_rate": 4.754098360655738e-05,
1234
+ "loss": 1.6858,
1235
+ "mean_token_accuracy": 0.6781487062573432,
1236
+ "num_tokens": 2945580.0,
1237
+ "step": 1310
1238
+ },
1239
+ {
1240
+ "epoch": 2.344888888888889,
1241
+ "grad_norm": 1.4793872833251953,
1242
+ "learning_rate": 4.627994955863809e-05,
1243
+ "loss": 1.6799,
1244
+ "mean_token_accuracy": 0.6740961462259293,
1245
+ "num_tokens": 2969470.0,
1246
+ "step": 1320
1247
+ },
1248
+ {
1249
+ "epoch": 2.3626666666666667,
1250
+ "grad_norm": 1.6188234090805054,
1251
+ "learning_rate": 4.501891551071879e-05,
1252
+ "loss": 1.6838,
1253
+ "mean_token_accuracy": 0.6732241719961166,
1254
+ "num_tokens": 2991982.0,
1255
+ "step": 1330
1256
+ },
1257
+ {
1258
+ "epoch": 2.3804444444444446,
1259
+ "grad_norm": 1.474108338356018,
1260
+ "learning_rate": 4.37578814627995e-05,
1261
+ "loss": 1.7024,
1262
+ "mean_token_accuracy": 0.675683145225048,
1263
+ "num_tokens": 3014206.0,
1264
+ "step": 1340
1265
+ },
1266
+ {
1267
+ "epoch": 2.398222222222222,
1268
+ "grad_norm": 1.4645053148269653,
1269
+ "learning_rate": 4.2496847414880205e-05,
1270
+ "loss": 1.6564,
1271
+ "mean_token_accuracy": 0.6787498995661736,
1272
+ "num_tokens": 3036651.0,
1273
+ "step": 1350
1274
+ },
1275
+ {
1276
+ "epoch": 2.416,
1277
+ "grad_norm": 1.498451828956604,
1278
+ "learning_rate": 4.1235813366960915e-05,
1279
+ "loss": 1.694,
1280
+ "mean_token_accuracy": 0.6752896070480346,
1281
+ "num_tokens": 3058745.0,
1282
+ "step": 1360
1283
+ },
1284
+ {
1285
+ "epoch": 2.433777777777778,
1286
+ "grad_norm": 1.5558826923370361,
1287
+ "learning_rate": 3.997477931904162e-05,
1288
+ "loss": 1.7106,
1289
+ "mean_token_accuracy": 0.6742441862821579,
1290
+ "num_tokens": 3081726.0,
1291
+ "step": 1370
1292
+ },
1293
+ {
1294
+ "epoch": 2.4515555555555557,
1295
+ "grad_norm": 1.5872586965560913,
1296
+ "learning_rate": 3.871374527112232e-05,
1297
+ "loss": 1.6848,
1298
+ "mean_token_accuracy": 0.6764188826084137,
1299
+ "num_tokens": 3104292.0,
1300
+ "step": 1380
1301
+ },
1302
+ {
1303
+ "epoch": 2.469333333333333,
1304
+ "grad_norm": 1.551299810409546,
1305
+ "learning_rate": 3.745271122320303e-05,
1306
+ "loss": 1.6909,
1307
+ "mean_token_accuracy": 0.6762390181422233,
1308
+ "num_tokens": 3126334.0,
1309
+ "step": 1390
1310
+ },
1311
+ {
1312
+ "epoch": 2.487111111111111,
1313
+ "grad_norm": 1.57632315158844,
1314
+ "learning_rate": 3.6191677175283736e-05,
1315
+ "loss": 1.7113,
1316
+ "step": 1400
1317
+ },
1318
+ {
1319
+ "epoch": 2.487111111111111,
1320
+ "eval_loss": 1.922593355178833,
1321
+ "eval_mean_token_accuracy": 0.6445384075641633,
1322
+ "eval_num_tokens": 3149845.0,
1323
+ "eval_runtime": 30.0209,
1324
+ "eval_samples_per_second": 33.31,
1325
+ "eval_steps_per_second": 8.328,
1326
+ "step": 1400
1327
+ }
1328
+ ],
1329
+ "logging_steps": 10,
1330
+ "max_steps": 1686,
1331
+ "num_input_tokens_seen": 0,
1332
+ "num_train_epochs": 3,
1333
+ "save_steps": 200,
1334
+ "stateful_callbacks": {
1335
+ "TrainerControl": {
1336
+ "args": {
1337
+ "should_epoch_stop": false,
1338
+ "should_evaluate": false,
1339
+ "should_log": false,
1340
+ "should_save": true,
1341
+ "should_training_stop": false
1342
+ },
1343
+ "attributes": {}
1344
+ }
1345
+ },
1346
+ "total_flos": 1.1090406995533824e+16,
1347
+ "train_batch_size": 4,
1348
+ "trial_name": null,
1349
+ "trial_params": null
1350
+ }
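Editor's note: the evaluation records above (steps 1000, 1200, 1400) show `eval_loss` still falling at the last saved checkpoint. They can be pulled out of a `trainer_state.json` with stdlib Python only; the sketch below hard-codes the three eval entries shown above instead of reading the file, since the file path and availability are assumptions here.

```python
import json

# The three evaluation records from checkpoint-1400/trainer_state.json,
# copied verbatim from the diff above (non-eval fields omitted).
log_history = [
    {"epoch": 1.7768888888888887, "eval_loss": 1.9385051727294922, "step": 1000},
    {"epoch": 2.1315555555555554, "eval_loss": 1.934017539024353, "step": 1200},
    {"epoch": 2.487111111111111, "eval_loss": 1.922593355178833, "step": 1400},
]

# With the real file one would instead do:
#   state = json.load(open("checkpoint-1400/trainer_state.json"))
#   log_history = state["log_history"]
evals = [(e["step"], e["eval_loss"]) for e in log_history if "eval_loss" in e]

# Eval loss decreases monotonically through step 1400, i.e. training
# had not yet plateaued at this checkpoint.
losses = [loss for _, loss in evals]
assert losses == sorted(losses, reverse=True)
print(evals)
```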
checkpoint-1400/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7bbd2538562d29d0ea8a0dc81d11411522bce0862261591b886509bfea955316
+ size 5624
checkpoint-1400/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1600/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.15.2
checkpoint-1600/adapter_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.1,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "o_proj",
+ "up_proj",
+ "k_proj",
+ "v_proj",
+ "gate_proj",
+ "down_proj",
+ "q_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_rslora": false
+ }
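Editor's note: the `r` / `lora_alpha` pair in this adapter_config.json fixes the adapter's effective scale. A minimal stdlib-only sketch of that relationship, using field values excerpted from the config above (the `alpha / r` rule is PEFT's standard LoRA scaling when `use_rslora` is false):

```python
import json

# Fields excerpted from the adapter_config.json shown above.
adapter_config = json.loads("""
{
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "peft_type": "LORA",
  "r": 16,
  "use_rslora": false
}
""")

# With use_rslora == false, PEFT scales the low-rank update by alpha / r,
# so the adapted weight is effectively W + (32 / 16) * B @ A = W + 2.0 * B @ A.
scaling = adapter_config["lora_alpha"] / adapter_config["r"]
assert scaling == 2.0
```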
checkpoint-1600/added_tokens.json ADDED
@@ -0,0 +1,35 @@
+ {
+ "<EMAIL>": 110521,
+ "<KEY>": 110522,
+ "<NAME>": 110520,
+ "<PASSWORD>": 110523,
+ "<code_to_intermediate>": 110502,
+ "<empty_output>": 110501,
+ "<file_sep>": 110492,
+ "<intermediate_to_code>": 110503,
+ "<issue_closed>": 110495,
+ "<issue_comment>": 110494,
+ "<issue_start>": 110493,
+ "<jupyter_code>": 110498,
+ "<jupyter_output>": 110499,
+ "<jupyter_script>": 110500,
+ "<jupyter_start>": 110496,
+ "<jupyter_text>": 110497,
+ "<pr>": 110504,
+ "<pr_base>": 110507,
+ "<pr_base_code>": 110509,
+ "<pr_comment>": 110512,
+ "<pr_diff>": 110510,
+ "<pr_diff_hunk>": 110511,
+ "<pr_diff_hunk_comment_line>": 110519,
+ "<pr_event_id>": 110513,
+ "<pr_file>": 110508,
+ "<pr_in_reply_to_comment_id>": 110518,
+ "<pr_in_reply_to_review_id>": 110517,
+ "<pr_is_merged>": 110506,
+ "<pr_review>": 110514,
+ "<pr_review_comment>": 110516,
+ "<pr_review_state>": 110515,
+ "<pr_status>": 110505,
+ "<repo_name>": 110491
+ }
checkpoint-1600/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1600/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:33bf2d47ce72df45c56a298cbc23459a22e3167c3fe6bd52eb83041ec4acab41
+ size 14244
checkpoint-1600/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:815cde735b4c622362408ffa0f2241becccb5c14e3d2eda0785d3633474a48d9
+ size 1064
checkpoint-1600/special_tokens_map.json ADDED
@@ -0,0 +1,86 @@
+ {
+ "additional_special_tokens": [
+ "<|endoftext|>",
+ "<|fim_prefix|>",
+ "<|fim_middle|>",
+ "<|fim_suffix|>",
+ "<|endofprompt|>",
+ "<|_unuse_missing_100256|>",
+ "<|_unuse_missing_100261|>",
+ "<|_unuse_missing_100262|>",
+ "<|_unuse_missing_100263|>",
+ "<|_unuse_missing_100264|>",
+ "<|_unuse_missing_100265|>",
+ "<|_unuse_missing_100266|>",
+ "<|_unuse_missing_100267|>",
+ "<|_unuse_missing_100268|>",
+ "<|_unuse_missing_100269|>",
+ "<|_unuse_missing_100270|>",
+ "<|_unuse_missing_100271|>",
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|stop|>",
+ "<|endofturn|>",
+ "<repo_name>",
+ "<file_sep>",
+ "<issue_start>",
+ "<issue_comment>",
+ "<issue_closed>",
+ "<jupyter_start>",
+ "<jupyter_text>",
+ "<jupyter_code>",
+ "<jupyter_output>",
+ "<jupyter_script>",
+ "<empty_output>",
+ "<code_to_intermediate>",
+ "<intermediate_to_code>",
+ "<pr>",
+ "<pr_status>",
+ "<pr_is_merged>",
+ "<pr_base>",
+ "<pr_file>",
+ "<pr_base_code>",
+ "<pr_diff>",
+ "<pr_diff_hunk>",
+ "<pr_comment>",
+ "<pr_event_id>",
+ "<pr_review>",
+ "<pr_review_state>",
+ "<pr_review_comment>",
+ "<pr_in_reply_to_review_id>",
+ "<pr_in_reply_to_comment_id>",
+ "<pr_diff_hunk_comment_line>",
+ "<NAME>",
+ "<EMAIL>",
+ "<KEY>",
+ "<PASSWORD>"
+ ],
+ "bos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|endofturn|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-1600/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1600/tokenizer_config.json ADDED
@@ -0,0 +1,502 @@
+ {
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "100256": {
+ "content": "<|_unuse_missing_100256|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100257": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100258": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100259": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100260": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100261": {
+ "content": "<|_unuse_missing_100261|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100262": {
+ "content": "<|_unuse_missing_100262|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100263": {
+ "content": "<|_unuse_missing_100263|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100264": {
+ "content": "<|_unuse_missing_100264|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100265": {
+ "content": "<|_unuse_missing_100265|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100266": {
+ "content": "<|_unuse_missing_100266|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100267": {
+ "content": "<|_unuse_missing_100267|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100268": {
+ "content": "<|_unuse_missing_100268|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100269": {
+ "content": "<|_unuse_missing_100269|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100270": {
+ "content": "<|_unuse_missing_100270|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100271": {
+ "content": "<|_unuse_missing_100271|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100272": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100273": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100274": {
+ "content": "<|stop|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100275": {
+ "content": "<|endofturn|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100276": {
+ "content": "<|endofprompt|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110491": {
+ "content": "<repo_name>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110492": {
+ "content": "<file_sep>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110493": {
+ "content": "<issue_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110494": {
+ "content": "<issue_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110495": {
+ "content": "<issue_closed>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110496": {
+ "content": "<jupyter_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110497": {
+ "content": "<jupyter_text>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110498": {
+ "content": "<jupyter_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110499": {
+ "content": "<jupyter_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110500": {
+ "content": "<jupyter_script>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110501": {
+ "content": "<empty_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110502": {
+ "content": "<code_to_intermediate>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110503": {
+ "content": "<intermediate_to_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110504": {
+ "content": "<pr>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110505": {
+ "content": "<pr_status>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110506": {
+ "content": "<pr_is_merged>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110507": {
+ "content": "<pr_base>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
307
+ },
308
+ "110508": {
309
+ "content": "<pr_file>",
310
+ "lstrip": false,
311
+ "normalized": false,
312
+ "rstrip": false,
313
+ "single_word": false,
314
+ "special": true
315
+ },
316
+ "110509": {
317
+ "content": "<pr_base_code>",
318
+ "lstrip": false,
319
+ "normalized": false,
320
+ "rstrip": false,
321
+ "single_word": false,
322
+ "special": true
323
+ },
324
+ "110510": {
325
+ "content": "<pr_diff>",
326
+ "lstrip": false,
327
+ "normalized": false,
328
+ "rstrip": false,
329
+ "single_word": false,
330
+ "special": true
331
+ },
332
+ "110511": {
333
+ "content": "<pr_diff_hunk>",
334
+ "lstrip": false,
335
+ "normalized": false,
336
+ "rstrip": false,
337
+ "single_word": false,
338
+ "special": true
339
+ },
340
+ "110512": {
341
+ "content": "<pr_comment>",
342
+ "lstrip": false,
343
+ "normalized": false,
344
+ "rstrip": false,
345
+ "single_word": false,
346
+ "special": true
347
+ },
348
+ "110513": {
349
+ "content": "<pr_event_id>",
350
+ "lstrip": false,
351
+ "normalized": false,
352
+ "rstrip": false,
353
+ "single_word": false,
354
+ "special": true
355
+ },
356
+ "110514": {
357
+ "content": "<pr_review>",
358
+ "lstrip": false,
359
+ "normalized": false,
360
+ "rstrip": false,
361
+ "single_word": false,
362
+ "special": true
363
+ },
364
+ "110515": {
365
+ "content": "<pr_review_state>",
366
+ "lstrip": false,
367
+ "normalized": false,
368
+ "rstrip": false,
369
+ "single_word": false,
370
+ "special": true
371
+ },
372
+ "110516": {
373
+ "content": "<pr_review_comment>",
374
+ "lstrip": false,
375
+ "normalized": false,
376
+ "rstrip": false,
377
+ "single_word": false,
378
+ "special": true
379
+ },
380
+ "110517": {
381
+ "content": "<pr_in_reply_to_review_id>",
382
+ "lstrip": false,
383
+ "normalized": false,
384
+ "rstrip": false,
385
+ "single_word": false,
386
+ "special": true
387
+ },
388
+ "110518": {
389
+ "content": "<pr_in_reply_to_comment_id>",
390
+ "lstrip": false,
391
+ "normalized": false,
392
+ "rstrip": false,
393
+ "single_word": false,
394
+ "special": true
395
+ },
396
+ "110519": {
397
+ "content": "<pr_diff_hunk_comment_line>",
398
+ "lstrip": false,
399
+ "normalized": false,
400
+ "rstrip": false,
401
+ "single_word": false,
402
+ "special": true
403
+ },
404
+ "110520": {
405
+ "content": "<NAME>",
406
+ "lstrip": false,
407
+ "normalized": false,
408
+ "rstrip": false,
409
+ "single_word": false,
410
+ "special": true
411
+ },
412
+ "110521": {
413
+ "content": "<EMAIL>",
414
+ "lstrip": false,
415
+ "normalized": false,
416
+ "rstrip": false,
417
+ "single_word": false,
418
+ "special": true
419
+ },
420
+ "110522": {
421
+ "content": "<KEY>",
422
+ "lstrip": false,
423
+ "normalized": false,
424
+ "rstrip": false,
425
+ "single_word": false,
426
+ "special": true
427
+ },
428
+ "110523": {
429
+ "content": "<PASSWORD>",
430
+ "lstrip": false,
431
+ "normalized": false,
432
+ "rstrip": false,
433
+ "single_word": false,
434
+ "special": true
435
+ }
436
+ },
437
+ "additional_special_tokens": [
438
+ "<|endoftext|>",
439
+ "<|fim_prefix|>",
440
+ "<|fim_middle|>",
441
+ "<|fim_suffix|>",
442
+ "<|endofprompt|>",
443
+ "<|_unuse_missing_100256|>",
444
+ "<|_unuse_missing_100261|>",
445
+ "<|_unuse_missing_100262|>",
446
+ "<|_unuse_missing_100263|>",
447
+ "<|_unuse_missing_100264|>",
448
+ "<|_unuse_missing_100265|>",
449
+ "<|_unuse_missing_100266|>",
450
+ "<|_unuse_missing_100267|>",
451
+ "<|_unuse_missing_100268|>",
452
+ "<|_unuse_missing_100269|>",
453
+ "<|_unuse_missing_100270|>",
454
+ "<|_unuse_missing_100271|>",
455
+ "<|im_start|>",
456
+ "<|im_end|>",
457
+ "<|stop|>",
458
+ "<|endofturn|>",
459
+ "<repo_name>",
460
+ "<file_sep>",
461
+ "<issue_start>",
462
+ "<issue_comment>",
463
+ "<issue_closed>",
464
+ "<jupyter_start>",
465
+ "<jupyter_text>",
466
+ "<jupyter_code>",
467
+ "<jupyter_output>",
468
+ "<jupyter_script>",
469
+ "<empty_output>",
470
+ "<code_to_intermediate>",
471
+ "<intermediate_to_code>",
472
+ "<pr>",
473
+ "<pr_status>",
474
+ "<pr_is_merged>",
475
+ "<pr_base>",
476
+ "<pr_file>",
477
+ "<pr_base_code>",
478
+ "<pr_diff>",
479
+ "<pr_diff_hunk>",
480
+ "<pr_comment>",
481
+ "<pr_event_id>",
482
+ "<pr_review>",
483
+ "<pr_review_state>",
484
+ "<pr_review_comment>",
485
+ "<pr_in_reply_to_review_id>",
486
+ "<pr_in_reply_to_comment_id>",
487
+ "<pr_diff_hunk_comment_line>",
488
+ "<NAME>",
489
+ "<EMAIL>",
490
+ "<KEY>",
491
+ "<PASSWORD>"
492
+ ],
493
+ "bos_token": "<|endoftext|>",
494
+ "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
495
+ "clean_up_tokenization_spaces": true,
496
+ "eos_token": "<|endofturn|>",
497
+ "extra_special_tokens": {},
498
+ "model_max_length": 1000000000000000019884624838656,
499
+ "pad_token": "<|endoftext|>",
500
+ "tokenizer_class": "GPT2Tokenizer",
501
+ "unk_token": "<|endoftext|>"
502
+ }
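The chat_template above wraps each message in `<|im_start|>role … <|im_end|>` markers and optionally appends an assistant header. A minimal Python sketch of what that Jinja template renders (hand-translated here for illustration; the real tokenizer does this via `tokenizer.apply_chat_template`):

```python
# Hand-written equivalent of the chat_template in tokenizer_config.json.
# Assumption: this mirrors the Jinja logic one-to-one; it is not the
# tokenizer's own implementation.
def apply_chat_template(messages, add_generation_prompt=False):
    text = ""
    for message in messages:
        # Each turn: <|im_start|>role\ncontent<|im_end|>\n
        text += "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>" + "\n"
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        text += "<|im_start|>assistant\n"
    return text

print(apply_chat_template([{"role": "user", "content": "hello"}], add_generation_prompt=True))
# → <|im_start|>user\nhello<|im_end|>\n<|im_start|>assistant\n
```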
checkpoint-1600/trainer_state.json ADDED
@@ -0,0 +1,1538 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "best_global_step": 1600,
3
+ "best_metric": 1.9110984802246094,
4
+ "best_model_checkpoint": "./hyperclova-deobfuscation-lora/checkpoint-1600",
5
+ "epoch": 2.8426666666666667,
6
+ "eval_steps": 200,
7
+ "global_step": 1600,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.017777777777777778,
14
+ "grad_norm": 3.3687641620635986,
15
+ "learning_rate": 1.8e-05,
16
+ "loss": 4.1361,
17
+ "mean_token_accuracy": 0.3493226237595081,
18
+ "num_tokens": 22106.0,
19
+ "step": 10
20
+ },
21
+ {
22
+ "epoch": 0.035555555555555556,
23
+ "grad_norm": 2.5920090675354004,
24
+ "learning_rate": 3.8e-05,
25
+ "loss": 3.7165,
26
+ "mean_token_accuracy": 0.4088538818061352,
27
+ "num_tokens": 44943.0,
28
+ "step": 20
29
+ },
30
+ {
31
+ "epoch": 0.05333333333333334,
32
+ "grad_norm": 2.5703377723693848,
33
+ "learning_rate": 5.8e-05,
34
+ "loss": 3.3356,
35
+ "mean_token_accuracy": 0.4755532510578632,
36
+ "num_tokens": 67397.0,
37
+ "step": 30
38
+ },
39
+ {
40
+ "epoch": 0.07111111111111111,
41
+ "grad_norm": 1.698912262916565,
42
+ "learning_rate": 7.800000000000001e-05,
43
+ "loss": 2.9874,
44
+ "mean_token_accuracy": 0.508383595943451,
45
+ "num_tokens": 89803.0,
46
+ "step": 40
47
+ },
48
+ {
49
+ "epoch": 0.08888888888888889,
50
+ "grad_norm": 1.4602556228637695,
51
+ "learning_rate": 9.8e-05,
52
+ "loss": 2.7854,
53
+ "mean_token_accuracy": 0.5358646497130394,
54
+ "num_tokens": 112364.0,
55
+ "step": 50
56
+ },
57
+ {
58
+ "epoch": 0.10666666666666667,
59
+ "grad_norm": 1.5916705131530762,
60
+ "learning_rate": 0.000118,
61
+ "loss": 2.6546,
62
+ "mean_token_accuracy": 0.5485944993793964,
63
+ "num_tokens": 134028.0,
64
+ "step": 60
65
+ },
66
+ {
67
+ "epoch": 0.12444444444444444,
68
+ "grad_norm": 1.6815338134765625,
69
+ "learning_rate": 0.000138,
70
+ "loss": 2.606,
71
+ "mean_token_accuracy": 0.5535938143730164,
72
+ "num_tokens": 156703.0,
73
+ "step": 70
74
+ },
75
+ {
76
+ "epoch": 0.14222222222222222,
77
+ "grad_norm": 1.8009140491485596,
78
+ "learning_rate": 0.00015800000000000002,
79
+ "loss": 2.5307,
80
+ "mean_token_accuracy": 0.5640750013291835,
81
+ "num_tokens": 178986.0,
82
+ "step": 80
83
+ },
84
+ {
85
+ "epoch": 0.16,
86
+ "grad_norm": 1.4582855701446533,
87
+ "learning_rate": 0.00017800000000000002,
88
+ "loss": 2.5633,
89
+ "mean_token_accuracy": 0.5567230455577373,
90
+ "num_tokens": 201989.0,
91
+ "step": 90
92
+ },
93
+ {
94
+ "epoch": 0.17777777777777778,
95
+ "grad_norm": 1.663874626159668,
96
+ "learning_rate": 0.00019800000000000002,
97
+ "loss": 2.4672,
98
+ "mean_token_accuracy": 0.5688358306884765,
99
+ "num_tokens": 223936.0,
100
+ "step": 100
101
+ },
102
+ {
103
+ "epoch": 0.19555555555555557,
104
+ "grad_norm": 1.6701704263687134,
105
+ "learning_rate": 0.00019886506935687262,
106
+ "loss": 2.4388,
107
+ "mean_token_accuracy": 0.5760447531938553,
108
+ "num_tokens": 246101.0,
109
+ "step": 110
110
+ },
111
+ {
112
+ "epoch": 0.21333333333333335,
113
+ "grad_norm": 1.5731302499771118,
114
+ "learning_rate": 0.00019760403530895334,
115
+ "loss": 2.4377,
116
+ "mean_token_accuracy": 0.5711787067353725,
117
+ "num_tokens": 269187.0,
118
+ "step": 120
119
+ },
120
+ {
121
+ "epoch": 0.2311111111111111,
122
+ "grad_norm": 1.4479353427886963,
123
+ "learning_rate": 0.00019634300126103406,
124
+ "loss": 2.3596,
125
+ "mean_token_accuracy": 0.5830569051206111,
126
+ "num_tokens": 291454.0,
127
+ "step": 130
128
+ },
129
+ {
130
+ "epoch": 0.24888888888888888,
131
+ "grad_norm": 1.3653457164764404,
132
+ "learning_rate": 0.00019508196721311475,
133
+ "loss": 2.3648,
134
+ "mean_token_accuracy": 0.5807973451912403,
135
+ "num_tokens": 314204.0,
136
+ "step": 140
137
+ },
138
+ {
139
+ "epoch": 0.26666666666666666,
140
+ "grad_norm": 1.4210327863693237,
141
+ "learning_rate": 0.00019382093316519546,
142
+ "loss": 2.3186,
143
+ "mean_token_accuracy": 0.5878118917346,
144
+ "num_tokens": 337167.0,
145
+ "step": 150
146
+ },
147
+ {
148
+ "epoch": 0.28444444444444444,
149
+ "grad_norm": 1.532408356666565,
150
+ "learning_rate": 0.00019255989911727615,
151
+ "loss": 2.3637,
152
+ "mean_token_accuracy": 0.5761628717184066,
153
+ "num_tokens": 360272.0,
154
+ "step": 160
155
+ },
156
+ {
157
+ "epoch": 0.3022222222222222,
158
+ "grad_norm": 1.4010679721832275,
159
+ "learning_rate": 0.00019129886506935687,
160
+ "loss": 2.2701,
161
+ "mean_token_accuracy": 0.598077318072319,
162
+ "num_tokens": 382779.0,
163
+ "step": 170
164
+ },
165
+ {
166
+ "epoch": 0.32,
167
+ "grad_norm": 1.5830323696136475,
168
+ "learning_rate": 0.0001900378310214376,
169
+ "loss": 2.2861,
170
+ "mean_token_accuracy": 0.5928302705287933,
171
+ "num_tokens": 405438.0,
172
+ "step": 180
173
+ },
174
+ {
175
+ "epoch": 0.3377777777777778,
176
+ "grad_norm": 1.4623483419418335,
177
+ "learning_rate": 0.00018877679697351828,
178
+ "loss": 2.3192,
179
+ "mean_token_accuracy": 0.5854370579123497,
180
+ "num_tokens": 428660.0,
181
+ "step": 190
182
+ },
183
+ {
184
+ "epoch": 0.35555555555555557,
185
+ "grad_norm": 1.4850527048110962,
186
+ "learning_rate": 0.000187515762925599,
187
+ "loss": 2.256,
188
+ "step": 200
189
+ },
190
+ {
191
+ "epoch": 0.35555555555555557,
192
+ "eval_loss": 2.254753351211548,
193
+ "eval_mean_token_accuracy": 0.5965457199811935,
194
+ "eval_num_tokens": 450808.0,
195
+ "eval_runtime": 30.9386,
196
+ "eval_samples_per_second": 32.322,
197
+ "eval_steps_per_second": 8.081,
198
+ "step": 200
199
+ },
200
+ {
201
+ "epoch": 0.37333333333333335,
202
+ "grad_norm": 1.4195237159729004,
203
+ "learning_rate": 0.00018625472887767968,
204
+ "loss": 2.2607,
205
+ "mean_token_accuracy": 0.594056948274374,
206
+ "num_tokens": 473434.0,
207
+ "step": 210
208
+ },
209
+ {
210
+ "epoch": 0.39111111111111113,
211
+ "grad_norm": 1.3114796876907349,
212
+ "learning_rate": 0.0001849936948297604,
213
+ "loss": 2.2947,
214
+ "mean_token_accuracy": 0.5898103177547455,
215
+ "num_tokens": 496482.0,
216
+ "step": 220
217
+ },
218
+ {
219
+ "epoch": 0.4088888888888889,
220
+ "grad_norm": 1.4004285335540771,
221
+ "learning_rate": 0.00018373266078184112,
222
+ "loss": 2.2542,
223
+ "mean_token_accuracy": 0.5970372915267944,
224
+ "num_tokens": 519379.0,
225
+ "step": 230
226
+ },
227
+ {
228
+ "epoch": 0.4266666666666667,
229
+ "grad_norm": 1.3860116004943848,
230
+ "learning_rate": 0.0001824716267339218,
231
+ "loss": 2.2636,
232
+ "mean_token_accuracy": 0.59425338357687,
233
+ "num_tokens": 542631.0,
234
+ "step": 240
235
+ },
236
+ {
237
+ "epoch": 0.4444444444444444,
238
+ "grad_norm": 1.3675146102905273,
239
+ "learning_rate": 0.00018121059268600253,
240
+ "loss": 2.2412,
241
+ "mean_token_accuracy": 0.5928545072674751,
242
+ "num_tokens": 565400.0,
243
+ "step": 250
244
+ },
245
+ {
246
+ "epoch": 0.4622222222222222,
247
+ "grad_norm": 1.4246889352798462,
248
+ "learning_rate": 0.00017994955863808322,
249
+ "loss": 2.1577,
250
+ "mean_token_accuracy": 0.6061514511704444,
251
+ "num_tokens": 588003.0,
252
+ "step": 260
253
+ },
254
+ {
255
+ "epoch": 0.48,
256
+ "grad_norm": 1.4046531915664673,
257
+ "learning_rate": 0.00017868852459016393,
258
+ "loss": 2.1862,
259
+ "mean_token_accuracy": 0.6008762732148171,
260
+ "num_tokens": 610974.0,
261
+ "step": 270
262
+ },
263
+ {
264
+ "epoch": 0.49777777777777776,
265
+ "grad_norm": 1.4038338661193848,
266
+ "learning_rate": 0.00017742749054224465,
267
+ "loss": 2.2219,
268
+ "mean_token_accuracy": 0.5970636487007142,
269
+ "num_tokens": 634093.0,
270
+ "step": 280
271
+ },
272
+ {
273
+ "epoch": 0.5155555555555555,
274
+ "grad_norm": 1.3291988372802734,
275
+ "learning_rate": 0.00017616645649432534,
276
+ "loss": 2.131,
277
+ "mean_token_accuracy": 0.6172704175114632,
278
+ "num_tokens": 656188.0,
279
+ "step": 290
280
+ },
281
+ {
282
+ "epoch": 0.5333333333333333,
283
+ "grad_norm": 1.444318413734436,
284
+ "learning_rate": 0.00017490542244640606,
285
+ "loss": 2.1691,
286
+ "mean_token_accuracy": 0.6066021353006363,
287
+ "num_tokens": 678769.0,
288
+ "step": 300
289
+ },
290
+ {
291
+ "epoch": 0.5511111111111111,
292
+ "grad_norm": 1.3459752798080444,
293
+ "learning_rate": 0.00017364438839848675,
294
+ "loss": 2.1413,
295
+ "mean_token_accuracy": 0.6139265760779381,
296
+ "num_tokens": 701734.0,
297
+ "step": 310
298
+ },
299
+ {
300
+ "epoch": 0.5688888888888889,
301
+ "grad_norm": 1.3597490787506104,
302
+ "learning_rate": 0.00017238335435056746,
303
+ "loss": 2.1271,
304
+ "mean_token_accuracy": 0.6106095433235168,
305
+ "num_tokens": 724815.0,
306
+ "step": 320
307
+ },
308
+ {
309
+ "epoch": 0.5866666666666667,
310
+ "grad_norm": 1.4757016897201538,
311
+ "learning_rate": 0.00017112232030264818,
312
+ "loss": 2.133,
313
+ "mean_token_accuracy": 0.6147415205836296,
314
+ "num_tokens": 746903.0,
315
+ "step": 330
316
+ },
317
+ {
318
+ "epoch": 0.6044444444444445,
319
+ "grad_norm": 1.4856476783752441,
320
+ "learning_rate": 0.00016986128625472887,
321
+ "loss": 2.1201,
322
+ "mean_token_accuracy": 0.6161383926868439,
323
+ "num_tokens": 768982.0,
324
+ "step": 340
325
+ },
326
+ {
327
+ "epoch": 0.6222222222222222,
328
+ "grad_norm": 1.2596303224563599,
329
+ "learning_rate": 0.0001686002522068096,
330
+ "loss": 2.1392,
331
+ "mean_token_accuracy": 0.6150005847215653,
332
+ "num_tokens": 791061.0,
333
+ "step": 350
334
+ },
335
+ {
336
+ "epoch": 0.64,
337
+ "grad_norm": 1.3324636220932007,
338
+ "learning_rate": 0.00016733921815889028,
339
+ "loss": 2.1201,
340
+ "mean_token_accuracy": 0.6171063780784607,
341
+ "num_tokens": 813112.0,
342
+ "step": 360
343
+ },
344
+ {
345
+ "epoch": 0.6577777777777778,
346
+ "grad_norm": 1.419053316116333,
347
+ "learning_rate": 0.000166078184110971,
348
+ "loss": 2.1237,
349
+ "mean_token_accuracy": 0.6111394688487053,
350
+ "num_tokens": 835469.0,
351
+ "step": 370
352
+ },
353
+ {
354
+ "epoch": 0.6755555555555556,
355
+ "grad_norm": 1.4507274627685547,
356
+ "learning_rate": 0.0001648171500630517,
357
+ "loss": 2.1387,
358
+ "mean_token_accuracy": 0.604290933907032,
359
+ "num_tokens": 857795.0,
360
+ "step": 380
361
+ },
362
+ {
363
+ "epoch": 0.6933333333333334,
364
+ "grad_norm": 1.284505844116211,
365
+ "learning_rate": 0.0001635561160151324,
366
+ "loss": 2.1,
367
+ "mean_token_accuracy": 0.6181465938687325,
368
+ "num_tokens": 879659.0,
369
+ "step": 390
370
+ },
371
+ {
372
+ "epoch": 0.7111111111111111,
373
+ "grad_norm": 1.5179046392440796,
374
+ "learning_rate": 0.00016229508196721312,
375
+ "loss": 2.0813,
376
+ "step": 400
377
+ },
378
+ {
379
+ "epoch": 0.7111111111111111,
380
+ "eval_loss": 2.0953471660614014,
381
+ "eval_mean_token_accuracy": 0.618859866142273,
382
+ "eval_num_tokens": 902240.0,
383
+ "eval_runtime": 30.5153,
384
+ "eval_samples_per_second": 32.77,
385
+ "eval_steps_per_second": 8.193,
386
+ "step": 400
387
+ },
388
+ {
389
+ "epoch": 0.7288888888888889,
390
+ "grad_norm": 1.3377336263656616,
391
+ "learning_rate": 0.0001610340479192938,
392
+ "loss": 2.1049,
393
+ "mean_token_accuracy": 0.6189975582063199,
394
+ "num_tokens": 925091.0,
395
+ "step": 410
396
+ },
397
+ {
398
+ "epoch": 0.7466666666666667,
399
+ "grad_norm": 1.406614065170288,
400
+ "learning_rate": 0.00015977301387137452,
401
+ "loss": 2.1343,
402
+ "mean_token_accuracy": 0.6101128354668617,
403
+ "num_tokens": 948151.0,
404
+ "step": 420
405
+ },
406
+ {
407
+ "epoch": 0.7644444444444445,
408
+ "grad_norm": 1.3494964838027954,
409
+ "learning_rate": 0.00015851197982345524,
410
+ "loss": 2.0506,
411
+ "mean_token_accuracy": 0.6257941454648972,
412
+ "num_tokens": 970339.0,
413
+ "step": 430
414
+ },
415
+ {
416
+ "epoch": 0.7822222222222223,
417
+ "grad_norm": 1.3070355653762817,
418
+ "learning_rate": 0.00015725094577553593,
419
+ "loss": 2.0955,
420
+ "mean_token_accuracy": 0.6162661850452423,
421
+ "num_tokens": 993552.0,
422
+ "step": 440
423
+ },
424
+ {
425
+ "epoch": 0.8,
426
+ "grad_norm": 1.3954617977142334,
427
+ "learning_rate": 0.00015598991172761665,
428
+ "loss": 2.1119,
429
+ "mean_token_accuracy": 0.6154530435800553,
430
+ "num_tokens": 1015564.0,
431
+ "step": 450
432
+ },
433
+ {
434
+ "epoch": 0.8177777777777778,
435
+ "grad_norm": 1.4015129804611206,
436
+ "learning_rate": 0.00015472887767969734,
437
+ "loss": 2.0153,
438
+ "mean_token_accuracy": 0.6296211943030358,
439
+ "num_tokens": 1037721.0,
440
+ "step": 460
441
+ },
442
+ {
443
+ "epoch": 0.8355555555555556,
444
+ "grad_norm": 1.41290283203125,
445
+ "learning_rate": 0.00015346784363177806,
446
+ "loss": 2.0914,
447
+ "mean_token_accuracy": 0.6156619966030121,
448
+ "num_tokens": 1060627.0,
449
+ "step": 470
450
+ },
451
+ {
452
+ "epoch": 0.8533333333333334,
453
+ "grad_norm": 1.3715571165084839,
454
+ "learning_rate": 0.00015220680958385877,
455
+ "loss": 2.0674,
456
+ "mean_token_accuracy": 0.6202241629362106,
457
+ "num_tokens": 1082672.0,
458
+ "step": 480
459
+ },
460
+ {
461
+ "epoch": 0.8711111111111111,
462
+ "grad_norm": 1.3797943592071533,
463
+ "learning_rate": 0.00015094577553593946,
464
+ "loss": 2.0677,
465
+ "mean_token_accuracy": 0.6200241416692733,
466
+ "num_tokens": 1104857.0,
467
+ "step": 490
468
+ },
469
+ {
470
+ "epoch": 0.8888888888888888,
471
+ "grad_norm": 1.3080323934555054,
472
+ "learning_rate": 0.00014968474148802018,
473
+ "loss": 2.068,
474
+ "mean_token_accuracy": 0.618759186565876,
475
+ "num_tokens": 1127612.0,
476
+ "step": 500
477
+ },
478
+ {
479
+ "epoch": 0.9066666666666666,
480
+ "grad_norm": 1.4698944091796875,
481
+ "learning_rate": 0.0001484237074401009,
482
+ "loss": 2.0736,
483
+ "mean_token_accuracy": 0.6208444744348526,
484
+ "num_tokens": 1150411.0,
485
+ "step": 510
486
+ },
487
+ {
488
+ "epoch": 0.9244444444444444,
489
+ "grad_norm": 1.3741239309310913,
490
+ "learning_rate": 0.0001471626733921816,
491
+ "loss": 2.0887,
492
+ "mean_token_accuracy": 0.6161769673228263,
493
+ "num_tokens": 1172683.0,
494
+ "step": 520
495
+ },
496
+ {
497
+ "epoch": 0.9422222222222222,
498
+ "grad_norm": 1.3237783908843994,
499
+ "learning_rate": 0.0001459016393442623,
500
+ "loss": 1.9793,
501
+ "mean_token_accuracy": 0.6360917523503303,
502
+ "num_tokens": 1194160.0,
503
+ "step": 530
504
+ },
505
+ {
506
+ "epoch": 0.96,
507
+ "grad_norm": 1.3243825435638428,
508
+ "learning_rate": 0.000144640605296343,
509
+ "loss": 2.0095,
510
+ "mean_token_accuracy": 0.6338530048727989,
511
+ "num_tokens": 1215760.0,
512
+ "step": 540
513
+ },
514
+ {
515
+ "epoch": 0.9777777777777777,
516
+ "grad_norm": 1.3875395059585571,
517
+ "learning_rate": 0.0001433795712484237,
518
+ "loss": 2.0715,
519
+ "mean_token_accuracy": 0.6245882242918015,
520
+ "num_tokens": 1238191.0,
521
+ "step": 550
522
+ },
523
+ {
524
+ "epoch": 0.9955555555555555,
525
+ "grad_norm": 1.390081524848938,
526
+ "learning_rate": 0.00014211853720050443,
527
+ "loss": 2.0421,
528
+ "mean_token_accuracy": 0.6229756608605385,
529
+ "num_tokens": 1260429.0,
530
+ "step": 560
531
+ },
532
+ {
533
+ "epoch": 1.0124444444444445,
534
+ "grad_norm": 1.2626862525939941,
535
+ "learning_rate": 0.00014085750315258512,
536
+ "loss": 1.9614,
537
+ "mean_token_accuracy": 0.6359066555374547,
538
+ "num_tokens": 1281232.0,
539
+ "step": 570
540
+ },
541
+ {
542
+ "epoch": 1.0302222222222222,
543
+ "grad_norm": 1.3941477537155151,
544
+ "learning_rate": 0.00013959646910466583,
545
+ "loss": 1.8782,
546
+ "mean_token_accuracy": 0.6482988312840462,
547
+ "num_tokens": 1304130.0,
548
+ "step": 580
549
+ },
550
+ {
551
+ "epoch": 1.048,
552
+ "grad_norm": 1.4020227193832397,
553
+ "learning_rate": 0.00013833543505674652,
554
+ "loss": 1.8602,
555
+ "mean_token_accuracy": 0.65641980022192,
556
+ "num_tokens": 1326753.0,
557
+ "step": 590
558
+ },
559
+ {
560
+ "epoch": 1.0657777777777777,
561
+ "grad_norm": 1.285709023475647,
562
+ "learning_rate": 0.00013707440100882724,
563
+ "loss": 1.8661,
564
+ "step": 600
565
+ },
566
+ {
567
+ "epoch": 1.0657777777777777,
568
+ "eval_loss": 2.018383264541626,
569
+ "eval_mean_token_accuracy": 0.6295498251914978,
570
+ "eval_num_tokens": 1348985.0,
571
+ "eval_runtime": 30.5245,
572
+ "eval_samples_per_second": 32.761,
573
+ "eval_steps_per_second": 8.19,
574
+ "step": 600
575
+ },
576
+ {
577
+ "epoch": 1.0835555555555556,
578
+ "grad_norm": 1.2745097875595093,
579
+ "learning_rate": 0.00013581336696090796,
580
+ "loss": 1.8705,
581
+ "mean_token_accuracy": 0.650456714630127,
582
+ "num_tokens": 1371318.0,
583
+ "step": 610
584
+ },
585
+ {
586
+ "epoch": 1.1013333333333333,
587
+ "grad_norm": 1.3518744707107544,
588
+ "learning_rate": 0.00013455233291298865,
589
+ "loss": 1.9056,
590
+ "mean_token_accuracy": 0.6455502569675445,
591
+ "num_tokens": 1393816.0,
592
+ "step": 620
593
+ },
594
+ {
595
+ "epoch": 1.1191111111111112,
596
+ "grad_norm": 1.4413272142410278,
597
+ "learning_rate": 0.00013329129886506937,
598
+ "loss": 1.8994,
599
+ "mean_token_accuracy": 0.6459770023822784,
600
+ "num_tokens": 1416529.0,
601
+ "step": 630
602
+ },
603
+ {
604
+ "epoch": 1.1368888888888888,
605
+ "grad_norm": 1.3811439275741577,
606
+ "learning_rate": 0.00013203026481715006,
607
+ "loss": 1.9138,
608
+ "mean_token_accuracy": 0.6459063500165939,
609
+ "num_tokens": 1438970.0,
610
+ "step": 640
611
+ },
612
+ {
613
+ "epoch": 1.1546666666666667,
614
+ "grad_norm": 1.3642174005508423,
615
+ "learning_rate": 0.00013076923076923077,
616
+ "loss": 1.8892,
617
+ "mean_token_accuracy": 0.6444340243935585,
618
+ "num_tokens": 1461324.0,
619
+ "step": 650
620
+ },
621
+ {
622
+ "epoch": 1.1724444444444444,
623
+ "grad_norm": 1.4544634819030762,
624
+ "learning_rate": 0.0001295081967213115,
625
+ "loss": 1.9248,
626
+ "mean_token_accuracy": 0.6391839399933815,
627
+ "num_tokens": 1484246.0,
628
+ "step": 660
629
+ },
630
+ {
631
+ "epoch": 1.1902222222222223,
632
+ "grad_norm": 1.3715091943740845,
633
+ "learning_rate": 0.00012824716267339218,
634
+ "loss": 1.9105,
635
+ "mean_token_accuracy": 0.6393024668097496,
636
+ "num_tokens": 1507305.0,
637
+ "step": 670
638
+ },
639
+ {
640
+ "epoch": 1.208,
641
+ "grad_norm": 1.3897929191589355,
642
+ "learning_rate": 0.0001269861286254729,
643
+ "loss": 1.8714,
644
+ "mean_token_accuracy": 0.6521286174654961,
645
+ "num_tokens": 1529082.0,
646
+ "step": 680
647
+ },
648
+ {
649
+ "epoch": 1.2257777777777779,
650
+ "grad_norm": 1.3576809167861938,
651
+ "learning_rate": 0.00012572509457755359,
652
+ "loss": 1.8677,
653
+ "mean_token_accuracy": 0.6498221024870873,
654
+ "num_tokens": 1551159.0,
655
+ "step": 690
656
+ },
657
+ {
658
+ "epoch": 1.2435555555555555,
659
+ "grad_norm": 1.3156862258911133,
660
+ "learning_rate": 0.0001244640605296343,
661
+ "loss": 1.8996,
662
+ "mean_token_accuracy": 0.6485181763768196,
663
+ "num_tokens": 1573348.0,
664
+ "step": 700
665
+ },
666
+ {
667
+ "epoch": 1.2613333333333334,
668
+ "grad_norm": 1.4738845825195312,
669
+ "learning_rate": 0.00012320302648171502,
670
+ "loss": 1.8953,
671
+ "mean_token_accuracy": 0.6465991452336312,
672
+ "num_tokens": 1595989.0,
673
+ "step": 710
674
+ },
675
+ {
676
+ "epoch": 1.279111111111111,
677
+ "grad_norm": 1.5254158973693848,
678
+ "learning_rate": 0.00012194199243379571,
679
+ "loss": 1.9236,
680
+ "mean_token_accuracy": 0.6474427729845047,
681
+ "num_tokens": 1617895.0,
682
+ "step": 720
683
+ },
684
+ {
685
+ "epoch": 1.2968888888888888,
686
+ "grad_norm": 1.4867346286773682,
687
+ "learning_rate": 0.00012068095838587643,
688
+ "loss": 1.8766,
689
+ "mean_token_accuracy": 0.6491386488080024,
690
+ "num_tokens": 1640415.0,
691
+ "step": 730
692
+ },
693
+ {
694
+ "epoch": 1.3146666666666667,
695
+ "grad_norm": 1.3776379823684692,
696
+ "learning_rate": 0.00011941992433795712,
697
+ "loss": 1.8644,
698
+ "mean_token_accuracy": 0.6499749347567558,
699
+ "num_tokens": 1662713.0,
700
+ "step": 740
701
+ },
702
+ {
703
+ "epoch": 1.3324444444444445,
704
+ "grad_norm": 1.420027256011963,
705
+ "learning_rate": 0.00011815889029003783,
706
+ "loss": 1.8874,
707
+ "mean_token_accuracy": 0.648992708325386,
708
+ "num_tokens": 1684783.0,
709
+ "step": 750
710
+ },
711
+ {
712
+ "epoch": 1.3502222222222222,
713
+ "grad_norm": 1.356441855430603,
714
+ "learning_rate": 0.00011689785624211855,
715
+ "loss": 1.8937,
716
+ "mean_token_accuracy": 0.6503370434045792,
717
+ "num_tokens": 1706623.0,
718
+ "step": 760
719
+ },
720
+ {
721
+ "epoch": 1.3679999999999999,
722
+ "grad_norm": 1.4901665449142456,
723
+ "learning_rate": 0.00011563682219419924,
724
+ "loss": 1.9094,
725
+ "mean_token_accuracy": 0.6417872324585915,
726
+ "num_tokens": 1729494.0,
727
+ "step": 770
728
+ },
729
+ {
730
+ "epoch": 1.3857777777777778,
731
+ "grad_norm": 1.3679572343826294,
732
+ "learning_rate": 0.00011437578814627996,
733
+ "loss": 1.8841,
734
+ "mean_token_accuracy": 0.6478032737970352,
735
+ "num_tokens": 1752045.0,
736
+ "step": 780
737
+ },
738
+ {
739
+ "epoch": 1.4035555555555557,
740
+ "grad_norm": 1.3518086671829224,
741
+ "learning_rate": 0.00011311475409836065,
742
+ "loss": 1.9021,
743
+ "mean_token_accuracy": 0.6460829824209213,
744
+ "num_tokens": 1775601.0,
745
+ "step": 790
746
+ },
747
+ {
748
+ "epoch": 1.4213333333333333,
749
+ "grad_norm": 1.400870442390442,
750
+ "learning_rate": 0.00011185372005044137,
751
+ "loss": 1.8566,
752
+ "step": 800
753
+ },
754
+ {
755
+ "epoch": 1.4213333333333333,
756
+ "eval_loss": 1.9762645959854126,
757
+ "eval_mean_token_accuracy": 0.6358254022598266,
758
+ "eval_num_tokens": 1798633.0,
759
+ "eval_runtime": 30.7115,
760
+ "eval_samples_per_second": 32.561,
761
+ "eval_steps_per_second": 8.14,
762
+ "step": 800
763
+ },
764
+ {
765
+ "epoch": 1.439111111111111,
766
+ "grad_norm": 1.4487619400024414,
767
+ "learning_rate": 0.00011059268600252208,
768
+ "loss": 1.8417,
769
+ "mean_token_accuracy": 0.6496566243469715,
770
+ "num_tokens": 1820656.0,
771
+ "step": 810
772
+ },
773
+ {
774
+ "epoch": 1.456888888888889,
775
+ "grad_norm": 1.4507944583892822,
776
+ "learning_rate": 0.00010933165195460277,
777
+ "loss": 1.8829,
778
+ "mean_token_accuracy": 0.647446171939373,
779
+ "num_tokens": 1842871.0,
780
+ "step": 820
781
+ },
782
+ {
783
+ "epoch": 1.4746666666666668,
784
+ "grad_norm": 1.3563170433044434,
785
+ "learning_rate": 0.00010807061790668349,
786
+ "loss": 1.8508,
787
+ "mean_token_accuracy": 0.6544376760721207,
788
+ "num_tokens": 1865652.0,
789
+ "step": 830
790
+ },
791
+ {
792
+ "epoch": 1.4924444444444445,
793
+ "grad_norm": 1.366861343383789,
794
+ "learning_rate": 0.00010680958385876418,
795
+ "loss": 1.8756,
796
+ "mean_token_accuracy": 0.648781743645668,
797
+ "num_tokens": 1888455.0,
798
+ "step": 840
799
+ },
800
+ {
801
+ "epoch": 1.5102222222222221,
802
+ "grad_norm": 1.5031019449234009,
803
+ "learning_rate": 0.0001055485498108449,
804
+ "loss": 1.8461,
805
+ "mean_token_accuracy": 0.6583632439374923,
806
+ "num_tokens": 1910676.0,
807
+ "step": 850
808
+ },
809
+ {
810
+ "epoch": 1.528,
811
+ "grad_norm": 1.5248113870620728,
812
+ "learning_rate": 0.00010428751576292561,
813
+ "loss": 1.8857,
814
+ "mean_token_accuracy": 0.6470584884285927,
815
+ "num_tokens": 1933253.0,
816
+ "step": 860
817
+ },
818
+ {
819
+ "epoch": 1.545777777777778,
820
+ "grad_norm": 1.4354236125946045,
821
+ "learning_rate": 0.0001030264817150063,
822
+ "loss": 1.892,
823
+ "mean_token_accuracy": 0.6445729210972786,
824
+ "num_tokens": 1955820.0,
825
+ "step": 870
826
+ },
827
+ {
828
+ "epoch": 1.5635555555555556,
829
+ "grad_norm": 1.4288746118545532,
830
+ "learning_rate": 0.00010176544766708702,
831
+ "loss": 1.878,
832
+ "mean_token_accuracy": 0.6476826578378677,
833
+ "num_tokens": 1978120.0,
834
+ "step": 880
835
+ },
836
+ {
837
+ "epoch": 1.5813333333333333,
838
+ "grad_norm": 1.433902382850647,
839
+ "learning_rate": 0.00010050441361916771,
840
+ "loss": 1.8199,
841
+ "mean_token_accuracy": 0.6561270505189896,
842
+ "num_tokens": 2000508.0,
843
+ "step": 890
844
+ },
845
+ {
846
+ "epoch": 1.5991111111111111,
847
+ "grad_norm": 1.332987904548645,
848
+ "learning_rate": 9.924337957124843e-05,
849
+ "loss": 1.8555,
850
+ "mean_token_accuracy": 0.6499203637242317,
851
+ "num_tokens": 2023356.0,
852
+ "step": 900
853
+ },
854
+ {
855
+ "epoch": 1.616888888888889,
856
+ "grad_norm": 1.3830794095993042,
857
+ "learning_rate": 9.798234552332913e-05,
858
+ "loss": 1.8108,
859
+ "mean_token_accuracy": 0.6618377715349197,
860
+ "num_tokens": 2045965.0,
861
+ "step": 910
862
+ },
863
+ {
864
+ "epoch": 1.6346666666666667,
865
+ "grad_norm": 1.3988080024719238,
866
+ "learning_rate": 9.672131147540983e-05,
867
+ "loss": 1.8791,
868
+ "mean_token_accuracy": 0.6456288158893585,
869
+ "num_tokens": 2069108.0,
870
+ "step": 920
871
+ },
872
+ {
873
+ "epoch": 1.6524444444444444,
874
+ "grad_norm": 1.398549199104309,
875
+ "learning_rate": 9.546027742749055e-05,
876
+ "loss": 1.8885,
877
+ "mean_token_accuracy": 0.6464410901069642,
878
+ "num_tokens": 2091755.0,
879
+ "step": 930
880
+ },
881
+ {
882
+ "epoch": 1.6702222222222223,
883
+ "grad_norm": 1.5381189584732056,
884
+ "learning_rate": 9.419924337957125e-05,
885
+ "loss": 1.853,
886
+ "mean_token_accuracy": 0.6534279838204384,
887
+ "num_tokens": 2114496.0,
888
+ "step": 940
889
+ },
890
+ {
891
+ "epoch": 1.688,
892
+ "grad_norm": 1.4101791381835938,
893
+ "learning_rate": 9.293820933165196e-05,
894
+ "loss": 1.8696,
895
+ "mean_token_accuracy": 0.6475102975964546,
896
+ "num_tokens": 2137041.0,
897
+ "step": 950
898
+ },
899
+ {
900
+ "epoch": 1.7057777777777776,
901
+ "grad_norm": 1.496955156326294,
902
+ "learning_rate": 9.167717528373266e-05,
903
+ "loss": 1.8752,
904
+ "mean_token_accuracy": 0.6469297721982002,
905
+ "num_tokens": 2159711.0,
906
+ "step": 960
907
+ },
908
+ {
909
+ "epoch": 1.7235555555555555,
910
+ "grad_norm": 1.4269644021987915,
911
+ "learning_rate": 9.041614123581336e-05,
912
+ "loss": 1.8773,
913
+ "mean_token_accuracy": 0.6508476585149765,
914
+ "num_tokens": 2181675.0,
915
+ "step": 970
916
+ },
917
+ {
918
+ "epoch": 1.7413333333333334,
919
+ "grad_norm": 1.4438400268554688,
920
+ "learning_rate": 8.915510718789408e-05,
921
+ "loss": 1.8433,
922
+ "mean_token_accuracy": 0.656335887312889,
923
+ "num_tokens": 2204671.0,
924
+ "step": 980
925
+ },
926
+ {
927
+ "epoch": 1.759111111111111,
928
+ "grad_norm": 1.3846147060394287,
929
+ "learning_rate": 8.789407313997479e-05,
930
+ "loss": 1.8649,
931
+ "mean_token_accuracy": 0.6451441869139671,
932
+ "num_tokens": 2227866.0,
933
+ "step": 990
934
+ },
935
+ {
936
+ "epoch": 1.7768888888888887,
937
+ "grad_norm": 1.5432794094085693,
938
+ "learning_rate": 8.663303909205549e-05,
939
+ "loss": 1.8435,
940
+ "step": 1000
941
+ },
942
+ {
943
+ "epoch": 1.7768888888888887,
944
+ "eval_loss": 1.9385051727294922,
945
+ "eval_mean_token_accuracy": 0.6399562013149261,
946
+ "eval_num_tokens": 2250379.0,
947
+ "eval_runtime": 30.7428,
948
+ "eval_samples_per_second": 32.528,
949
+ "eval_steps_per_second": 8.132,
950
+ "step": 1000
951
+ },
952
+ {
953
+ "epoch": 1.7946666666666666,
954
+ "grad_norm": 1.4225345849990845,
955
+ "learning_rate": 8.537200504413619e-05,
956
+ "loss": 1.8767,
957
+ "mean_token_accuracy": 0.6512193940579891,
958
+ "num_tokens": 2272999.0,
959
+ "step": 1010
960
+ },
961
+ {
962
+ "epoch": 1.8124444444444445,
963
+ "grad_norm": 1.3732675313949585,
964
+ "learning_rate": 8.41109709962169e-05,
965
+ "loss": 1.845,
966
+ "mean_token_accuracy": 0.6550421059131623,
967
+ "num_tokens": 2295110.0,
968
+ "step": 1020
969
+ },
970
+ {
971
+ "epoch": 1.8302222222222222,
972
+ "grad_norm": 1.3867266178131104,
973
+ "learning_rate": 8.284993694829761e-05,
974
+ "loss": 1.8236,
975
+ "mean_token_accuracy": 0.6565809994935989,
976
+ "num_tokens": 2317432.0,
977
+ "step": 1030
978
+ },
979
+ {
980
+ "epoch": 1.8479999999999999,
981
+ "grad_norm": 1.3360997438430786,
982
+ "learning_rate": 8.158890290037832e-05,
983
+ "loss": 1.8642,
984
+ "mean_token_accuracy": 0.6463637053966522,
985
+ "num_tokens": 2340305.0,
986
+ "step": 1040
987
+ },
988
+ {
989
+ "epoch": 1.8657777777777778,
990
+ "grad_norm": 1.4467201232910156,
991
+ "learning_rate": 8.032786885245902e-05,
992
+ "loss": 1.8661,
993
+ "mean_token_accuracy": 0.653101560473442,
994
+ "num_tokens": 2362746.0,
995
+ "step": 1050
996
+ },
997
+ {
998
+ "epoch": 1.8835555555555556,
999
+ "grad_norm": 1.3943202495574951,
1000
+ "learning_rate": 7.906683480453972e-05,
1001
+ "loss": 1.8399,
1002
+ "mean_token_accuracy": 0.6510771587491035,
1003
+ "num_tokens": 2385647.0,
1004
+ "step": 1060
1005
+ },
1006
+ {
1007
+ "epoch": 1.9013333333333333,
1008
+ "grad_norm": 1.4589335918426514,
1009
+ "learning_rate": 7.780580075662043e-05,
1010
+ "loss": 1.8522,
1011
+ "mean_token_accuracy": 0.6478627189993859,
1012
+ "num_tokens": 2408732.0,
1013
+ "step": 1070
1014
+ },
1015
+ {
1016
+ "epoch": 1.919111111111111,
1017
+ "grad_norm": 1.5676307678222656,
1018
+ "learning_rate": 7.654476670870114e-05,
1019
+ "loss": 1.8316,
1020
+ "mean_token_accuracy": 0.656819324195385,
1021
+ "num_tokens": 2431137.0,
1022
+ "step": 1080
1023
+ },
1024
+ {
1025
+ "epoch": 1.9368888888888889,
1026
+ "grad_norm": 1.3882263898849487,
1027
+ "learning_rate": 7.528373266078185e-05,
1028
+ "loss": 1.7965,
1029
+ "mean_token_accuracy": 0.6616187065839767,
1030
+ "num_tokens": 2453517.0,
1031
+ "step": 1090
1032
+ },
1033
+ {
1034
+ "epoch": 1.9546666666666668,
1035
+ "grad_norm": 1.5195387601852417,
1036
+ "learning_rate": 7.402269861286255e-05,
1037
+ "loss": 1.8346,
1038
+ "mean_token_accuracy": 0.6537548035383225,
1039
+ "num_tokens": 2475737.0,
1040
+ "step": 1100
1041
+ },
1042
+ {
1043
+ "epoch": 1.9724444444444444,
1044
+ "grad_norm": 1.3485065698623657,
1045
+ "learning_rate": 7.276166456494325e-05,
1046
+ "loss": 1.835,
1047
+ "mean_token_accuracy": 0.6512499779462815,
1048
+ "num_tokens": 2497880.0,
1049
+ "step": 1110
1050
+ },
1051
+ {
1052
+ "epoch": 1.9902222222222221,
1053
+ "grad_norm": 1.5932726860046387,
1054
+ "learning_rate": 7.150063051702396e-05,
1055
+ "loss": 1.8432,
1056
+ "mean_token_accuracy": 0.6523947417736053,
1057
+ "num_tokens": 2519974.0,
1058
+ "step": 1120
1059
+ },
1060
+ {
1061
+ "epoch": 2.007111111111111,
1062
+ "grad_norm": 1.3682020902633667,
1063
+ "learning_rate": 7.023959646910467e-05,
1064
+ "loss": 1.7383,
1065
+ "mean_token_accuracy": 0.6761166123967421,
1066
+ "num_tokens": 2540296.0,
1067
+ "step": 1130
1068
+ },
1069
+ {
1070
+ "epoch": 2.024888888888889,
1071
+ "grad_norm": 1.4417686462402344,
1072
+ "learning_rate": 6.897856242118538e-05,
1073
+ "loss": 1.697,
1074
+ "mean_token_accuracy": 0.6766574695706368,
1075
+ "num_tokens": 2561919.0,
1076
+ "step": 1140
1077
+ },
1078
+ {
1079
+ "epoch": 2.042666666666667,
1080
+ "grad_norm": 1.375542163848877,
1081
+ "learning_rate": 6.771752837326608e-05,
1082
+ "loss": 1.7361,
1083
+ "mean_token_accuracy": 0.6702861517667771,
1084
+ "num_tokens": 2584915.0,
1085
+ "step": 1150
1086
+ },
1087
+ {
1088
+ "epoch": 2.0604444444444443,
1089
+ "grad_norm": 1.4783133268356323,
1090
+ "learning_rate": 6.645649432534678e-05,
1091
+ "loss": 1.6927,
1092
+ "mean_token_accuracy": 0.6790769457817077,
1093
+ "num_tokens": 2606857.0,
1094
+ "step": 1160
1095
+ },
1096
+ {
1097
+ "epoch": 2.078222222222222,
1098
+ "grad_norm": 1.5346624851226807,
1099
+ "learning_rate": 6.519546027742749e-05,
1100
+ "loss": 1.6938,
1101
+ "mean_token_accuracy": 0.6730351656675339,
1102
+ "num_tokens": 2629737.0,
1103
+ "step": 1170
1104
+ },
1105
+ {
1106
+ "epoch": 2.096,
1107
+ "grad_norm": 1.430298089981079,
1108
+ "learning_rate": 6.39344262295082e-05,
1109
+ "loss": 1.6476,
1110
+ "mean_token_accuracy": 0.6835471093654633,
1111
+ "num_tokens": 2651936.0,
1112
+ "step": 1180
1113
+ },
1114
+ {
1115
+ "epoch": 2.113777777777778,
1116
+ "grad_norm": 1.4968252182006836,
1117
+ "learning_rate": 6.267339218158891e-05,
1118
+ "loss": 1.7242,
1119
+ "mean_token_accuracy": 0.6697919353842735,
1120
+ "num_tokens": 2675241.0,
1121
+ "step": 1190
1122
+ },
1123
+ {
1124
+ "epoch": 2.1315555555555554,
1125
+ "grad_norm": 1.3892192840576172,
1126
+ "learning_rate": 6.141235813366961e-05,
1127
+ "loss": 1.6916,
1128
+ "step": 1200
1129
+ },
1130
+ {
1131
+ "epoch": 2.1315555555555554,
1132
+ "eval_loss": 1.934017539024353,
1133
+ "eval_mean_token_accuracy": 0.6421641361713409,
1134
+ "eval_num_tokens": 2698219.0,
1135
+ "eval_runtime": 30.3979,
1136
+ "eval_samples_per_second": 32.897,
1137
+ "eval_steps_per_second": 8.224,
1138
+ "step": 1200
1139
+ },
1140
+ {
1141
+ "epoch": 2.1493333333333333,
1142
+ "grad_norm": 1.4893920421600342,
1143
+ "learning_rate": 6.0151324085750316e-05,
1144
+ "loss": 1.7047,
1145
+ "mean_token_accuracy": 0.6755686655640603,
1146
+ "num_tokens": 2721580.0,
1147
+ "step": 1210
1148
+ },
1149
+ {
1150
+ "epoch": 2.167111111111111,
1151
+ "grad_norm": 1.50564444065094,
1152
+ "learning_rate": 5.889029003783102e-05,
1153
+ "loss": 1.7058,
1154
+ "mean_token_accuracy": 0.6746152400970459,
1155
+ "num_tokens": 2744110.0,
1156
+ "step": 1220
1157
+ },
1158
+ {
1159
+ "epoch": 2.1848888888888887,
1160
+ "grad_norm": 1.461367130279541,
1161
+ "learning_rate": 5.7629255989911736e-05,
1162
+ "loss": 1.684,
1163
+ "mean_token_accuracy": 0.6813082948327065,
1164
+ "num_tokens": 2765844.0,
1165
+ "step": 1230
1166
+ },
1167
+ {
1168
+ "epoch": 2.2026666666666666,
1169
+ "grad_norm": 1.553553819656372,
1170
+ "learning_rate": 5.636822194199244e-05,
1171
+ "loss": 1.6848,
1172
+ "mean_token_accuracy": 0.677204079926014,
1173
+ "num_tokens": 2788070.0,
1174
+ "step": 1240
1175
+ },
1176
+ {
1177
+ "epoch": 2.2204444444444444,
1178
+ "grad_norm": 1.4453001022338867,
1179
+ "learning_rate": 5.510718789407314e-05,
1180
+ "loss": 1.7182,
1181
+ "mean_token_accuracy": 0.6765570789575577,
1182
+ "num_tokens": 2810964.0,
1183
+ "step": 1250
1184
+ },
1185
+ {
1186
+ "epoch": 2.2382222222222223,
1187
+ "grad_norm": 1.5605733394622803,
1188
+ "learning_rate": 5.384615384615385e-05,
1189
+ "loss": 1.6772,
1190
+ "mean_token_accuracy": 0.678339496254921,
1191
+ "num_tokens": 2833176.0,
1192
+ "step": 1260
1193
+ },
1194
+ {
1195
+ "epoch": 2.2560000000000002,
1196
+ "grad_norm": 1.514710783958435,
1197
+ "learning_rate": 5.258511979823455e-05,
1198
+ "loss": 1.7192,
1199
+ "mean_token_accuracy": 0.6710417225956917,
1200
+ "num_tokens": 2855440.0,
1201
+ "step": 1270
1202
+ },
1203
+ {
1204
+ "epoch": 2.2737777777777777,
1205
+ "grad_norm": 1.510834813117981,
1206
+ "learning_rate": 5.132408575031527e-05,
1207
+ "loss": 1.6599,
1208
+ "mean_token_accuracy": 0.6822926640510559,
1209
+ "num_tokens": 2877713.0,
1210
+ "step": 1280
1211
+ },
1212
+ {
1213
+ "epoch": 2.2915555555555556,
1214
+ "grad_norm": 1.3550519943237305,
1215
+ "learning_rate": 5.006305170239597e-05,
1216
+ "loss": 1.7072,
1217
+ "mean_token_accuracy": 0.6754195600748062,
1218
+ "num_tokens": 2899934.0,
1219
+ "step": 1290
1220
+ },
1221
+ {
1222
+ "epoch": 2.3093333333333335,
1223
+ "grad_norm": 1.5602107048034668,
1224
+ "learning_rate": 4.8802017654476674e-05,
1225
+ "loss": 1.7111,
1226
+ "mean_token_accuracy": 0.6702851369976998,
1227
+ "num_tokens": 2923244.0,
1228
+ "step": 1300
1229
+ },
1230
+ {
1231
+ "epoch": 2.327111111111111,
1232
+ "grad_norm": 1.5889501571655273,
1233
+ "learning_rate": 4.754098360655738e-05,
1234
+ "loss": 1.6858,
1235
+ "mean_token_accuracy": 0.6781487062573432,
1236
+ "num_tokens": 2945580.0,
1237
+ "step": 1310
1238
+ },
1239
+ {
1240
+ "epoch": 2.344888888888889,
1241
+ "grad_norm": 1.4793872833251953,
1242
+ "learning_rate": 4.627994955863809e-05,
1243
+ "loss": 1.6799,
1244
+ "mean_token_accuracy": 0.6740961462259293,
1245
+ "num_tokens": 2969470.0,
1246
+ "step": 1320
1247
+ },
1248
+ {
1249
+ "epoch": 2.3626666666666667,
1250
+ "grad_norm": 1.6188234090805054,
1251
+ "learning_rate": 4.501891551071879e-05,
1252
+ "loss": 1.6838,
1253
+ "mean_token_accuracy": 0.6732241719961166,
1254
+ "num_tokens": 2991982.0,
1255
+ "step": 1330
1256
+ },
1257
+ {
1258
+ "epoch": 2.3804444444444446,
1259
+ "grad_norm": 1.474108338356018,
1260
+ "learning_rate": 4.37578814627995e-05,
1261
+ "loss": 1.7024,
1262
+ "mean_token_accuracy": 0.675683145225048,
1263
+ "num_tokens": 3014206.0,
1264
+ "step": 1340
1265
+ },
1266
+ {
1267
+ "epoch": 2.398222222222222,
1268
+ "grad_norm": 1.4645053148269653,
1269
+ "learning_rate": 4.2496847414880205e-05,
1270
+ "loss": 1.6564,
1271
+ "mean_token_accuracy": 0.6787498995661736,
1272
+ "num_tokens": 3036651.0,
1273
+ "step": 1350
1274
+ },
1275
+ {
1276
+ "epoch": 2.416,
1277
+ "grad_norm": 1.498451828956604,
1278
+ "learning_rate": 4.1235813366960915e-05,
1279
+ "loss": 1.694,
1280
+ "mean_token_accuracy": 0.6752896070480346,
1281
+ "num_tokens": 3058745.0,
1282
+ "step": 1360
1283
+ },
1284
+ {
1285
+ "epoch": 2.433777777777778,
1286
+ "grad_norm": 1.5558826923370361,
1287
+ "learning_rate": 3.997477931904162e-05,
1288
+ "loss": 1.7106,
1289
+ "mean_token_accuracy": 0.6742441862821579,
1290
+ "num_tokens": 3081726.0,
1291
+ "step": 1370
1292
+ },
1293
+ {
1294
+ "epoch": 2.4515555555555557,
1295
+ "grad_norm": 1.5872586965560913,
1296
+ "learning_rate": 3.871374527112232e-05,
1297
+ "loss": 1.6848,
1298
+ "mean_token_accuracy": 0.6764188826084137,
1299
+ "num_tokens": 3104292.0,
1300
+ "step": 1380
1301
+ },
1302
+ {
1303
+ "epoch": 2.469333333333333,
1304
+ "grad_norm": 1.551299810409546,
1305
+ "learning_rate": 3.745271122320303e-05,
1306
+ "loss": 1.6909,
1307
+ "mean_token_accuracy": 0.6762390181422233,
1308
+ "num_tokens": 3126334.0,
1309
+ "step": 1390
1310
+ },
1311
+ {
1312
+ "epoch": 2.487111111111111,
1313
+ "grad_norm": 1.57632315158844,
1314
+ "learning_rate": 3.6191677175283736e-05,
1315
+ "loss": 1.7113,
1316
+ "step": 1400
1317
+ },
1318
+ {
1319
+ "epoch": 2.487111111111111,
1320
+ "eval_loss": 1.922593355178833,
1321
+ "eval_mean_token_accuracy": 0.6445384075641633,
1322
+ "eval_num_tokens": 3149845.0,
1323
+ "eval_runtime": 30.0209,
1324
+ "eval_samples_per_second": 33.31,
1325
+ "eval_steps_per_second": 8.328,
1326
+ "step": 1400
1327
+ },
1328
+ {
1329
+ "epoch": 2.504888888888889,
1330
+ "grad_norm": 1.487930178642273,
1331
+ "learning_rate": 3.4930643127364446e-05,
1332
+ "loss": 1.7087,
1333
+ "mean_token_accuracy": 0.6737867616117,
1334
+ "num_tokens": 3172117.0,
1335
+ "step": 1410
1336
+ },
1337
+ {
1338
+ "epoch": 2.522666666666667,
1339
+ "grad_norm": 1.5210868120193481,
1340
+ "learning_rate": 3.366960907944515e-05,
1341
+ "loss": 1.7009,
1342
+ "mean_token_accuracy": 0.6799842938780785,
1343
+ "num_tokens": 3194261.0,
1344
+ "step": 1420
1345
+ },
1346
+ {
1347
+ "epoch": 2.5404444444444443,
1348
+ "grad_norm": 1.6295726299285889,
1349
+ "learning_rate": 3.240857503152585e-05,
1350
+ "loss": 1.6027,
1351
+ "mean_token_accuracy": 0.6899775773286819,
1352
+ "num_tokens": 3216455.0,
1353
+ "step": 1430
1354
+ },
1355
+ {
1356
+ "epoch": 2.558222222222222,
1357
+ "grad_norm": 1.561673879623413,
1358
+ "learning_rate": 3.114754098360656e-05,
1359
+ "loss": 1.7273,
1360
+ "mean_token_accuracy": 0.6699303150177002,
1361
+ "num_tokens": 3238359.0,
1362
+ "step": 1440
1363
+ },
1364
+ {
1365
+ "epoch": 2.576,
1366
+ "grad_norm": 1.5006392002105713,
1367
+ "learning_rate": 2.9886506935687263e-05,
1368
+ "loss": 1.7243,
1369
+ "mean_token_accuracy": 0.6692202746868133,
1370
+ "num_tokens": 3261568.0,
1371
+ "step": 1450
1372
+ },
1373
+ {
1374
+ "epoch": 2.5937777777777775,
1375
+ "grad_norm": 1.602378249168396,
1376
+ "learning_rate": 2.8625472887767974e-05,
1377
+ "loss": 1.7255,
1378
+ "mean_token_accuracy": 0.6683675542473793,
1379
+ "num_tokens": 3284517.0,
1380
+ "step": 1460
1381
+ },
1382
+ {
1383
+ "epoch": 2.6115555555555554,
1384
+ "grad_norm": 1.6410186290740967,
1385
+ "learning_rate": 2.7364438839848677e-05,
1386
+ "loss": 1.6826,
1387
+ "mean_token_accuracy": 0.6792753636837006,
1388
+ "num_tokens": 3306623.0,
1389
+ "step": 1470
1390
+ },
1391
+ {
1392
+ "epoch": 2.6293333333333333,
1393
+ "grad_norm": 1.4993571043014526,
1394
+ "learning_rate": 2.610340479192938e-05,
1395
+ "loss": 1.6629,
1396
+ "mean_token_accuracy": 0.6814461290836334,
1397
+ "num_tokens": 3329270.0,
1398
+ "step": 1480
1399
+ },
1400
+ {
1401
+ "epoch": 2.647111111111111,
1402
+ "grad_norm": 1.4495617151260376,
1403
+ "learning_rate": 2.484237074401009e-05,
1404
+ "loss": 1.6848,
1405
+ "mean_token_accuracy": 0.6769454509019852,
1406
+ "num_tokens": 3352533.0,
1407
+ "step": 1490
1408
+ },
1409
+ {
1410
+ "epoch": 2.664888888888889,
1411
+ "grad_norm": 1.5677741765975952,
1412
+ "learning_rate": 2.3581336696090794e-05,
1413
+ "loss": 1.6679,
1414
+ "mean_token_accuracy": 0.6842619329690933,
1415
+ "num_tokens": 3374132.0,
1416
+ "step": 1500
1417
+ },
1418
+ {
1419
+ "epoch": 2.6826666666666665,
1420
+ "grad_norm": 1.5430514812469482,
1421
+ "learning_rate": 2.23203026481715e-05,
1422
+ "loss": 1.7122,
1423
+ "mean_token_accuracy": 0.6701778277754784,
1424
+ "num_tokens": 3397176.0,
1425
+ "step": 1510
1426
+ },
1427
+ {
1428
+ "epoch": 2.7004444444444444,
1429
+ "grad_norm": 1.5498685836791992,
1430
+ "learning_rate": 2.1059268600252208e-05,
1431
+ "loss": 1.6631,
1432
+ "mean_token_accuracy": 0.6801098987460137,
1433
+ "num_tokens": 3418961.0,
1434
+ "step": 1520
1435
+ },
1436
+ {
1437
+ "epoch": 2.7182222222222223,
1438
+ "grad_norm": 1.5674185752868652,
1439
+ "learning_rate": 1.9798234552332915e-05,
1440
+ "loss": 1.6677,
1441
+ "mean_token_accuracy": 0.679096283018589,
1442
+ "num_tokens": 3441815.0,
1443
+ "step": 1530
1444
+ },
1445
+ {
1446
+ "epoch": 2.7359999999999998,
1447
+ "grad_norm": 1.4433151483535767,
1448
+ "learning_rate": 1.8537200504413622e-05,
1449
+ "loss": 1.6903,
1450
+ "mean_token_accuracy": 0.6747847631573677,
1451
+ "num_tokens": 3464582.0,
1452
+ "step": 1540
1453
+ },
1454
+ {
1455
+ "epoch": 2.7537777777777777,
1456
+ "grad_norm": 1.4974557161331177,
1457
+ "learning_rate": 1.7276166456494325e-05,
1458
+ "loss": 1.6533,
1459
+ "mean_token_accuracy": 0.683028981089592,
1460
+ "num_tokens": 3486694.0,
1461
+ "step": 1550
1462
+ },
1463
+ {
1464
+ "epoch": 2.7715555555555556,
1465
+ "grad_norm": 1.4934104681015015,
1466
+ "learning_rate": 1.6015132408575032e-05,
1467
+ "loss": 1.699,
1468
+ "mean_token_accuracy": 0.6773065477609634,
1469
+ "num_tokens": 3508908.0,
1470
+ "step": 1560
1471
+ },
1472
+ {
1473
+ "epoch": 2.7893333333333334,
1474
+ "grad_norm": 1.5283113718032837,
1475
+ "learning_rate": 1.4754098360655739e-05,
1476
+ "loss": 1.6329,
1477
+ "mean_token_accuracy": 0.6867676630616188,
1478
+ "num_tokens": 3530306.0,
1479
+ "step": 1570
1480
+ },
1481
+ {
1482
+ "epoch": 2.8071111111111113,
1483
+ "grad_norm": 1.6120567321777344,
1484
+ "learning_rate": 1.3493064312736444e-05,
1485
+ "loss": 1.7164,
1486
+ "mean_token_accuracy": 0.6712423786520958,
1487
+ "num_tokens": 3552880.0,
1488
+ "step": 1580
1489
+ },
1490
+ {
1491
+ "epoch": 2.824888888888889,
1492
+ "grad_norm": 1.5219435691833496,
1493
+ "learning_rate": 1.223203026481715e-05,
1494
+ "loss": 1.7018,
1495
+ "mean_token_accuracy": 0.6758274272084236,
1496
+ "num_tokens": 3575800.0,
1497
+ "step": 1590
1498
+ },
1499
+ {
1500
+ "epoch": 2.8426666666666667,
1501
+ "grad_norm": 1.5157614946365356,
1502
+ "learning_rate": 1.0970996216897856e-05,
1503
+ "loss": 1.6983,
1504
+ "step": 1600
1505
+ },
1506
+ {
1507
+ "epoch": 2.8426666666666667,
1508
+ "eval_loss": 1.9110984802246094,
1509
+ "eval_mean_token_accuracy": 0.6461525177955627,
1510
+ "eval_num_tokens": 3598214.0,
1511
+ "eval_runtime": 30.748,
1512
+ "eval_samples_per_second": 32.522,
1513
+ "eval_steps_per_second": 8.131,
1514
+ "step": 1600
1515
+ }
1516
+ ],
1517
+ "logging_steps": 10,
1518
+ "max_steps": 1686,
1519
+ "num_input_tokens_seen": 0,
1520
+ "num_train_epochs": 3,
1521
+ "save_steps": 200,
1522
+ "stateful_callbacks": {
1523
+ "TrainerControl": {
1524
+ "args": {
1525
+ "should_epoch_stop": false,
1526
+ "should_evaluate": false,
1527
+ "should_log": false,
1528
+ "should_save": true,
1529
+ "should_training_stop": false
1530
+ },
1531
+ "attributes": {}
1532
+ }
1533
+ },
1534
+ "total_flos": 1.2669174249603072e+16,
1535
+ "train_batch_size": 4,
1536
+ "trial_name": null,
1537
+ "trial_params": null
1538
+ }
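The `learning_rate` column in the trainer state above falls by a constant amount between logged steps. As a quick sanity check, a sketch below extrapolates that slope from two logged points (values copied verbatim from the log; the exact scheduler class is an assumption, but the numbers are consistent with a linear decay reaching zero at roughly `max_steps` = 1686):

```python
# Two (step, learning_rate) pairs copied from the trainer_state.json log above.
s1, lr1 = 900, 9.924337957124843e-05
s2, lr2 = 1600, 1.0970996216897856e-05

slope = (lr1 - lr2) / (s2 - s1)   # constant decay per optimizer step
zero_step = s2 + lr2 / slope      # step at which the line crosses zero

print(slope, zero_step)           # zero_step lands near 1687 ~= max_steps
```

The crossing point sits one step past `max_steps` (1686), which matches a linear schedule logged after each step rather than before it.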
checkpoint-1600/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7bbd2538562d29d0ea8a0dc81d11411522bce0862261591b886509bfea955316
3
+ size 5624
checkpoint-1600/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1686/README.md ADDED
@@ -0,0 +1,202 @@
1
+ ---
2
+ base_model: naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B
3
+ library_name: peft
4
+ ---
5
+
6
+ # Model Card for Model ID
7
+
8
+ <!-- Provide a quick summary of what the model is/does. -->
9
+
10
+
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ <!-- Provide a longer summary of what this model is. -->
17
+
18
+
19
+
20
+ - **Developed by:** [More Information Needed]
21
+ - **Funded by [optional]:** [More Information Needed]
22
+ - **Shared by [optional]:** [More Information Needed]
23
+ - **Model type:** [More Information Needed]
24
+ - **Language(s) (NLP):** [More Information Needed]
25
+ - **License:** [More Information Needed]
26
+ - **Finetuned from model [optional]:** [More Information Needed]
27
+
28
+ ### Model Sources [optional]
29
+
30
+ <!-- Provide the basic links for the model. -->
31
+
32
+ - **Repository:** [More Information Needed]
33
+ - **Paper [optional]:** [More Information Needed]
34
+ - **Demo [optional]:** [More Information Needed]
35
+
36
+ ## Uses
37
+
38
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
+
40
+ ### Direct Use
41
+
42
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
+
44
+ [More Information Needed]
45
+
46
+ ### Downstream Use [optional]
47
+
48
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
+
50
+ [More Information Needed]
51
+
52
+ ### Out-of-Scope Use
53
+
54
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
+
56
+ [More Information Needed]
57
+
58
+ ## Bias, Risks, and Limitations
59
+
60
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
+
62
+ [More Information Needed]
63
+
64
+ ### Recommendations
65
+
66
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
+
68
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
+
70
+ ## How to Get Started with the Model
71
+
72
+ Use the code below to get started with the model.
73
+
74
+ [More Information Needed]
75
+
76
+ ## Training Details
77
+
78
+ ### Training Data
79
+
80
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
+
82
+ [More Information Needed]
83
+
84
+ ### Training Procedure
85
+
86
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
+
88
+ #### Preprocessing [optional]
89
+
90
+ [More Information Needed]
91
+
92
+
93
+ #### Training Hyperparameters
94
+
95
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
+
97
+ #### Speeds, Sizes, Times [optional]
98
+
99
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
+
101
+ [More Information Needed]
102
+
103
+ ## Evaluation
104
+
105
+ <!-- This section describes the evaluation protocols and provides the results. -->
106
+
107
+ ### Testing Data, Factors & Metrics
108
+
109
+ #### Testing Data
110
+
111
+ <!-- This should link to a Dataset Card if possible. -->
112
+
113
+ [More Information Needed]
114
+
115
+ #### Factors
116
+
117
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
+
119
+ [More Information Needed]
120
+
121
+ #### Metrics
122
+
123
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
+
125
+ [More Information Needed]
126
+
127
+ ### Results
128
+
129
+ [More Information Needed]
130
+
131
+ #### Summary
132
+
133
+
134
+
135
+ ## Model Examination [optional]
136
+
137
+ <!-- Relevant interpretability work for the model goes here -->
138
+
139
+ [More Information Needed]
140
+
141
+ ## Environmental Impact
142
+
143
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
+
145
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
+
147
+ - **Hardware Type:** [More Information Needed]
148
+ - **Hours used:** [More Information Needed]
149
+ - **Cloud Provider:** [More Information Needed]
150
+ - **Compute Region:** [More Information Needed]
151
+ - **Carbon Emitted:** [More Information Needed]
152
+
153
+ ## Technical Specifications [optional]
154
+
155
+ ### Model Architecture and Objective
156
+
157
+ [More Information Needed]
158
+
159
+ ### Compute Infrastructure
160
+
161
+ [More Information Needed]
162
+
163
+ #### Hardware
164
+
165
+ [More Information Needed]
166
+
167
+ #### Software
168
+
169
+ [More Information Needed]
170
+
171
+ ## Citation [optional]
172
+
173
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
+
175
+ **BibTeX:**
176
+
177
+ [More Information Needed]
178
+
179
+ **APA:**
180
+
181
+ [More Information Needed]
182
+
183
+ ## Glossary [optional]
184
+
185
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
+
187
+ [More Information Needed]
188
+
189
+ ## More Information [optional]
190
+
191
+ [More Information Needed]
192
+
193
+ ## Model Card Authors [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Contact
198
+
199
+ [More Information Needed]
200
+ ### Framework versions
201
+
202
+ - PEFT 0.15.2
checkpoint-1686/adapter_config.json ADDED
@@ -0,0 +1,39 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B",
5
+ "bias": "none",
6
+ "corda_config": null,
7
+ "eva_config": null,
8
+ "exclude_modules": null,
9
+ "fan_in_fan_out": false,
10
+ "inference_mode": true,
11
+ "init_lora_weights": true,
12
+ "layer_replication": null,
13
+ "layers_pattern": null,
14
+ "layers_to_transform": null,
15
+ "loftq_config": {},
16
+ "lora_alpha": 32,
17
+ "lora_bias": false,
18
+ "lora_dropout": 0.1,
19
+ "megatron_config": null,
20
+ "megatron_core": "megatron.core",
21
+ "modules_to_save": null,
22
+ "peft_type": "LORA",
23
+ "r": 16,
24
+ "rank_pattern": {},
25
+ "revision": null,
26
+ "target_modules": [
27
+ "o_proj",
28
+ "up_proj",
29
+ "k_proj",
30
+ "v_proj",
31
+ "gate_proj",
32
+ "down_proj",
33
+ "q_proj"
34
+ ],
35
+ "task_type": "CAUSAL_LM",
36
+ "trainable_token_indices": null,
37
+ "use_dora": false,
38
+ "use_rslora": false
39
+ }
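The `adapter_config.json` above fully determines the adapter's size: LoRA wraps each listed projection `W` (shape `d_out x d_in`) with low-rank factors `A` (`r x d_in`) and `B` (`d_out x r`). A minimal sketch of the resulting parameter count, using hypothetical 1024-dimensional layer shapes purely for illustration (the actual HyperCLOVAX-SEED-0.5B layer dimensions are not part of this diff):

```python
# Values copied from adapter_config.json above.
r = 16
target_modules = ["o_proj", "up_proj", "k_proj", "v_proj",
                  "gate_proj", "down_proj", "q_proj"]

def lora_params(d_in, d_out, r):
    """Extra parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * d_in + d_out * r  # A: (r, d_in), B: (d_out, r)

# Hypothetical shapes for illustration only; real dims are not given here.
shapes = {name: (1024, 1024) for name in target_modules}

total = sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes.values())
print(total)  # 229376 extra parameters for one set of these seven layers
```

Multiplying by the number of transformer blocks gives the adapter size, which is why the `adapter_model.safetensors` pointer below is only ~39 MB against a 0.5 B-parameter base model.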
checkpoint-1686/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4598aa20e3a56f47500be2641a60f352aa5ba55a933842c6bdea199c7a8d6d26
3
+ size 39366152
checkpoint-1686/added_tokens.json ADDED
@@ -0,0 +1,35 @@
1
+ {
2
+ "<EMAIL>": 110521,
3
+ "<KEY>": 110522,
4
+ "<NAME>": 110520,
5
+ "<PASSWORD>": 110523,
6
+ "<code_to_intermediate>": 110502,
7
+ "<empty_output>": 110501,
8
+ "<file_sep>": 110492,
9
+ "<intermediate_to_code>": 110503,
10
+ "<issue_closed>": 110495,
11
+ "<issue_comment>": 110494,
12
+ "<issue_start>": 110493,
13
+ "<jupyter_code>": 110498,
14
+ "<jupyter_output>": 110499,
15
+ "<jupyter_script>": 110500,
16
+ "<jupyter_start>": 110496,
17
+ "<jupyter_text>": 110497,
18
+ "<pr>": 110504,
19
+ "<pr_base>": 110507,
20
+ "<pr_base_code>": 110509,
21
+ "<pr_comment>": 110512,
22
+ "<pr_diff>": 110510,
23
+ "<pr_diff_hunk>": 110511,
24
+ "<pr_diff_hunk_comment_line>": 110519,
25
+ "<pr_event_id>": 110513,
26
+ "<pr_file>": 110508,
27
+ "<pr_in_reply_to_comment_id>": 110518,
28
+ "<pr_in_reply_to_review_id>": 110517,
29
+ "<pr_is_merged>": 110506,
30
+ "<pr_review>": 110514,
31
+ "<pr_review_comment>": 110516,
32
+ "<pr_review_state>": 110515,
33
+ "<pr_status>": 110505,
34
+ "<repo_name>": 110491
35
+ }
checkpoint-1686/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1686/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:497e0118b25ae604a37936feb52691a8cc5b0863d8952fd21027899e29be9cd3
3
+ size 14244
checkpoint-1686/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:63dbab23cc82464a9c8c01ae33214484c9584396241f39b145d622357769f092
+ size 988
checkpoint-1686/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:421dd7d3054be47d3867306d484918f216a1080eb3fe3d8d78cca142c00f8cf4
+ size 1064
checkpoint-1686/special_tokens_map.json ADDED
@@ -0,0 +1,86 @@
+ {
+ "additional_special_tokens": [
+ "<|endoftext|>",
+ "<|fim_prefix|>",
+ "<|fim_middle|>",
+ "<|fim_suffix|>",
+ "<|endofprompt|>",
+ "<|_unuse_missing_100256|>",
+ "<|_unuse_missing_100261|>",
+ "<|_unuse_missing_100262|>",
+ "<|_unuse_missing_100263|>",
+ "<|_unuse_missing_100264|>",
+ "<|_unuse_missing_100265|>",
+ "<|_unuse_missing_100266|>",
+ "<|_unuse_missing_100267|>",
+ "<|_unuse_missing_100268|>",
+ "<|_unuse_missing_100269|>",
+ "<|_unuse_missing_100270|>",
+ "<|_unuse_missing_100271|>",
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|stop|>",
+ "<|endofturn|>",
+ "<repo_name>",
+ "<file_sep>",
+ "<issue_start>",
+ "<issue_comment>",
+ "<issue_closed>",
+ "<jupyter_start>",
+ "<jupyter_text>",
+ "<jupyter_code>",
+ "<jupyter_output>",
+ "<jupyter_script>",
+ "<empty_output>",
+ "<code_to_intermediate>",
+ "<intermediate_to_code>",
+ "<pr>",
+ "<pr_status>",
+ "<pr_is_merged>",
+ "<pr_base>",
+ "<pr_file>",
+ "<pr_base_code>",
+ "<pr_diff>",
+ "<pr_diff_hunk>",
+ "<pr_comment>",
+ "<pr_event_id>",
+ "<pr_review>",
+ "<pr_review_state>",
+ "<pr_review_comment>",
+ "<pr_in_reply_to_review_id>",
+ "<pr_in_reply_to_comment_id>",
+ "<pr_diff_hunk_comment_line>",
+ "<NAME>",
+ "<EMAIL>",
+ "<KEY>",
+ "<PASSWORD>"
+ ],
+ "bos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|endofturn|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-1686/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-1686/tokenizer_config.json ADDED
@@ -0,0 +1,502 @@
+ {
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "100256": {
+ "content": "<|_unuse_missing_100256|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100257": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100258": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100259": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100260": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100261": {
+ "content": "<|_unuse_missing_100261|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100262": {
+ "content": "<|_unuse_missing_100262|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100263": {
+ "content": "<|_unuse_missing_100263|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100264": {
+ "content": "<|_unuse_missing_100264|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100265": {
+ "content": "<|_unuse_missing_100265|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100266": {
+ "content": "<|_unuse_missing_100266|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100267": {
+ "content": "<|_unuse_missing_100267|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100268": {
+ "content": "<|_unuse_missing_100268|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100269": {
+ "content": "<|_unuse_missing_100269|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100270": {
+ "content": "<|_unuse_missing_100270|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100271": {
+ "content": "<|_unuse_missing_100271|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100272": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100273": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100274": {
+ "content": "<|stop|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100275": {
+ "content": "<|endofturn|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100276": {
+ "content": "<|endofprompt|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110491": {
+ "content": "<repo_name>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110492": {
+ "content": "<file_sep>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110493": {
+ "content": "<issue_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110494": {
+ "content": "<issue_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110495": {
+ "content": "<issue_closed>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110496": {
+ "content": "<jupyter_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110497": {
+ "content": "<jupyter_text>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110498": {
+ "content": "<jupyter_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110499": {
+ "content": "<jupyter_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110500": {
+ "content": "<jupyter_script>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110501": {
+ "content": "<empty_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110502": {
+ "content": "<code_to_intermediate>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110503": {
+ "content": "<intermediate_to_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110504": {
+ "content": "<pr>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110505": {
+ "content": "<pr_status>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110506": {
+ "content": "<pr_is_merged>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110507": {
+ "content": "<pr_base>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110508": {
+ "content": "<pr_file>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110509": {
+ "content": "<pr_base_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110510": {
+ "content": "<pr_diff>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110511": {
+ "content": "<pr_diff_hunk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110512": {
+ "content": "<pr_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110513": {
+ "content": "<pr_event_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110514": {
+ "content": "<pr_review>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110515": {
+ "content": "<pr_review_state>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110516": {
+ "content": "<pr_review_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110517": {
+ "content": "<pr_in_reply_to_review_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110518": {
+ "content": "<pr_in_reply_to_comment_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110519": {
+ "content": "<pr_diff_hunk_comment_line>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110520": {
+ "content": "<NAME>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110521": {
+ "content": "<EMAIL>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110522": {
+ "content": "<KEY>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110523": {
+ "content": "<PASSWORD>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [
+ "<|endoftext|>",
+ "<|fim_prefix|>",
+ "<|fim_middle|>",
+ "<|fim_suffix|>",
+ "<|endofprompt|>",
+ "<|_unuse_missing_100256|>",
+ "<|_unuse_missing_100261|>",
+ "<|_unuse_missing_100262|>",
+ "<|_unuse_missing_100263|>",
+ "<|_unuse_missing_100264|>",
+ "<|_unuse_missing_100265|>",
+ "<|_unuse_missing_100266|>",
+ "<|_unuse_missing_100267|>",
+ "<|_unuse_missing_100268|>",
+ "<|_unuse_missing_100269|>",
+ "<|_unuse_missing_100270|>",
+ "<|_unuse_missing_100271|>",
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|stop|>",
+ "<|endofturn|>",
+ "<repo_name>",
+ "<file_sep>",
+ "<issue_start>",
+ "<issue_comment>",
+ "<issue_closed>",
+ "<jupyter_start>",
+ "<jupyter_text>",
+ "<jupyter_code>",
+ "<jupyter_output>",
+ "<jupyter_script>",
+ "<empty_output>",
+ "<code_to_intermediate>",
+ "<intermediate_to_code>",
+ "<pr>",
+ "<pr_status>",
+ "<pr_is_merged>",
+ "<pr_base>",
+ "<pr_file>",
+ "<pr_base_code>",
+ "<pr_diff>",
+ "<pr_diff_hunk>",
+ "<pr_comment>",
+ "<pr_event_id>",
+ "<pr_review>",
+ "<pr_review_state>",
+ "<pr_review_comment>",
+ "<pr_in_reply_to_review_id>",
+ "<pr_in_reply_to_comment_id>",
+ "<pr_diff_hunk_comment_line>",
+ "<NAME>",
+ "<EMAIL>",
+ "<KEY>",
+ "<PASSWORD>"
+ ],
+ "bos_token": "<|endoftext|>",
+ "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+ "clean_up_tokenization_spaces": true,
+ "eos_token": "<|endofturn|>",
+ "extra_special_tokens": {},
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<|endoftext|>",
+ "tokenizer_class": "GPT2Tokenizer",
+ "unk_token": "<|endoftext|>"
+ }
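The `chat_template` field in `tokenizer_config.json` above is a Jinja template: each message is wrapped as `<|im_start|>role\ncontent<|im_end|>\n`, and when `add_generation_prompt` is set, an open `<|im_start|>assistant\n` turn is appended. A minimal plain-Python sketch that mirrors what the template produces (the helper name `build_prompt` is hypothetical; in practice `tokenizer.apply_chat_template` renders this for you):

```python
def build_prompt(messages, add_generation_prompt=False):
    """Hypothetical helper mirroring the chat_template above:
    wrap each turn in <|im_start|>role ... <|im_end|> markers."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # open an assistant turn for the model to complete
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_prompt(
    [{"role": "user", "content": "hello"}],
    add_generation_prompt=True,
)
print(prompt)
```

Note that generation then stops at the `eos_token`, which this config sets to `<|endofturn|>` rather than `<|endoftext|>`.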
checkpoint-1686/trainer_state.json ADDED
@@ -0,0 +1,1610 @@
+ {
+ "best_global_step": 1600,
+ "best_metric": 1.9110984802246094,
+ "best_model_checkpoint": "./hyperclova-deobfuscation-lora/checkpoint-1600",
+ "epoch": 2.9955555555555557,
+ "eval_steps": 200,
+ "global_step": 1686,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.017777777777777778,
+ "grad_norm": 3.3687641620635986,
+ "learning_rate": 1.8e-05,
+ "loss": 4.1361,
+ "mean_token_accuracy": 0.3493226237595081,
+ "num_tokens": 22106.0,
+ "step": 10
+ },
+ {
+ "epoch": 0.035555555555555556,
+ "grad_norm": 2.5920090675354004,
+ "learning_rate": 3.8e-05,
+ "loss": 3.7165,
+ "mean_token_accuracy": 0.4088538818061352,
+ "num_tokens": 44943.0,
+ "step": 20
+ },
+ {
+ "epoch": 0.05333333333333334,
+ "grad_norm": 2.5703377723693848,
+ "learning_rate": 5.8e-05,
+ "loss": 3.3356,
+ "mean_token_accuracy": 0.4755532510578632,
+ "num_tokens": 67397.0,
+ "step": 30
+ },
+ {
+ "epoch": 0.07111111111111111,
+ "grad_norm": 1.698912262916565,
+ "learning_rate": 7.800000000000001e-05,
+ "loss": 2.9874,
+ "mean_token_accuracy": 0.508383595943451,
+ "num_tokens": 89803.0,
+ "step": 40
+ },
+ {
+ "epoch": 0.08888888888888889,
+ "grad_norm": 1.4602556228637695,
+ "learning_rate": 9.8e-05,
+ "loss": 2.7854,
+ "mean_token_accuracy": 0.5358646497130394,
+ "num_tokens": 112364.0,
+ "step": 50
+ },
+ {
+ "epoch": 0.10666666666666667,
+ "grad_norm": 1.5916705131530762,
+ "learning_rate": 0.000118,
+ "loss": 2.6546,
+ "mean_token_accuracy": 0.5485944993793964,
+ "num_tokens": 134028.0,
+ "step": 60
+ },
+ {
+ "epoch": 0.12444444444444444,
+ "grad_norm": 1.6815338134765625,
+ "learning_rate": 0.000138,
+ "loss": 2.606,
+ "mean_token_accuracy": 0.5535938143730164,
+ "num_tokens": 156703.0,
+ "step": 70
+ },
+ {
+ "epoch": 0.14222222222222222,
+ "grad_norm": 1.8009140491485596,
+ "learning_rate": 0.00015800000000000002,
+ "loss": 2.5307,
+ "mean_token_accuracy": 0.5640750013291835,
+ "num_tokens": 178986.0,
+ "step": 80
+ },
+ {
+ "epoch": 0.16,
+ "grad_norm": 1.4582855701446533,
+ "learning_rate": 0.00017800000000000002,
+ "loss": 2.5633,
+ "mean_token_accuracy": 0.5567230455577373,
+ "num_tokens": 201989.0,
+ "step": 90
+ },
+ {
+ "epoch": 0.17777777777777778,
+ "grad_norm": 1.663874626159668,
+ "learning_rate": 0.00019800000000000002,
+ "loss": 2.4672,
+ "mean_token_accuracy": 0.5688358306884765,
+ "num_tokens": 223936.0,
+ "step": 100
+ },
+ {
+ "epoch": 0.19555555555555557,
+ "grad_norm": 1.6701704263687134,
+ "learning_rate": 0.00019886506935687262,
+ "loss": 2.4388,
+ "mean_token_accuracy": 0.5760447531938553,
+ "num_tokens": 246101.0,
+ "step": 110
+ },
+ {
+ "epoch": 0.21333333333333335,
+ "grad_norm": 1.5731302499771118,
+ "learning_rate": 0.00019760403530895334,
+ "loss": 2.4377,
+ "mean_token_accuracy": 0.5711787067353725,
+ "num_tokens": 269187.0,
+ "step": 120
+ },
+ {
+ "epoch": 0.2311111111111111,
+ "grad_norm": 1.4479353427886963,
+ "learning_rate": 0.00019634300126103406,
+ "loss": 2.3596,
+ "mean_token_accuracy": 0.5830569051206111,
+ "num_tokens": 291454.0,
+ "step": 130
+ },
+ {
+ "epoch": 0.24888888888888888,
+ "grad_norm": 1.3653457164764404,
+ "learning_rate": 0.00019508196721311475,
+ "loss": 2.3648,
+ "mean_token_accuracy": 0.5807973451912403,
+ "num_tokens": 314204.0,
+ "step": 140
+ },
+ {
+ "epoch": 0.26666666666666666,
+ "grad_norm": 1.4210327863693237,
+ "learning_rate": 0.00019382093316519546,
+ "loss": 2.3186,
+ "mean_token_accuracy": 0.5878118917346,
+ "num_tokens": 337167.0,
+ "step": 150
+ },
+ {
+ "epoch": 0.28444444444444444,
+ "grad_norm": 1.532408356666565,
+ "learning_rate": 0.00019255989911727615,
+ "loss": 2.3637,
+ "mean_token_accuracy": 0.5761628717184066,
+ "num_tokens": 360272.0,
+ "step": 160
+ },
+ {
+ "epoch": 0.3022222222222222,
+ "grad_norm": 1.4010679721832275,
+ "learning_rate": 0.00019129886506935687,
+ "loss": 2.2701,
+ "mean_token_accuracy": 0.598077318072319,
+ "num_tokens": 382779.0,
+ "step": 170
+ },
+ {
+ "epoch": 0.32,
+ "grad_norm": 1.5830323696136475,
+ "learning_rate": 0.0001900378310214376,
+ "loss": 2.2861,
+ "mean_token_accuracy": 0.5928302705287933,
+ "num_tokens": 405438.0,
+ "step": 180
+ },
+ {
+ "epoch": 0.3377777777777778,
+ "grad_norm": 1.4623483419418335,
+ "learning_rate": 0.00018877679697351828,
+ "loss": 2.3192,
+ "mean_token_accuracy": 0.5854370579123497,
+ "num_tokens": 428660.0,
+ "step": 190
+ },
+ {
+ "epoch": 0.35555555555555557,
+ "grad_norm": 1.4850527048110962,
+ "learning_rate": 0.000187515762925599,
+ "loss": 2.256,
+ "step": 200
+ },
+ {
+ "epoch": 0.35555555555555557,
+ "eval_loss": 2.254753351211548,
+ "eval_mean_token_accuracy": 0.5965457199811935,
+ "eval_num_tokens": 450808.0,
+ "eval_runtime": 30.9386,
+ "eval_samples_per_second": 32.322,
+ "eval_steps_per_second": 8.081,
+ "step": 200
+ },
+ {
+ "epoch": 0.37333333333333335,
+ "grad_norm": 1.4195237159729004,
+ "learning_rate": 0.00018625472887767968,
+ "loss": 2.2607,
+ "mean_token_accuracy": 0.594056948274374,
+ "num_tokens": 473434.0,
+ "step": 210
+ },
+ {
+ "epoch": 0.39111111111111113,
+ "grad_norm": 1.3114796876907349,
+ "learning_rate": 0.0001849936948297604,
+ "loss": 2.2947,
+ "mean_token_accuracy": 0.5898103177547455,
+ "num_tokens": 496482.0,
+ "step": 220
+ },
+ {
+ "epoch": 0.4088888888888889,
+ "grad_norm": 1.4004285335540771,
+ "learning_rate": 0.00018373266078184112,
+ "loss": 2.2542,
+ "mean_token_accuracy": 0.5970372915267944,
+ "num_tokens": 519379.0,
+ "step": 230
+ },
+ {
+ "epoch": 0.4266666666666667,
+ "grad_norm": 1.3860116004943848,
+ "learning_rate": 0.0001824716267339218,
+ "loss": 2.2636,
+ "mean_token_accuracy": 0.59425338357687,
+ "num_tokens": 542631.0,
+ "step": 240
+ },
+ {
+ "epoch": 0.4444444444444444,
+ "grad_norm": 1.3675146102905273,
+ "learning_rate": 0.00018121059268600253,
+ "loss": 2.2412,
+ "mean_token_accuracy": 0.5928545072674751,
+ "num_tokens": 565400.0,
+ "step": 250
+ },
+ {
+ "epoch": 0.4622222222222222,
+ "grad_norm": 1.4246889352798462,
+ "learning_rate": 0.00017994955863808322,
+ "loss": 2.1577,
+ "mean_token_accuracy": 0.6061514511704444,
+ "num_tokens": 588003.0,
+ "step": 260
+ },
+ {
+ "epoch": 0.48,
+ "grad_norm": 1.4046531915664673,
+ "learning_rate": 0.00017868852459016393,
+ "loss": 2.1862,
+ "mean_token_accuracy": 0.6008762732148171,
+ "num_tokens": 610974.0,
+ "step": 270
+ },
+ {
+ "epoch": 0.49777777777777776,
+ "grad_norm": 1.4038338661193848,
+ "learning_rate": 0.00017742749054224465,
+ "loss": 2.2219,
+ "mean_token_accuracy": 0.5970636487007142,
+ "num_tokens": 634093.0,
+ "step": 280
+ },
+ {
+ "epoch": 0.5155555555555555,
+ "grad_norm": 1.3291988372802734,
+ "learning_rate": 0.00017616645649432534,
+ "loss": 2.131,
+ "mean_token_accuracy": 0.6172704175114632,
+ "num_tokens": 656188.0,
+ "step": 290
+ },
+ {
+ "epoch": 0.5333333333333333,
+ "grad_norm": 1.444318413734436,
+ "learning_rate": 0.00017490542244640606,
+ "loss": 2.1691,
+ "mean_token_accuracy": 0.6066021353006363,
+ "num_tokens": 678769.0,
+ "step": 300
+ },
+ {
+ "epoch": 0.5511111111111111,
+ "grad_norm": 1.3459752798080444,
+ "learning_rate": 0.00017364438839848675,
+ "loss": 2.1413,
+ "mean_token_accuracy": 0.6139265760779381,
+ "num_tokens": 701734.0,
+ "step": 310
+ },
+ {
+ "epoch": 0.5688888888888889,
+ "grad_norm": 1.3597490787506104,
+ "learning_rate": 0.00017238335435056746,
+ "loss": 2.1271,
+ "mean_token_accuracy": 0.6106095433235168,
+ "num_tokens": 724815.0,
+ "step": 320
+ },
+ {
+ "epoch": 0.5866666666666667,
+ "grad_norm": 1.4757016897201538,
+ "learning_rate": 0.00017112232030264818,
+ "loss": 2.133,
+ "mean_token_accuracy": 0.6147415205836296,
+ "num_tokens": 746903.0,
+ "step": 330
+ },
+ {
+ "epoch": 0.6044444444444445,
+ "grad_norm": 1.4856476783752441,
+ "learning_rate": 0.00016986128625472887,
+ "loss": 2.1201,
+ "mean_token_accuracy": 0.6161383926868439,
+ "num_tokens": 768982.0,
+ "step": 340
+ },
+ {
+ "epoch": 0.6222222222222222,
+ "grad_norm": 1.2596303224563599,
+ "learning_rate": 0.0001686002522068096,
+ "loss": 2.1392,
+ "mean_token_accuracy": 0.6150005847215653,
+ "num_tokens": 791061.0,
+ "step": 350
+ },
+ {
+ "epoch": 0.64,
+ "grad_norm": 1.3324636220932007,
+ "learning_rate": 0.00016733921815889028,
+ "loss": 2.1201,
+ "mean_token_accuracy": 0.6171063780784607,
+ "num_tokens": 813112.0,
+ "step": 360
+ },
+ {
+ "epoch": 0.6577777777777778,
+ "grad_norm": 1.419053316116333,
+ "learning_rate": 0.000166078184110971,
+ "loss": 2.1237,
+ "mean_token_accuracy": 0.6111394688487053,
+ "num_tokens": 835469.0,
+ "step": 370
+ },
+ {
+ "epoch": 0.6755555555555556,
+ "grad_norm": 1.4507274627685547,
+ "learning_rate": 0.0001648171500630517,
+ "loss": 2.1387,
+ "mean_token_accuracy": 0.604290933907032,
+ "num_tokens": 857795.0,
+ "step": 380
+ },
+ {
+ "epoch": 0.6933333333333334,
+ "grad_norm": 1.284505844116211,
+ "learning_rate": 0.0001635561160151324,
+ "loss": 2.1,
+ "mean_token_accuracy": 0.6181465938687325,
+ "num_tokens": 879659.0,
+ "step": 390
+ },
+ {
+ "epoch": 0.7111111111111111,
+ "grad_norm": 1.5179046392440796,
+ "learning_rate": 0.00016229508196721312,
+ "loss": 2.0813,
+ "step": 400
+ },
+ {
+ "epoch": 0.7111111111111111,
+ "eval_loss": 2.0953471660614014,
+ "eval_mean_token_accuracy": 0.618859866142273,
+ "eval_num_tokens": 902240.0,
+ "eval_runtime": 30.5153,
+ "eval_samples_per_second": 32.77,
+ "eval_steps_per_second": 8.193,
+ "step": 400
+ },
+ {
+ "epoch": 0.7288888888888889,
+ "grad_norm": 1.3377336263656616,
+ "learning_rate": 0.0001610340479192938,
+ "loss": 2.1049,
+ "mean_token_accuracy": 0.6189975582063199,
+ "num_tokens": 925091.0,
+ "step": 410
+ },
+ {
+ "epoch": 0.7466666666666667,
+ "grad_norm": 1.406614065170288,
+ "learning_rate": 0.00015977301387137452,
+ "loss": 2.1343,
+ "mean_token_accuracy": 0.6101128354668617,
+ "num_tokens": 948151.0,
+ "step": 420
+ },
+ {
+ "epoch": 0.7644444444444445,
+ "grad_norm": 1.3494964838027954,
+ "learning_rate": 0.00015851197982345524,
+ "loss": 2.0506,
+ "mean_token_accuracy": 0.6257941454648972,
+ "num_tokens": 970339.0,
+ "step": 430
+ },
+ {
+ "epoch": 0.7822222222222223,
+ "grad_norm": 1.3070355653762817,
+ "learning_rate": 0.00015725094577553593,
+ "loss": 2.0955,
+ "mean_token_accuracy": 0.6162661850452423,
+ "num_tokens": 993552.0,
+ "step": 440
+ },
+ {
+ "epoch": 0.8,
+ "grad_norm": 1.3954617977142334,
+ "learning_rate": 0.00015598991172761665,
+ "loss": 2.1119,
+ "mean_token_accuracy": 0.6154530435800553,
+ "num_tokens": 1015564.0,
+ "step": 450
+ },
+ {
+ "epoch": 0.8177777777777778,
+ "grad_norm": 1.4015129804611206,
+ "learning_rate": 0.00015472887767969734,
+ "loss": 2.0153,
+ "mean_token_accuracy": 0.6296211943030358,
+ "num_tokens": 1037721.0,
+ "step": 460
+ },
+ {
+ "epoch": 0.8355555555555556,
+ "grad_norm": 1.41290283203125,
+ "learning_rate": 0.00015346784363177806,
+ "loss": 2.0914,
+ "mean_token_accuracy": 0.6156619966030121,
+ "num_tokens": 1060627.0,
+ "step": 470
+ },
+ {
+ "epoch": 0.8533333333333334,
+ "grad_norm": 1.3715571165084839,
+ "learning_rate": 0.00015220680958385877,
+ "loss": 2.0674,
+ "mean_token_accuracy": 0.6202241629362106,
+ "num_tokens": 1082672.0,
+ "step": 480
+ },
+ {
+ "epoch": 0.8711111111111111,
+ "grad_norm": 1.3797943592071533,
+ "learning_rate": 0.00015094577553593946,
+ "loss": 2.0677,
+ "mean_token_accuracy": 0.6200241416692733,
+ "num_tokens": 1104857.0,
+ "step": 490
+ },
+ {
+ "epoch": 0.8888888888888888,
+ "grad_norm": 1.3080323934555054,
+ "learning_rate": 0.00014968474148802018,
+ "loss": 2.068,
+ "mean_token_accuracy": 0.618759186565876,
+ "num_tokens": 1127612.0,
+ "step": 500
+ },
+ {
+ "epoch": 0.9066666666666666,
+ "grad_norm": 1.4698944091796875,
+ "learning_rate": 0.0001484237074401009,
+ "loss": 2.0736,
+ "mean_token_accuracy": 0.6208444744348526,
+ "num_tokens": 1150411.0,
+ "step": 510
+ },
+ {
+ "epoch": 0.9244444444444444,
+ "grad_norm": 1.3741239309310913,
+ "learning_rate": 0.0001471626733921816,
+ "loss": 2.0887,
+ "mean_token_accuracy": 0.6161769673228263,
+ "num_tokens": 1172683.0,
+ "step": 520
+ },
+ {
+ "epoch": 0.9422222222222222,
+ "grad_norm": 1.3237783908843994,
+ "learning_rate": 0.0001459016393442623,
+ "loss": 1.9793,
+ "mean_token_accuracy": 0.6360917523503303,
+ "num_tokens": 1194160.0,
+ "step": 530
+ },
+ {
+ "epoch": 0.96,
+ "grad_norm": 1.3243825435638428,
+ "learning_rate": 0.000144640605296343,
+ "loss": 2.0095,
+ "mean_token_accuracy": 0.6338530048727989,
+ "num_tokens": 1215760.0,
+ "step": 540
+ },
+ {
+ "epoch": 0.9777777777777777,
+ "grad_norm": 1.3875395059585571,
+ "learning_rate": 0.0001433795712484237,
+ "loss": 2.0715,
+ "mean_token_accuracy": 0.6245882242918015,
+ "num_tokens": 1238191.0,
+ "step": 550
+ },
+ {
+ "epoch": 0.9955555555555555,
+ "grad_norm": 1.390081524848938,
+ "learning_rate": 0.00014211853720050443,
527
+ "loss": 2.0421,
528
+ "mean_token_accuracy": 0.6229756608605385,
529
+ "num_tokens": 1260429.0,
530
+ "step": 560
531
+ },
532
+ {
533
+ "epoch": 1.0124444444444445,
534
+ "grad_norm": 1.2626862525939941,
535
+ "learning_rate": 0.00014085750315258512,
536
+ "loss": 1.9614,
537
+ "mean_token_accuracy": 0.6359066555374547,
538
+ "num_tokens": 1281232.0,
539
+ "step": 570
540
+ },
541
+ {
542
+ "epoch": 1.0302222222222222,
543
+ "grad_norm": 1.3941477537155151,
544
+ "learning_rate": 0.00013959646910466583,
545
+ "loss": 1.8782,
546
+ "mean_token_accuracy": 0.6482988312840462,
547
+ "num_tokens": 1304130.0,
548
+ "step": 580
549
+ },
550
+ {
551
+ "epoch": 1.048,
552
+ "grad_norm": 1.4020227193832397,
553
+ "learning_rate": 0.00013833543505674652,
554
+ "loss": 1.8602,
555
+ "mean_token_accuracy": 0.65641980022192,
556
+ "num_tokens": 1326753.0,
557
+ "step": 590
558
+ },
559
+ {
560
+ "epoch": 1.0657777777777777,
561
+ "grad_norm": 1.285709023475647,
562
+ "learning_rate": 0.00013707440100882724,
563
+ "loss": 1.8661,
564
+ "step": 600
565
+ },
566
+ {
567
+ "epoch": 1.0657777777777777,
568
+ "eval_loss": 2.018383264541626,
569
+ "eval_mean_token_accuracy": 0.6295498251914978,
570
+ "eval_num_tokens": 1348985.0,
571
+ "eval_runtime": 30.5245,
572
+ "eval_samples_per_second": 32.761,
573
+ "eval_steps_per_second": 8.19,
574
+ "step": 600
575
+ },
576
+ {
577
+ "epoch": 1.0835555555555556,
578
+ "grad_norm": 1.2745097875595093,
579
+ "learning_rate": 0.00013581336696090796,
580
+ "loss": 1.8705,
581
+ "mean_token_accuracy": 0.650456714630127,
582
+ "num_tokens": 1371318.0,
583
+ "step": 610
584
+ },
585
+ {
586
+ "epoch": 1.1013333333333333,
587
+ "grad_norm": 1.3518744707107544,
588
+ "learning_rate": 0.00013455233291298865,
589
+ "loss": 1.9056,
590
+ "mean_token_accuracy": 0.6455502569675445,
591
+ "num_tokens": 1393816.0,
592
+ "step": 620
593
+ },
594
+ {
595
+ "epoch": 1.1191111111111112,
596
+ "grad_norm": 1.4413272142410278,
597
+ "learning_rate": 0.00013329129886506937,
598
+ "loss": 1.8994,
599
+ "mean_token_accuracy": 0.6459770023822784,
600
+ "num_tokens": 1416529.0,
601
+ "step": 630
602
+ },
603
+ {
604
+ "epoch": 1.1368888888888888,
605
+ "grad_norm": 1.3811439275741577,
606
+ "learning_rate": 0.00013203026481715006,
607
+ "loss": 1.9138,
608
+ "mean_token_accuracy": 0.6459063500165939,
609
+ "num_tokens": 1438970.0,
610
+ "step": 640
611
+ },
612
+ {
613
+ "epoch": 1.1546666666666667,
614
+ "grad_norm": 1.3642174005508423,
615
+ "learning_rate": 0.00013076923076923077,
616
+ "loss": 1.8892,
617
+ "mean_token_accuracy": 0.6444340243935585,
618
+ "num_tokens": 1461324.0,
619
+ "step": 650
620
+ },
621
+ {
622
+ "epoch": 1.1724444444444444,
623
+ "grad_norm": 1.4544634819030762,
624
+ "learning_rate": 0.0001295081967213115,
625
+ "loss": 1.9248,
626
+ "mean_token_accuracy": 0.6391839399933815,
627
+ "num_tokens": 1484246.0,
628
+ "step": 660
629
+ },
630
+ {
631
+ "epoch": 1.1902222222222223,
632
+ "grad_norm": 1.3715091943740845,
633
+ "learning_rate": 0.00012824716267339218,
634
+ "loss": 1.9105,
635
+ "mean_token_accuracy": 0.6393024668097496,
636
+ "num_tokens": 1507305.0,
637
+ "step": 670
638
+ },
639
+ {
640
+ "epoch": 1.208,
641
+ "grad_norm": 1.3897929191589355,
642
+ "learning_rate": 0.0001269861286254729,
643
+ "loss": 1.8714,
644
+ "mean_token_accuracy": 0.6521286174654961,
645
+ "num_tokens": 1529082.0,
646
+ "step": 680
647
+ },
648
+ {
649
+ "epoch": 1.2257777777777779,
650
+ "grad_norm": 1.3576809167861938,
651
+ "learning_rate": 0.00012572509457755359,
652
+ "loss": 1.8677,
653
+ "mean_token_accuracy": 0.6498221024870873,
654
+ "num_tokens": 1551159.0,
655
+ "step": 690
656
+ },
657
+ {
658
+ "epoch": 1.2435555555555555,
659
+ "grad_norm": 1.3156862258911133,
660
+ "learning_rate": 0.0001244640605296343,
661
+ "loss": 1.8996,
662
+ "mean_token_accuracy": 0.6485181763768196,
663
+ "num_tokens": 1573348.0,
664
+ "step": 700
665
+ },
666
+ {
667
+ "epoch": 1.2613333333333334,
668
+ "grad_norm": 1.4738845825195312,
669
+ "learning_rate": 0.00012320302648171502,
670
+ "loss": 1.8953,
671
+ "mean_token_accuracy": 0.6465991452336312,
672
+ "num_tokens": 1595989.0,
673
+ "step": 710
674
+ },
675
+ {
676
+ "epoch": 1.279111111111111,
677
+ "grad_norm": 1.5254158973693848,
678
+ "learning_rate": 0.00012194199243379571,
679
+ "loss": 1.9236,
680
+ "mean_token_accuracy": 0.6474427729845047,
681
+ "num_tokens": 1617895.0,
682
+ "step": 720
683
+ },
684
+ {
685
+ "epoch": 1.2968888888888888,
686
+ "grad_norm": 1.4867346286773682,
687
+ "learning_rate": 0.00012068095838587643,
688
+ "loss": 1.8766,
689
+ "mean_token_accuracy": 0.6491386488080024,
690
+ "num_tokens": 1640415.0,
691
+ "step": 730
692
+ },
693
+ {
694
+ "epoch": 1.3146666666666667,
695
+ "grad_norm": 1.3776379823684692,
696
+ "learning_rate": 0.00011941992433795712,
697
+ "loss": 1.8644,
698
+ "mean_token_accuracy": 0.6499749347567558,
699
+ "num_tokens": 1662713.0,
700
+ "step": 740
701
+ },
702
+ {
703
+ "epoch": 1.3324444444444445,
704
+ "grad_norm": 1.420027256011963,
705
+ "learning_rate": 0.00011815889029003783,
706
+ "loss": 1.8874,
707
+ "mean_token_accuracy": 0.648992708325386,
708
+ "num_tokens": 1684783.0,
709
+ "step": 750
710
+ },
711
+ {
712
+ "epoch": 1.3502222222222222,
713
+ "grad_norm": 1.356441855430603,
714
+ "learning_rate": 0.00011689785624211855,
715
+ "loss": 1.8937,
716
+ "mean_token_accuracy": 0.6503370434045792,
717
+ "num_tokens": 1706623.0,
718
+ "step": 760
719
+ },
720
+ {
721
+ "epoch": 1.3679999999999999,
722
+ "grad_norm": 1.4901665449142456,
723
+ "learning_rate": 0.00011563682219419924,
724
+ "loss": 1.9094,
725
+ "mean_token_accuracy": 0.6417872324585915,
726
+ "num_tokens": 1729494.0,
727
+ "step": 770
728
+ },
729
+ {
730
+ "epoch": 1.3857777777777778,
731
+ "grad_norm": 1.3679572343826294,
732
+ "learning_rate": 0.00011437578814627996,
733
+ "loss": 1.8841,
734
+ "mean_token_accuracy": 0.6478032737970352,
735
+ "num_tokens": 1752045.0,
736
+ "step": 780
737
+ },
738
+ {
739
+ "epoch": 1.4035555555555557,
740
+ "grad_norm": 1.3518086671829224,
741
+ "learning_rate": 0.00011311475409836065,
742
+ "loss": 1.9021,
743
+ "mean_token_accuracy": 0.6460829824209213,
744
+ "num_tokens": 1775601.0,
745
+ "step": 790
746
+ },
747
+ {
748
+ "epoch": 1.4213333333333333,
749
+ "grad_norm": 1.400870442390442,
750
+ "learning_rate": 0.00011185372005044137,
751
+ "loss": 1.8566,
752
+ "step": 800
753
+ },
754
+ {
755
+ "epoch": 1.4213333333333333,
756
+ "eval_loss": 1.9762645959854126,
757
+ "eval_mean_token_accuracy": 0.6358254022598266,
758
+ "eval_num_tokens": 1798633.0,
759
+ "eval_runtime": 30.7115,
760
+ "eval_samples_per_second": 32.561,
761
+ "eval_steps_per_second": 8.14,
762
+ "step": 800
763
+ },
764
+ {
765
+ "epoch": 1.439111111111111,
766
+ "grad_norm": 1.4487619400024414,
767
+ "learning_rate": 0.00011059268600252208,
768
+ "loss": 1.8417,
769
+ "mean_token_accuracy": 0.6496566243469715,
770
+ "num_tokens": 1820656.0,
771
+ "step": 810
772
+ },
773
+ {
774
+ "epoch": 1.456888888888889,
775
+ "grad_norm": 1.4507944583892822,
776
+ "learning_rate": 0.00010933165195460277,
777
+ "loss": 1.8829,
778
+ "mean_token_accuracy": 0.647446171939373,
779
+ "num_tokens": 1842871.0,
780
+ "step": 820
781
+ },
782
+ {
783
+ "epoch": 1.4746666666666668,
784
+ "grad_norm": 1.3563170433044434,
785
+ "learning_rate": 0.00010807061790668349,
786
+ "loss": 1.8508,
787
+ "mean_token_accuracy": 0.6544376760721207,
788
+ "num_tokens": 1865652.0,
789
+ "step": 830
790
+ },
791
+ {
792
+ "epoch": 1.4924444444444445,
793
+ "grad_norm": 1.366861343383789,
794
+ "learning_rate": 0.00010680958385876418,
795
+ "loss": 1.8756,
796
+ "mean_token_accuracy": 0.648781743645668,
797
+ "num_tokens": 1888455.0,
798
+ "step": 840
799
+ },
800
+ {
801
+ "epoch": 1.5102222222222221,
802
+ "grad_norm": 1.5031019449234009,
803
+ "learning_rate": 0.0001055485498108449,
804
+ "loss": 1.8461,
805
+ "mean_token_accuracy": 0.6583632439374923,
806
+ "num_tokens": 1910676.0,
807
+ "step": 850
808
+ },
809
+ {
810
+ "epoch": 1.528,
811
+ "grad_norm": 1.5248113870620728,
812
+ "learning_rate": 0.00010428751576292561,
813
+ "loss": 1.8857,
814
+ "mean_token_accuracy": 0.6470584884285927,
815
+ "num_tokens": 1933253.0,
816
+ "step": 860
817
+ },
818
+ {
819
+ "epoch": 1.545777777777778,
820
+ "grad_norm": 1.4354236125946045,
821
+ "learning_rate": 0.0001030264817150063,
822
+ "loss": 1.892,
823
+ "mean_token_accuracy": 0.6445729210972786,
824
+ "num_tokens": 1955820.0,
825
+ "step": 870
826
+ },
827
+ {
828
+ "epoch": 1.5635555555555556,
829
+ "grad_norm": 1.4288746118545532,
830
+ "learning_rate": 0.00010176544766708702,
831
+ "loss": 1.878,
832
+ "mean_token_accuracy": 0.6476826578378677,
833
+ "num_tokens": 1978120.0,
834
+ "step": 880
835
+ },
836
+ {
837
+ "epoch": 1.5813333333333333,
838
+ "grad_norm": 1.433902382850647,
839
+ "learning_rate": 0.00010050441361916771,
840
+ "loss": 1.8199,
841
+ "mean_token_accuracy": 0.6561270505189896,
842
+ "num_tokens": 2000508.0,
843
+ "step": 890
844
+ },
845
+ {
846
+ "epoch": 1.5991111111111111,
847
+ "grad_norm": 1.332987904548645,
848
+ "learning_rate": 9.924337957124843e-05,
849
+ "loss": 1.8555,
850
+ "mean_token_accuracy": 0.6499203637242317,
851
+ "num_tokens": 2023356.0,
852
+ "step": 900
853
+ },
854
+ {
855
+ "epoch": 1.616888888888889,
856
+ "grad_norm": 1.3830794095993042,
857
+ "learning_rate": 9.798234552332913e-05,
858
+ "loss": 1.8108,
859
+ "mean_token_accuracy": 0.6618377715349197,
860
+ "num_tokens": 2045965.0,
861
+ "step": 910
862
+ },
863
+ {
864
+ "epoch": 1.6346666666666667,
865
+ "grad_norm": 1.3988080024719238,
866
+ "learning_rate": 9.672131147540983e-05,
867
+ "loss": 1.8791,
868
+ "mean_token_accuracy": 0.6456288158893585,
869
+ "num_tokens": 2069108.0,
870
+ "step": 920
871
+ },
872
+ {
873
+ "epoch": 1.6524444444444444,
874
+ "grad_norm": 1.398549199104309,
875
+ "learning_rate": 9.546027742749055e-05,
876
+ "loss": 1.8885,
877
+ "mean_token_accuracy": 0.6464410901069642,
878
+ "num_tokens": 2091755.0,
879
+ "step": 930
880
+ },
881
+ {
882
+ "epoch": 1.6702222222222223,
883
+ "grad_norm": 1.5381189584732056,
884
+ "learning_rate": 9.419924337957125e-05,
885
+ "loss": 1.853,
886
+ "mean_token_accuracy": 0.6534279838204384,
887
+ "num_tokens": 2114496.0,
888
+ "step": 940
889
+ },
890
+ {
891
+ "epoch": 1.688,
892
+ "grad_norm": 1.4101791381835938,
893
+ "learning_rate": 9.293820933165196e-05,
894
+ "loss": 1.8696,
895
+ "mean_token_accuracy": 0.6475102975964546,
896
+ "num_tokens": 2137041.0,
897
+ "step": 950
898
+ },
899
+ {
900
+ "epoch": 1.7057777777777776,
901
+ "grad_norm": 1.496955156326294,
902
+ "learning_rate": 9.167717528373266e-05,
903
+ "loss": 1.8752,
904
+ "mean_token_accuracy": 0.6469297721982002,
905
+ "num_tokens": 2159711.0,
906
+ "step": 960
907
+ },
908
+ {
909
+ "epoch": 1.7235555555555555,
910
+ "grad_norm": 1.4269644021987915,
911
+ "learning_rate": 9.041614123581336e-05,
912
+ "loss": 1.8773,
913
+ "mean_token_accuracy": 0.6508476585149765,
914
+ "num_tokens": 2181675.0,
915
+ "step": 970
916
+ },
917
+ {
918
+ "epoch": 1.7413333333333334,
919
+ "grad_norm": 1.4438400268554688,
920
+ "learning_rate": 8.915510718789408e-05,
921
+ "loss": 1.8433,
922
+ "mean_token_accuracy": 0.656335887312889,
923
+ "num_tokens": 2204671.0,
924
+ "step": 980
925
+ },
926
+ {
927
+ "epoch": 1.759111111111111,
928
+ "grad_norm": 1.3846147060394287,
929
+ "learning_rate": 8.789407313997479e-05,
930
+ "loss": 1.8649,
931
+ "mean_token_accuracy": 0.6451441869139671,
932
+ "num_tokens": 2227866.0,
933
+ "step": 990
934
+ },
935
+ {
936
+ "epoch": 1.7768888888888887,
937
+ "grad_norm": 1.5432794094085693,
938
+ "learning_rate": 8.663303909205549e-05,
939
+ "loss": 1.8435,
940
+ "step": 1000
941
+ },
942
+ {
943
+ "epoch": 1.7768888888888887,
944
+ "eval_loss": 1.9385051727294922,
945
+ "eval_mean_token_accuracy": 0.6399562013149261,
946
+ "eval_num_tokens": 2250379.0,
947
+ "eval_runtime": 30.7428,
948
+ "eval_samples_per_second": 32.528,
949
+ "eval_steps_per_second": 8.132,
950
+ "step": 1000
951
+ },
952
+ {
953
+ "epoch": 1.7946666666666666,
954
+ "grad_norm": 1.4225345849990845,
955
+ "learning_rate": 8.537200504413619e-05,
956
+ "loss": 1.8767,
957
+ "mean_token_accuracy": 0.6512193940579891,
958
+ "num_tokens": 2272999.0,
959
+ "step": 1010
960
+ },
961
+ {
962
+ "epoch": 1.8124444444444445,
963
+ "grad_norm": 1.3732675313949585,
964
+ "learning_rate": 8.41109709962169e-05,
965
+ "loss": 1.845,
966
+ "mean_token_accuracy": 0.6550421059131623,
967
+ "num_tokens": 2295110.0,
968
+ "step": 1020
969
+ },
970
+ {
971
+ "epoch": 1.8302222222222222,
972
+ "grad_norm": 1.3867266178131104,
973
+ "learning_rate": 8.284993694829761e-05,
974
+ "loss": 1.8236,
975
+ "mean_token_accuracy": 0.6565809994935989,
976
+ "num_tokens": 2317432.0,
977
+ "step": 1030
978
+ },
979
+ {
980
+ "epoch": 1.8479999999999999,
981
+ "grad_norm": 1.3360997438430786,
982
+ "learning_rate": 8.158890290037832e-05,
983
+ "loss": 1.8642,
984
+ "mean_token_accuracy": 0.6463637053966522,
985
+ "num_tokens": 2340305.0,
986
+ "step": 1040
987
+ },
988
+ {
989
+ "epoch": 1.8657777777777778,
990
+ "grad_norm": 1.4467201232910156,
991
+ "learning_rate": 8.032786885245902e-05,
992
+ "loss": 1.8661,
993
+ "mean_token_accuracy": 0.653101560473442,
994
+ "num_tokens": 2362746.0,
995
+ "step": 1050
996
+ },
997
+ {
998
+ "epoch": 1.8835555555555556,
999
+ "grad_norm": 1.3943202495574951,
1000
+ "learning_rate": 7.906683480453972e-05,
1001
+ "loss": 1.8399,
1002
+ "mean_token_accuracy": 0.6510771587491035,
1003
+ "num_tokens": 2385647.0,
1004
+ "step": 1060
1005
+ },
1006
+ {
1007
+ "epoch": 1.9013333333333333,
1008
+ "grad_norm": 1.4589335918426514,
1009
+ "learning_rate": 7.780580075662043e-05,
1010
+ "loss": 1.8522,
1011
+ "mean_token_accuracy": 0.6478627189993859,
1012
+ "num_tokens": 2408732.0,
1013
+ "step": 1070
1014
+ },
1015
+ {
1016
+ "epoch": 1.919111111111111,
1017
+ "grad_norm": 1.5676307678222656,
1018
+ "learning_rate": 7.654476670870114e-05,
1019
+ "loss": 1.8316,
1020
+ "mean_token_accuracy": 0.656819324195385,
1021
+ "num_tokens": 2431137.0,
1022
+ "step": 1080
1023
+ },
1024
+ {
1025
+ "epoch": 1.9368888888888889,
1026
+ "grad_norm": 1.3882263898849487,
1027
+ "learning_rate": 7.528373266078185e-05,
1028
+ "loss": 1.7965,
1029
+ "mean_token_accuracy": 0.6616187065839767,
1030
+ "num_tokens": 2453517.0,
1031
+ "step": 1090
1032
+ },
1033
+ {
1034
+ "epoch": 1.9546666666666668,
1035
+ "grad_norm": 1.5195387601852417,
1036
+ "learning_rate": 7.402269861286255e-05,
1037
+ "loss": 1.8346,
1038
+ "mean_token_accuracy": 0.6537548035383225,
1039
+ "num_tokens": 2475737.0,
1040
+ "step": 1100
1041
+ },
1042
+ {
1043
+ "epoch": 1.9724444444444444,
1044
+ "grad_norm": 1.3485065698623657,
1045
+ "learning_rate": 7.276166456494325e-05,
1046
+ "loss": 1.835,
1047
+ "mean_token_accuracy": 0.6512499779462815,
1048
+ "num_tokens": 2497880.0,
1049
+ "step": 1110
1050
+ },
1051
+ {
1052
+ "epoch": 1.9902222222222221,
1053
+ "grad_norm": 1.5932726860046387,
1054
+ "learning_rate": 7.150063051702396e-05,
1055
+ "loss": 1.8432,
1056
+ "mean_token_accuracy": 0.6523947417736053,
1057
+ "num_tokens": 2519974.0,
1058
+ "step": 1120
1059
+ },
1060
+ {
1061
+ "epoch": 2.007111111111111,
1062
+ "grad_norm": 1.3682020902633667,
1063
+ "learning_rate": 7.023959646910467e-05,
1064
+ "loss": 1.7383,
1065
+ "mean_token_accuracy": 0.6761166123967421,
1066
+ "num_tokens": 2540296.0,
1067
+ "step": 1130
1068
+ },
1069
+ {
1070
+ "epoch": 2.024888888888889,
1071
+ "grad_norm": 1.4417686462402344,
1072
+ "learning_rate": 6.897856242118538e-05,
1073
+ "loss": 1.697,
1074
+ "mean_token_accuracy": 0.6766574695706368,
1075
+ "num_tokens": 2561919.0,
1076
+ "step": 1140
1077
+ },
1078
+ {
1079
+ "epoch": 2.042666666666667,
1080
+ "grad_norm": 1.375542163848877,
1081
+ "learning_rate": 6.771752837326608e-05,
1082
+ "loss": 1.7361,
1083
+ "mean_token_accuracy": 0.6702861517667771,
1084
+ "num_tokens": 2584915.0,
1085
+ "step": 1150
1086
+ },
1087
+ {
1088
+ "epoch": 2.0604444444444443,
1089
+ "grad_norm": 1.4783133268356323,
1090
+ "learning_rate": 6.645649432534678e-05,
1091
+ "loss": 1.6927,
1092
+ "mean_token_accuracy": 0.6790769457817077,
1093
+ "num_tokens": 2606857.0,
1094
+ "step": 1160
1095
+ },
1096
+ {
1097
+ "epoch": 2.078222222222222,
1098
+ "grad_norm": 1.5346624851226807,
1099
+ "learning_rate": 6.519546027742749e-05,
1100
+ "loss": 1.6938,
1101
+ "mean_token_accuracy": 0.6730351656675339,
1102
+ "num_tokens": 2629737.0,
1103
+ "step": 1170
1104
+ },
1105
+ {
1106
+ "epoch": 2.096,
1107
+ "grad_norm": 1.430298089981079,
1108
+ "learning_rate": 6.39344262295082e-05,
1109
+ "loss": 1.6476,
1110
+ "mean_token_accuracy": 0.6835471093654633,
1111
+ "num_tokens": 2651936.0,
1112
+ "step": 1180
1113
+ },
1114
+ {
1115
+ "epoch": 2.113777777777778,
1116
+ "grad_norm": 1.4968252182006836,
1117
+ "learning_rate": 6.267339218158891e-05,
1118
+ "loss": 1.7242,
1119
+ "mean_token_accuracy": 0.6697919353842735,
1120
+ "num_tokens": 2675241.0,
1121
+ "step": 1190
1122
+ },
1123
+ {
1124
+ "epoch": 2.1315555555555554,
1125
+ "grad_norm": 1.3892192840576172,
1126
+ "learning_rate": 6.141235813366961e-05,
1127
+ "loss": 1.6916,
1128
+ "step": 1200
1129
+ },
1130
+ {
1131
+ "epoch": 2.1315555555555554,
1132
+ "eval_loss": 1.934017539024353,
1133
+ "eval_mean_token_accuracy": 0.6421641361713409,
1134
+ "eval_num_tokens": 2698219.0,
1135
+ "eval_runtime": 30.3979,
1136
+ "eval_samples_per_second": 32.897,
1137
+ "eval_steps_per_second": 8.224,
1138
+ "step": 1200
1139
+ },
1140
+ {
1141
+ "epoch": 2.1493333333333333,
1142
+ "grad_norm": 1.4893920421600342,
1143
+ "learning_rate": 6.0151324085750316e-05,
1144
+ "loss": 1.7047,
1145
+ "mean_token_accuracy": 0.6755686655640603,
1146
+ "num_tokens": 2721580.0,
1147
+ "step": 1210
1148
+ },
1149
+ {
1150
+ "epoch": 2.167111111111111,
1151
+ "grad_norm": 1.50564444065094,
1152
+ "learning_rate": 5.889029003783102e-05,
1153
+ "loss": 1.7058,
1154
+ "mean_token_accuracy": 0.6746152400970459,
1155
+ "num_tokens": 2744110.0,
1156
+ "step": 1220
1157
+ },
1158
+ {
1159
+ "epoch": 2.1848888888888887,
1160
+ "grad_norm": 1.461367130279541,
1161
+ "learning_rate": 5.7629255989911736e-05,
1162
+ "loss": 1.684,
1163
+ "mean_token_accuracy": 0.6813082948327065,
1164
+ "num_tokens": 2765844.0,
1165
+ "step": 1230
1166
+ },
1167
+ {
1168
+ "epoch": 2.2026666666666666,
1169
+ "grad_norm": 1.553553819656372,
1170
+ "learning_rate": 5.636822194199244e-05,
1171
+ "loss": 1.6848,
1172
+ "mean_token_accuracy": 0.677204079926014,
1173
+ "num_tokens": 2788070.0,
1174
+ "step": 1240
1175
+ },
1176
+ {
1177
+ "epoch": 2.2204444444444444,
1178
+ "grad_norm": 1.4453001022338867,
1179
+ "learning_rate": 5.510718789407314e-05,
1180
+ "loss": 1.7182,
1181
+ "mean_token_accuracy": 0.6765570789575577,
1182
+ "num_tokens": 2810964.0,
1183
+ "step": 1250
1184
+ },
1185
+ {
1186
+ "epoch": 2.2382222222222223,
1187
+ "grad_norm": 1.5605733394622803,
1188
+ "learning_rate": 5.384615384615385e-05,
1189
+ "loss": 1.6772,
1190
+ "mean_token_accuracy": 0.678339496254921,
1191
+ "num_tokens": 2833176.0,
1192
+ "step": 1260
1193
+ },
1194
+ {
1195
+ "epoch": 2.2560000000000002,
1196
+ "grad_norm": 1.514710783958435,
1197
+ "learning_rate": 5.258511979823455e-05,
1198
+ "loss": 1.7192,
1199
+ "mean_token_accuracy": 0.6710417225956917,
1200
+ "num_tokens": 2855440.0,
1201
+ "step": 1270
1202
+ },
1203
+ {
1204
+ "epoch": 2.2737777777777777,
1205
+ "grad_norm": 1.510834813117981,
1206
+ "learning_rate": 5.132408575031527e-05,
1207
+ "loss": 1.6599,
1208
+ "mean_token_accuracy": 0.6822926640510559,
1209
+ "num_tokens": 2877713.0,
1210
+ "step": 1280
1211
+ },
1212
+ {
1213
+ "epoch": 2.2915555555555556,
1214
+ "grad_norm": 1.3550519943237305,
1215
+ "learning_rate": 5.006305170239597e-05,
1216
+ "loss": 1.7072,
1217
+ "mean_token_accuracy": 0.6754195600748062,
1218
+ "num_tokens": 2899934.0,
1219
+ "step": 1290
1220
+ },
1221
+ {
1222
+ "epoch": 2.3093333333333335,
1223
+ "grad_norm": 1.5602107048034668,
1224
+ "learning_rate": 4.8802017654476674e-05,
1225
+ "loss": 1.7111,
1226
+ "mean_token_accuracy": 0.6702851369976998,
1227
+ "num_tokens": 2923244.0,
1228
+ "step": 1300
1229
+ },
1230
+ {
1231
+ "epoch": 2.327111111111111,
1232
+ "grad_norm": 1.5889501571655273,
1233
+ "learning_rate": 4.754098360655738e-05,
1234
+ "loss": 1.6858,
1235
+ "mean_token_accuracy": 0.6781487062573432,
1236
+ "num_tokens": 2945580.0,
1237
+ "step": 1310
1238
+ },
1239
+ {
1240
+ "epoch": 2.344888888888889,
1241
+ "grad_norm": 1.4793872833251953,
1242
+ "learning_rate": 4.627994955863809e-05,
1243
+ "loss": 1.6799,
1244
+ "mean_token_accuracy": 0.6740961462259293,
1245
+ "num_tokens": 2969470.0,
1246
+ "step": 1320
1247
+ },
1248
+ {
1249
+ "epoch": 2.3626666666666667,
1250
+ "grad_norm": 1.6188234090805054,
1251
+ "learning_rate": 4.501891551071879e-05,
1252
+ "loss": 1.6838,
1253
+ "mean_token_accuracy": 0.6732241719961166,
1254
+ "num_tokens": 2991982.0,
1255
+ "step": 1330
1256
+ },
1257
+ {
1258
+ "epoch": 2.3804444444444446,
1259
+ "grad_norm": 1.474108338356018,
1260
+ "learning_rate": 4.37578814627995e-05,
1261
+ "loss": 1.7024,
1262
+ "mean_token_accuracy": 0.675683145225048,
1263
+ "num_tokens": 3014206.0,
1264
+ "step": 1340
1265
+ },
1266
+ {
1267
+ "epoch": 2.398222222222222,
1268
+ "grad_norm": 1.4645053148269653,
1269
+ "learning_rate": 4.2496847414880205e-05,
1270
+ "loss": 1.6564,
1271
+ "mean_token_accuracy": 0.6787498995661736,
1272
+ "num_tokens": 3036651.0,
1273
+ "step": 1350
1274
+ },
1275
+ {
1276
+ "epoch": 2.416,
1277
+ "grad_norm": 1.498451828956604,
1278
+ "learning_rate": 4.1235813366960915e-05,
1279
+ "loss": 1.694,
1280
+ "mean_token_accuracy": 0.6752896070480346,
1281
+ "num_tokens": 3058745.0,
1282
+ "step": 1360
1283
+ },
1284
+ {
1285
+ "epoch": 2.433777777777778,
1286
+ "grad_norm": 1.5558826923370361,
1287
+ "learning_rate": 3.997477931904162e-05,
1288
+ "loss": 1.7106,
1289
+ "mean_token_accuracy": 0.6742441862821579,
1290
+ "num_tokens": 3081726.0,
1291
+ "step": 1370
1292
+ },
1293
+ {
1294
+ "epoch": 2.4515555555555557,
1295
+ "grad_norm": 1.5872586965560913,
1296
+ "learning_rate": 3.871374527112232e-05,
1297
+ "loss": 1.6848,
1298
+ "mean_token_accuracy": 0.6764188826084137,
1299
+ "num_tokens": 3104292.0,
1300
+ "step": 1380
1301
+ },
1302
+ {
1303
+ "epoch": 2.469333333333333,
1304
+ "grad_norm": 1.551299810409546,
1305
+ "learning_rate": 3.745271122320303e-05,
1306
+ "loss": 1.6909,
1307
+ "mean_token_accuracy": 0.6762390181422233,
1308
+ "num_tokens": 3126334.0,
1309
+ "step": 1390
1310
+ },
1311
+ {
1312
+ "epoch": 2.487111111111111,
1313
+ "grad_norm": 1.57632315158844,
1314
+ "learning_rate": 3.6191677175283736e-05,
1315
+ "loss": 1.7113,
1316
+ "step": 1400
1317
+ },
1318
+ {
1319
+ "epoch": 2.487111111111111,
1320
+ "eval_loss": 1.922593355178833,
1321
+ "eval_mean_token_accuracy": 0.6445384075641633,
1322
+ "eval_num_tokens": 3149845.0,
1323
+ "eval_runtime": 30.0209,
1324
+ "eval_samples_per_second": 33.31,
1325
+ "eval_steps_per_second": 8.328,
1326
+ "step": 1400
1327
+ },
1328
+ {
1329
+ "epoch": 2.504888888888889,
1330
+ "grad_norm": 1.487930178642273,
1331
+ "learning_rate": 3.4930643127364446e-05,
1332
+ "loss": 1.7087,
1333
+ "mean_token_accuracy": 0.6737867616117,
1334
+ "num_tokens": 3172117.0,
1335
+ "step": 1410
1336
+ },
1337
+ {
1338
+ "epoch": 2.522666666666667,
1339
+ "grad_norm": 1.5210868120193481,
1340
+ "learning_rate": 3.366960907944515e-05,
1341
+ "loss": 1.7009,
1342
+ "mean_token_accuracy": 0.6799842938780785,
1343
+ "num_tokens": 3194261.0,
1344
+ "step": 1420
1345
+ },
1346
+ {
1347
+ "epoch": 2.5404444444444443,
1348
+ "grad_norm": 1.6295726299285889,
1349
+ "learning_rate": 3.240857503152585e-05,
1350
+ "loss": 1.6027,
1351
+ "mean_token_accuracy": 0.6899775773286819,
1352
+ "num_tokens": 3216455.0,
1353
+ "step": 1430
1354
+ },
1355
+ {
1356
+ "epoch": 2.558222222222222,
1357
+ "grad_norm": 1.561673879623413,
1358
+ "learning_rate": 3.114754098360656e-05,
1359
+ "loss": 1.7273,
1360
+ "mean_token_accuracy": 0.6699303150177002,
1361
+ "num_tokens": 3238359.0,
1362
+ "step": 1440
1363
+ },
1364
+ {
1365
+ "epoch": 2.576,
1366
+ "grad_norm": 1.5006392002105713,
1367
+ "learning_rate": 2.9886506935687263e-05,
1368
+ "loss": 1.7243,
1369
+ "mean_token_accuracy": 0.6692202746868133,
1370
+ "num_tokens": 3261568.0,
1371
+ "step": 1450
1372
+ },
1373
+ {
1374
+ "epoch": 2.5937777777777775,
1375
+ "grad_norm": 1.602378249168396,
1376
+ "learning_rate": 2.8625472887767974e-05,
1377
+ "loss": 1.7255,
1378
+ "mean_token_accuracy": 0.6683675542473793,
1379
+ "num_tokens": 3284517.0,
1380
+ "step": 1460
1381
+ },
1382
+ {
1383
+ "epoch": 2.6115555555555554,
1384
+ "grad_norm": 1.6410186290740967,
1385
+ "learning_rate": 2.7364438839848677e-05,
1386
+ "loss": 1.6826,
1387
+ "mean_token_accuracy": 0.6792753636837006,
1388
+ "num_tokens": 3306623.0,
1389
+ "step": 1470
1390
+ },
1391
+ {
1392
+ "epoch": 2.6293333333333333,
1393
+ "grad_norm": 1.4993571043014526,
1394
+ "learning_rate": 2.610340479192938e-05,
1395
+ "loss": 1.6629,
1396
+ "mean_token_accuracy": 0.6814461290836334,
1397
+ "num_tokens": 3329270.0,
1398
+ "step": 1480
1399
+ },
1400
+ {
1401
+ "epoch": 2.647111111111111,
1402
+ "grad_norm": 1.4495617151260376,
1403
+ "learning_rate": 2.484237074401009e-05,
1404
+ "loss": 1.6848,
1405
+ "mean_token_accuracy": 0.6769454509019852,
1406
+ "num_tokens": 3352533.0,
1407
+ "step": 1490
1408
+ },
1409
+ {
1410
+ "epoch": 2.664888888888889,
1411
+ "grad_norm": 1.5677741765975952,
1412
+ "learning_rate": 2.3581336696090794e-05,
1413
+ "loss": 1.6679,
1414
+ "mean_token_accuracy": 0.6842619329690933,
1415
+ "num_tokens": 3374132.0,
1416
+ "step": 1500
1417
+ },
1418
+ {
1419
+ "epoch": 2.6826666666666665,
1420
+ "grad_norm": 1.5430514812469482,
1421
+ "learning_rate": 2.23203026481715e-05,
1422
+ "loss": 1.7122,
1423
+ "mean_token_accuracy": 0.6701778277754784,
1424
+ "num_tokens": 3397176.0,
1425
+ "step": 1510
1426
+ },
1427
+ {
1428
+ "epoch": 2.7004444444444444,
1429
+ "grad_norm": 1.5498685836791992,
1430
+ "learning_rate": 2.1059268600252208e-05,
1431
+ "loss": 1.6631,
1432
+ "mean_token_accuracy": 0.6801098987460137,
1433
+ "num_tokens": 3418961.0,
1434
+ "step": 1520
1435
+ },
1436
+ {
1437
+ "epoch": 2.7182222222222223,
1438
+ "grad_norm": 1.5674185752868652,
1439
+ "learning_rate": 1.9798234552332915e-05,
1440
+ "loss": 1.6677,
1441
+ "mean_token_accuracy": 0.679096283018589,
1442
+ "num_tokens": 3441815.0,
1443
+ "step": 1530
1444
+ },
1445
+ {
1446
+ "epoch": 2.7359999999999998,
1447
+ "grad_norm": 1.4433151483535767,
1448
+ "learning_rate": 1.8537200504413622e-05,
1449
+ "loss": 1.6903,
1450
+ "mean_token_accuracy": 0.6747847631573677,
1451
+ "num_tokens": 3464582.0,
1452
+ "step": 1540
1453
+ },
1454
+ {
1455
+ "epoch": 2.7537777777777777,
1456
+ "grad_norm": 1.4974557161331177,
1457
+ "learning_rate": 1.7276166456494325e-05,
1458
+ "loss": 1.6533,
1459
+ "mean_token_accuracy": 0.683028981089592,
1460
+ "num_tokens": 3486694.0,
1461
+ "step": 1550
1462
+ },
1463
+ {
1464
+ "epoch": 2.7715555555555556,
1465
+ "grad_norm": 1.4934104681015015,
1466
+ "learning_rate": 1.6015132408575032e-05,
1467
+ "loss": 1.699,
1468
+ "mean_token_accuracy": 0.6773065477609634,
1469
+ "num_tokens": 3508908.0,
1470
+ "step": 1560
1471
+ },
1472
+ {
1473
+ "epoch": 2.7893333333333334,
1474
+ "grad_norm": 1.5283113718032837,
1475
+ "learning_rate": 1.4754098360655739e-05,
1476
+ "loss": 1.6329,
1477
+ "mean_token_accuracy": 0.6867676630616188,
1478
+ "num_tokens": 3530306.0,
1479
+ "step": 1570
1480
+ },
1481
+ {
1482
+ "epoch": 2.8071111111111113,
1483
+ "grad_norm": 1.6120567321777344,
1484
+ "learning_rate": 1.3493064312736444e-05,
1485
+ "loss": 1.7164,
1486
+ "mean_token_accuracy": 0.6712423786520958,
1487
+ "num_tokens": 3552880.0,
1488
+ "step": 1580
1489
+ },
1490
+ {
1491
+ "epoch": 2.824888888888889,
1492
+ "grad_norm": 1.5219435691833496,
+ "learning_rate": 1.223203026481715e-05,
+ "loss": 1.7018,
+ "mean_token_accuracy": 0.6758274272084236,
+ "num_tokens": 3575800.0,
+ "step": 1590
+ },
+ {
+ "epoch": 2.8426666666666667,
+ "grad_norm": 1.5157614946365356,
+ "learning_rate": 1.0970996216897856e-05,
+ "loss": 1.6983,
+ "step": 1600
+ },
+ {
+ "epoch": 2.8426666666666667,
+ "eval_loss": 1.9110984802246094,
+ "eval_mean_token_accuracy": 0.6461525177955627,
+ "eval_num_tokens": 3598214.0,
+ "eval_runtime": 30.748,
+ "eval_samples_per_second": 32.522,
+ "eval_steps_per_second": 8.131,
+ "step": 1600
+ },
+ {
+ "epoch": 2.8604444444444446,
+ "grad_norm": 1.5881015062332153,
+ "learning_rate": 9.709962168978563e-06,
+ "loss": 1.679,
+ "mean_token_accuracy": 0.6803914837539196,
+ "num_tokens": 3620353.0,
+ "step": 1610
+ },
+ {
+ "epoch": 2.878222222222222,
+ "grad_norm": 1.5353198051452637,
+ "learning_rate": 8.448928121059268e-06,
+ "loss": 1.681,
+ "mean_token_accuracy": 0.6778546258807182,
+ "num_tokens": 3642281.0,
+ "step": 1620
+ },
+ {
+ "epoch": 2.896,
+ "grad_norm": 1.5839284658432007,
+ "learning_rate": 7.187894073139975e-06,
+ "loss": 1.6397,
+ "mean_token_accuracy": 0.6827977553009987,
+ "num_tokens": 3664853.0,
+ "step": 1630
+ },
+ {
+ "epoch": 2.913777777777778,
+ "grad_norm": 1.6904590129852295,
+ "learning_rate": 5.926860025220681e-06,
+ "loss": 1.7405,
+ "mean_token_accuracy": 0.6691433653235436,
+ "num_tokens": 3688123.0,
+ "step": 1640
+ },
+ {
+ "epoch": 2.9315555555555557,
+ "grad_norm": 1.5607562065124512,
+ "learning_rate": 4.665825977301387e-06,
+ "loss": 1.7285,
+ "mean_token_accuracy": 0.6671519264578819,
+ "num_tokens": 3710934.0,
+ "step": 1650
+ },
+ {
+ "epoch": 2.9493333333333336,
+ "grad_norm": 1.613752841949463,
+ "learning_rate": 3.404791929382094e-06,
+ "loss": 1.7251,
+ "mean_token_accuracy": 0.676729716360569,
+ "num_tokens": 3733298.0,
+ "step": 1660
+ },
+ {
+ "epoch": 2.967111111111111,
+ "grad_norm": 1.4459993839263916,
+ "learning_rate": 2.1437578814628e-06,
+ "loss": 1.729,
+ "mean_token_accuracy": 0.6713886946439743,
+ "num_tokens": 3755641.0,
+ "step": 1670
+ },
+ {
+ "epoch": 2.984888888888889,
+ "grad_norm": 1.6163533926010132,
+ "learning_rate": 8.827238335435058e-07,
+ "loss": 1.6633,
+ "mean_token_accuracy": 0.6792998388409615,
+ "num_tokens": 3777938.0,
+ "step": 1680
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 1686,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 200,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.3348842111787008e+16,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoint-1686/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7bbd2538562d29d0ea8a0dc81d11411522bce0862261591b886509bfea955316
+ size 5624
checkpoint-1686/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,86 @@
+ {
+ "additional_special_tokens": [
+ "<|endoftext|>",
+ "<|fim_prefix|>",
+ "<|fim_middle|>",
+ "<|fim_suffix|>",
+ "<|endofprompt|>",
+ "<|_unuse_missing_100256|>",
+ "<|_unuse_missing_100261|>",
+ "<|_unuse_missing_100262|>",
+ "<|_unuse_missing_100263|>",
+ "<|_unuse_missing_100264|>",
+ "<|_unuse_missing_100265|>",
+ "<|_unuse_missing_100266|>",
+ "<|_unuse_missing_100267|>",
+ "<|_unuse_missing_100268|>",
+ "<|_unuse_missing_100269|>",
+ "<|_unuse_missing_100270|>",
+ "<|_unuse_missing_100271|>",
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|stop|>",
+ "<|endofturn|>",
+ "<repo_name>",
+ "<file_sep>",
+ "<issue_start>",
+ "<issue_comment>",
+ "<issue_closed>",
+ "<jupyter_start>",
+ "<jupyter_text>",
+ "<jupyter_code>",
+ "<jupyter_output>",
+ "<jupyter_script>",
+ "<empty_output>",
+ "<code_to_intermediate>",
+ "<intermediate_to_code>",
+ "<pr>",
+ "<pr_status>",
+ "<pr_is_merged>",
+ "<pr_base>",
+ "<pr_file>",
+ "<pr_base_code>",
+ "<pr_diff>",
+ "<pr_diff_hunk>",
+ "<pr_comment>",
+ "<pr_event_id>",
+ "<pr_review>",
+ "<pr_review_state>",
+ "<pr_review_comment>",
+ "<pr_in_reply_to_review_id>",
+ "<pr_in_reply_to_comment_id>",
+ "<pr_diff_hunk_comment_line>",
+ "<NAME>",
+ "<EMAIL>",
+ "<KEY>",
+ "<PASSWORD>"
+ ],
+ "bos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|endofturn|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,502 @@
+ {
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "100256": {
+ "content": "<|_unuse_missing_100256|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100257": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100258": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100259": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100260": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100261": {
+ "content": "<|_unuse_missing_100261|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100262": {
+ "content": "<|_unuse_missing_100262|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100263": {
+ "content": "<|_unuse_missing_100263|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100264": {
+ "content": "<|_unuse_missing_100264|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100265": {
+ "content": "<|_unuse_missing_100265|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100266": {
+ "content": "<|_unuse_missing_100266|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100267": {
+ "content": "<|_unuse_missing_100267|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100268": {
+ "content": "<|_unuse_missing_100268|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100269": {
+ "content": "<|_unuse_missing_100269|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100270": {
+ "content": "<|_unuse_missing_100270|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100271": {
+ "content": "<|_unuse_missing_100271|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100272": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100273": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100274": {
+ "content": "<|stop|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100275": {
+ "content": "<|endofturn|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100276": {
+ "content": "<|endofprompt|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110491": {
+ "content": "<repo_name>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110492": {
+ "content": "<file_sep>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110493": {
+ "content": "<issue_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110494": {
+ "content": "<issue_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110495": {
+ "content": "<issue_closed>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110496": {
+ "content": "<jupyter_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110497": {
+ "content": "<jupyter_text>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110498": {
+ "content": "<jupyter_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110499": {
+ "content": "<jupyter_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110500": {
+ "content": "<jupyter_script>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110501": {
+ "content": "<empty_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110502": {
+ "content": "<code_to_intermediate>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110503": {
+ "content": "<intermediate_to_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110504": {
+ "content": "<pr>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110505": {
+ "content": "<pr_status>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110506": {
+ "content": "<pr_is_merged>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110507": {
+ "content": "<pr_base>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110508": {
+ "content": "<pr_file>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110509": {
+ "content": "<pr_base_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110510": {
+ "content": "<pr_diff>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110511": {
+ "content": "<pr_diff_hunk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110512": {
+ "content": "<pr_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110513": {
+ "content": "<pr_event_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110514": {
+ "content": "<pr_review>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110515": {
+ "content": "<pr_review_state>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110516": {
+ "content": "<pr_review_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110517": {
+ "content": "<pr_in_reply_to_review_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110518": {
+ "content": "<pr_in_reply_to_comment_id>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110519": {
+ "content": "<pr_diff_hunk_comment_line>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110520": {
+ "content": "<NAME>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110521": {
+ "content": "<EMAIL>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110522": {
+ "content": "<KEY>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110523": {
+ "content": "<PASSWORD>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [
+ "<|endoftext|>",
+ "<|fim_prefix|>",
+ "<|fim_middle|>",
+ "<|fim_suffix|>",
+ "<|endofprompt|>",
+ "<|_unuse_missing_100256|>",
+ "<|_unuse_missing_100261|>",
+ "<|_unuse_missing_100262|>",
+ "<|_unuse_missing_100263|>",
+ "<|_unuse_missing_100264|>",
+ "<|_unuse_missing_100265|>",
+ "<|_unuse_missing_100266|>",
+ "<|_unuse_missing_100267|>",
+ "<|_unuse_missing_100268|>",
+ "<|_unuse_missing_100269|>",
+ "<|_unuse_missing_100270|>",
+ "<|_unuse_missing_100271|>",
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|stop|>",
+ "<|endofturn|>",
+ "<repo_name>",
+ "<file_sep>",
+ "<issue_start>",
+ "<issue_comment>",
+ "<issue_closed>",
+ "<jupyter_start>",
+ "<jupyter_text>",
+ "<jupyter_code>",
+ "<jupyter_output>",
+ "<jupyter_script>",
+ "<empty_output>",
+ "<code_to_intermediate>",
+ "<intermediate_to_code>",
+ "<pr>",
+ "<pr_status>",
+ "<pr_is_merged>",
+ "<pr_base>",
+ "<pr_file>",
+ "<pr_base_code>",
+ "<pr_diff>",
+ "<pr_diff_hunk>",
+ "<pr_comment>",
+ "<pr_event_id>",
+ "<pr_review>",
+ "<pr_review_state>",
+ "<pr_review_comment>",
+ "<pr_in_reply_to_review_id>",
+ "<pr_in_reply_to_comment_id>",
+ "<pr_diff_hunk_comment_line>",
+ "<NAME>",
+ "<EMAIL>",
+ "<KEY>",
+ "<PASSWORD>"
+ ],
+ "bos_token": "<|endoftext|>",
+ "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+ "clean_up_tokenization_spaces": true,
+ "eos_token": "<|endofturn|>",
+ "extra_special_tokens": {},
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<|endoftext|>",
+ "tokenizer_class": "GPT2Tokenizer",
+ "unk_token": "<|endoftext|>"
+ }
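The `chat_template` above is a ChatML-style Jinja template: each message is wrapped in `<|im_start|>{role}` and `<|im_end|>` markers, and an open `<|im_start|>assistant` block is appended when a generation prompt is requested. A minimal sketch of the string it renders (normally produced by `tokenizer.apply_chat_template` in `transformers`; the example messages here are hypothetical):

```python
def render_chatml(messages, add_generation_prompt=False):
    """Reproduce the ChatML formatting defined by the chat_template field."""
    out = ""
    for message in messages:
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        out += "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>" + "\n"
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        out += "<|im_start|>assistant\n"
    return out

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(render_chatml(messages, add_generation_prompt=True))
```

Note that generation stops on `<|endofturn|>`, the `eos_token` configured above, rather than on `<|im_end|>`.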
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7bbd2538562d29d0ea8a0dc81d11411522bce0862261591b886509bfea955316
+ size 5624
vocab.json ADDED
The diff for this file is too large to render. See raw diff