Delta-Vector committed · Commit f8df87d · verified · 1 Parent(s): 3fa6f37

Model save

Files changed (1): README.md +191 -0
README.md ADDED
@@ -0,0 +1,191 @@
---
library_name: transformers
base_model: NewEden/32B-inst
tags:
- axolotl
- generated_from_trainer
datasets:
- NewEden/RP-logs-V2-Experimental-prefixed
- NewEden/Creative_Writing-Complexity
- NewEden/Discord-Filtered
- NewEden/DeepseekRP-Filtered
- NewEden/Storium-Prefixed-Clean
- NewEden/Basket-Weaving-Filtered
- NewEden/LIMARP-Complexity
- NewEden/Misc-Data-Sharegpt-Prefixed
- NewEden/BlueSky-10K-Complexity
- NewEden/OpenCAI-ShareGPT
- NewEden/Basket-Weaving-Filtered
- PocketDoc/Dans-Personamaxx-VN
- PocketDoc/Dans-Kinomaxx-VanillaBackrooms
model-index:
- name: 32b-rp
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.8.0.dev0`
```yaml
base_model: NewEden/32B-inst
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

hub_model_id: NewEden/32b-rp
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: false
cut_cross_entropy: true

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: NewEden/RP-logs-V2-Experimental-prefixed
    type: dan-chat-advanced
  - path: NewEden/Creative_Writing-Complexity
    type: dan-chat-advanced
  - path: NewEden/Discord-Filtered
    type: dan-chat-advanced
  - path: NewEden/DeepseekRP-Filtered
    type: dan-chat-advanced
  - path: NewEden/Storium-Prefixed-Clean
    type: dan-chat-advanced
  - path: NewEden/Basket-Weaving-Filtered
    type: dan-chat-advanced
  - path: NewEden/LIMARP-Complexity
    type: dan-chat-advanced
  - path: NewEden/Misc-Data-Sharegpt-Prefixed
    type: dan-chat-advanced
  - path: NewEden/BlueSky-10K-Complexity
    type: dan-chat-advanced
  - path: NewEden/OpenCAI-ShareGPT
    type: dan-chat-advanced
  - path: NewEden/Basket-Weaving-Filtered
    type: dan-chat-advanced
  - path: PocketDoc/Dans-Personamaxx-VN
    type: dan-chat-advanced
  - path: PocketDoc/Dans-Kinomaxx-VanillaBackrooms
    type: dan-chat-advanced
dataset_prepared_path: prepared_data
val_set_size: 0.0
output_dir: ./qwq-inst

sequence_len: 32768
sample_packing: true
pad_to_sequence_len: true

# adapter: lora
# lora_model_dir:
# lora_r: 128
# lora_alpha: 16
# lora_dropout: 0.05
# lora_target_modules:
# - gate_proj
# - down_proj
# - up_proj
# - q_proj
# - v_proj
# - k_proj
# - o_proj

wandb_project: qwq
wandb_entity:
wandb_watch:
wandb_name: rp-attempt-03
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 2.5e-5
max_grad_norm: 1.0

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: unsloth
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 40
saves_per_epoch: 2
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.02
fsdp:
fsdp_config:
special_tokens:

```

</details><br>

# 32b-rp

This model is a fine-tuned version of [NewEden/32B-inst](https://huggingface.co/NewEden/32B-inst), trained on the following datasets: NewEden/RP-logs-V2-Experimental-prefixed, NewEden/Creative_Writing-Complexity, NewEden/Discord-Filtered, NewEden/DeepseekRP-Filtered, NewEden/Storium-Prefixed-Clean, NewEden/Basket-Weaving-Filtered, NewEden/LIMARP-Complexity, NewEden/Misc-Data-Sharegpt-Prefixed, NewEden/BlueSky-10K-Complexity, NewEden/OpenCAI-ShareGPT, PocketDoc/Dans-Personamaxx-VN, and PocketDoc/Dans-Kinomaxx-VanillaBackrooms.
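
A minimal inference sketch with Transformers is shown below. It assumes the checkpoint published under the `hub_model_id` from the config above (`NewEden/32b-rp`) and a GPU with enough memory for a 32B-parameter model in bfloat16; the example messages and sampling settings are purely illustrative.

```python
# Minimal sketch: load the checkpoint and generate a reply via the tokenizer's chat template.
# The repo id comes from hub_model_id in the config above; prompts and sampling values are
# illustrative assumptions, not part of the original card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NewEden/32b-rp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the config trains in bf16
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a creative roleplay partner."},
    {"role": "user", "content": "Describe the tavern as my character walks in."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```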

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2.5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: adamw_bnb_8bit (8-bit AdamW from bitsandbytes) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 40
- num_epochs: 4.0
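
The total batch sizes above follow directly from the per-device batch size, gradient accumulation, and device count; the short sketch below is plain arithmetic added for clarity and is not part of the training code.

```python
# Sanity check of the effective batch sizes reported above.
micro_batch_size = 2             # per-device train batch size
gradient_accumulation_steps = 2
num_devices = 8                  # multi-GPU run

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation at eval time

print(total_train_batch_size)  # 32, matching the value above
print(total_eval_batch_size)   # 16, matching the value above
```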

### Training results



### Framework versions

- Transformers 4.49.0
- PyTorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0