PEFT · Safetensors · llama
danita committed · verified
Commit 94baaee · 1 Parent(s): df281c4

Update README.md

Files changed (1): README.md (+26 -4)
README.md CHANGED
@@ -1,11 +1,13 @@
---
library_name: transformers
- tags: []
+ datasets:
+ - xfordanita/code-summary-java
---

# Model Card for Model ID

- <!-- Provide a quick summary of what the model is/does. -->
+ This model is a version of **codellama/CodeLlama-7b-hf** fine-tuned with **QLoRA** using the **PEFT** library on the xfordanita/code-summary-java dataset.
+



@@ -13,7 +15,6 @@ tags: []

### Model Description

- <!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

@@ -92,8 +93,29 @@ Use the code below to get started with the model.

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

+ Trained on free Kaggle GPUs (2 × 15 GB VRAM) with the following params:
+
+ ```py
+ training_arguments = TrainingArguments(
+     output_dir='./results',
+     num_train_epochs=8,
+     per_device_train_batch_size=4,
+     gradient_accumulation_steps=2,
+     optim="paged_adamw_32bit",
+     save_steps=0,
+     logging_steps=10,
+     learning_rate=2e-4,
+     weight_decay=0.1,              # higher value for stronger L2 regularization
+     fp16=True,
+     max_grad_norm=1.0,             # cap the gradient norm to avoid exploding gradients
+     max_steps=-1,
+     warmup_ratio=0.1,              # increased warmup ratio
+     group_by_length=True,
+     lr_scheduler_type="constant",  # constant learning rate
+     report_to="tensorboard"
+ )
+ ```
#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
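The commit describes a QLoRA fine-tune of **codellama/CodeLlama-7b-hf** with the PEFT library, but the adapter and quantization configuration are not part of the diff. Below is a minimal sketch of what such a setup typically looks like; the 4-bit settings, LoRA rank/alpha/dropout, and target modules are assumptions for illustration, not values recorded in this commit.

```py
# Hypothetical QLoRA setup for the fine-tune described in the diff above.
# LoRA rank/alpha/dropout and target_modules are assumptions, not from the commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_id = "codellama/CodeLlama-7b-hf"

# 4-bit NF4 quantization keeps the frozen base model small enough for 15 GB GPUs
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Only the low-rank adapter matrices are trained; the quantized base stays frozen
lora_config = LoraConfig(
    r=16,                 # assumed rank
    lora_alpha=32,        # assumed scaling
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```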
 
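The TrainingArguments shown in the diff are only one piece of the loop; the commit does not show how they are passed to a trainer or how xfordanita/code-summary-java is preprocessed. A rough sketch using the plain 🤗 Trainer with causal-LM collation; the "text" column name and the tokenization length are guesses about the dataset layout.

```py
# Hypothetical training wiring; `model` and `tokenizer` come from the QLoRA
# sketch above and `training_arguments` is the object shown in the diff.
from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling, Trainer

dataset = load_dataset("xfordanita/code-summary-java", split="train")

tokenizer.pad_token = tokenizer.eos_token  # CodeLlama has no pad token by default

def tokenize(batch):
    # "text" is an assumed column name; adjust to the dataset's actual fields
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,                 # PEFT-wrapped 4-bit model
    args=training_arguments,     # hyperparameters from the README diff
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.model.save_pretrained("./results")  # saves only the LoRA adapter weights
```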