---
library_name: transformers
datasets:
- xfordanita/code-summary-java
---

# Model Card for Model ID

This model is a fine-tuned version of **codellama/CodeLlama-7b-hf**, trained on the **xfordanita/code-summary-java** dataset with **QLoRA** using the **PEFT** library.
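
As a rough illustration of that setup (a minimal sketch, not the exact script used for this model), loading the base model in 4-bit and attaching LoRA adapters with PEFT typically looks like the code below; the quantization and LoRA hyperparameters are assumptions, not values recorded in this card:

```py
# Minimal QLoRA + PEFT setup sketch -- illustrative only; the quantization
# and LoRA hyperparameters below are assumptions, not values from this card.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # QLoRA keeps the frozen base in 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,  # assumed values
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)  # only the LoRA adapters are trainable

dataset = load_dataset("xfordanita/code-summary-java", split="train")
```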

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

#### Training Hyperparameters

Training was done on a free Kaggle GPU instance (2 × 15 GB VRAM) with the following parameters:

```py
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=8,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",
    save_steps=0,
    logging_steps=10,
    learning_rate=2e-4,
    weight_decay=0.1,              # higher value for stronger L2 regularization
    fp16=True,
    max_grad_norm=1.0,             # clip gradients to avoid exploding gradients
    max_steps=-1,
    warmup_ratio=0.1,              # increased warmup ratio
    group_by_length=True,
    lr_scheduler_type="constant",  # constant learning rate
    report_to="tensorboard",
)
```
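
For completeness, here is a hedged sketch of how these arguments might be wired into a training run. `model`, `tokenizer`, and `dataset` are assumed to come from a QLoRA/PEFT setup like the one sketched earlier, and the tokenization details (the `"text"` field name, the 512-token cap) are assumptions rather than values from this card:

```py
# Illustrative only -- not the exact training script used for this model.
from transformers import Trainer, DataCollatorForLanguageModeling

tokenizer.pad_token = tokenizer.eos_token  # CodeLlama has no pad token by default

def tokenize(batch):
    # "text" is an assumed field name for the dataset's training examples
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,                 # the PEFT-wrapped 4-bit base model
    args=training_arguments,     # the hyperparameters defined above
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```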

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->