---
license: apache-2.0
library_name: peft
tags:
- generated_from_trainer
base_model: cognitivecomputations/dolphin-2.8-mistral-7b-v02
model-index:
- name: fine-tuning-dolphin-mistral-with-webglm-qa-with-lora_1
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# fine-tuning-dolphin-mistral-with-webglm-qa-with-lora_1

This model is a LoRA fine-tune (a PEFT adapter) of [cognitivecomputations/dolphin-2.8-mistral-7b-v02](https://huggingface.co/cognitivecomputations/dolphin-2.8-mistral-7b-v02). The Trainer did not record which dataset was used; the repository name indicates WebGLM-QA.
It achieves the following result on the evaluation set at the final training step (700):
- Loss: 0.2999
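
The following is a minimal loading sketch. It assumes the adapter is hosted on the Hugging Face Hub. This card does not state the repository owner, so `<owner>` in `ADAPTER_ID` is a hypothetical placeholder that you must replace. The code uses the standard `transformers` loaders plus `peft.PeftModel.from_pretrained` to attach the LoRA weights to the base model. The imports sit inside the function so the file can be imported without those packages installed.

```python
from typing import Tuple

BASE_MODEL_ID = "cognitivecomputations/dolphin-2.8-mistral-7b-v02"
# Hypothetical placeholder: replace <owner> with the Hub namespace that hosts this adapter.
ADAPTER_ID = "<owner>/fine-tuning-dolphin-mistral-with-webglm-qa-with-lora_1"


def load_adapter(adapter_id: str = ADAPTER_ID,
                 base_model_id: str = BASE_MODEL_ID) -> Tuple[object, object]:
    """Load the base model, then attach the LoRA adapter weights on top of it."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype=torch.float16,  # half precision roughly halves memory versus float32
        device_map="auto",          # requires the `accelerate` package
    )
    # Wrap the base model with the adapter weights downloaded from adapter_id.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `load_adapter()` downloads the full base model as well as the adapter. To get a single standalone model, call `model.merge_and_unload()`, which folds the LoRA weights into the base weights.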

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 5
- total_train_batch_size: 10
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 60
- training_steps: 700
- mixed_precision_training: Native AMP
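
The batch-size and scheduler entries above are connected, and the sketch below shows how. It assumes a single training device, which the card does not state. It reproduces the linear warmup-then-decay rule that `transformers.get_linear_schedule_with_warmup` applies: the learning rate ramps from 0 to 5e-05 over the first 60 steps, then falls linearly to 0 at step 700. The helper names are illustrative and are not part of any library.

```python
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 5
WARMUP_STEPS = 60
TRAINING_STEPS = 700
NUM_DEVICES = 1  # assumption: the card does not state the number of devices


def total_train_batch_size(per_device: int, accumulation: int, devices: int) -> int:
    """Number of sequences that contribute to one optimizer step."""
    return per_device * accumulation * devices


def linear_warmup_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        # Warmup phase: ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    # Decay phase: fall linearly from the peak to 0 at the final training step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TRAINING_STEPS - WARMUP_STEPS)


print(total_train_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))  # 10
for step in (0, 30, 60, 380, 700):
    print(step, linear_warmup_lr(step))
```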

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.7558 | 0.16 | 10 | 1.4842 |
| 1.4966 | 0.32 | 20 | 1.3367 |
| 1.2328 | 0.48 | 30 | 1.1282 |
| 0.9873 | 0.64 | 40 | 1.0817 |
| 0.9661 | 0.8 | 50 | 0.9967 |
| 0.8808 | 0.96 | 60 | 0.8844 |
| 0.7455 | 1.13 | 70 | 0.7337 |
| 0.6018 | 1.29 | 80 | 0.6164 |
| 0.4899 | 1.45 | 90 | 0.5440 |
| 0.4402 | 1.61 | 100 | 0.4971 |
| 0.4154 | 1.77 | 110 | 0.4555 |
| 0.4025 | 1.93 | 120 | 0.4238 |
| 0.3992 | 2.09 | 130 | 0.4007 |
| 0.3585 | 2.25 | 140 | 0.3862 |
| 0.3369 | 2.41 | 150 | 0.3666 |
| 0.3328 | 2.57 | 160 | 0.3537 |
| 0.3216 | 2.73 | 170 | 0.3423 |
| 0.2859 | 2.89 | 180 | 0.3303 |
| 0.2967 | 3.05 | 190 | 0.3211 |
| 0.2933 | 3.22 | 200 | 0.3114 |
| 0.2716 | 3.38 | 210 | 0.3097 |
| 0.255 | 3.54 | 220 | 0.3053 |
| 0.2731 | 3.7 | 230 | 0.2990 |
| 0.2729 | 3.86 | 240 | 0.2972 |
| 0.2701 | 4.02 | 250 | 0.3030 |
| 0.2558 | 4.18 | 260 | 0.3042 |
| 0.2612 | 4.34 | 270 | 0.3301 |
| 0.3048 | 4.5 | 280 | 0.4564 |
| 0.5437 | 4.66 | 290 | 0.7938 |
| 1.5888 | 4.82 | 300 | 1.5418 |
| 0.6588 | 4.98 | 310 | 0.4630 |
| 0.5345 | 5.14 | 320 | 0.9088 |
| 1.1475 | 5.31 | 330 | 1.6381 |
| 1.6442 | 5.47 | 340 | 2.0495 |
| 2.2517 | 5.63 | 350 | 1.7558 |
| 0.9492 | 5.79 | 360 | 0.5187 |
| 0.3727 | 5.95 | 370 | 0.3763 |
| 0.3139 | 6.11 | 380 | 0.3376 |
| 0.2896 | 6.27 | 390 | 0.3195 |
| 0.283 | 6.43 | 400 | 0.3106 |
| 0.2646 | 6.59 | 410 | 0.3105 |
| 0.2674 | 6.75 | 420 | 0.3256 |
| 0.3482 | 6.91 | 430 | 0.4016 |
| 0.4193 | 7.07 | 440 | 0.6300 |
| 0.7397 | 7.23 | 450 | 1.0617 |
| 1.1954 | 7.4 | 460 | 1.6157 |
| 1.6177 | 7.56 | 470 | 1.8019 |
| 1.2996 | 7.72 | 480 | 0.9151 |
| 0.6605 | 7.88 | 490 | 0.5433 |
| 0.416 | 8.04 | 500 | 0.4012 |
| 0.3412 | 8.2 | 510 | 0.3685 |
| 0.3322 | 8.36 | 520 | 0.3928 |
| 0.3516 | 8.52 | 530 | 0.3641 |
| 0.3406 | 8.68 | 540 | 0.4061 |
| 0.3772 | 8.84 | 550 | 0.4145 |
| 0.3695 | 9.0 | 560 | 0.5453 |
| 0.5824 | 9.16 | 570 | 0.7332 |
| 0.5139 | 9.32 | 580 | 0.4839 |
| 0.3798 | 9.49 | 590 | 0.3758 |
| 0.319 | 9.65 | 600 | 0.3438 |
| 0.3082 | 9.81 | 610 | 0.3301 |
| 0.3017 | 9.97 | 620 | 0.3225 |
| 0.2862 | 10.13 | 630 | 0.3156 |
| 0.2586 | 10.29 | 640 | 0.3109 |
| 0.2878 | 10.45 | 650 | 0.3082 |
| 0.2766 | 10.61 | 660 | 0.3056 |
| 0.2834 | 10.77 | 670 | 0.3042 |
| 0.2513 | 10.93 | 680 | 0.3020 |
| 0.2762 | 11.09 | 690 | 0.3007 |
| 0.28 | 11.25 | 700 | 0.2999 |
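
The evaluation loss quoted at the top of this card (0.2999) is the value at the final step, not the minimum. As a quick check, the sketch below copies the Validation Loss column into a list. It then finds the lowest value and the steps after step 100 where validation loss exceeded 1.0; both cut-offs are arbitrary illustrative choices. It also estimates optimizer steps per epoch from the Epoch column. That estimate is only approximate because the column is rounded, and converting it to sequences assumes a single device and no sequence packing.

```python
# Validation Loss column copied from the table above: one value every 10 optimizer steps.
STEPS = list(range(10, 701, 10))
VAL_LOSS = [
    1.4842, 1.3367, 1.1282, 1.0817, 0.9967, 0.8844, 0.7337, 0.6164, 0.5440, 0.4971,
    0.4555, 0.4238, 0.4007, 0.3862, 0.3666, 0.3537, 0.3423, 0.3303, 0.3211, 0.3114,
    0.3097, 0.3053, 0.2990, 0.2972, 0.3030, 0.3042, 0.3301, 0.4564, 0.7938, 1.5418,
    0.4630, 0.9088, 1.6381, 2.0495, 1.7558, 0.5187, 0.3763, 0.3376, 0.3195, 0.3106,
    0.3105, 0.3256, 0.4016, 0.6300, 1.0617, 1.6157, 1.8019, 0.9151, 0.5433, 0.4012,
    0.3685, 0.3928, 0.3641, 0.4061, 0.4145, 0.5453, 0.7332, 0.4839, 0.3758, 0.3438,
    0.3301, 0.3225, 0.3156, 0.3109, 0.3082, 0.3056, 0.3042, 0.3020, 0.3007, 0.2999,
]

# The table shows epoch 11.25 at step 700 (rounded), so roughly 62 optimizer steps per epoch.
# Assumption: one device and no packing, i.e. 10 sequences per optimizer step.
STEPS_PER_EPOCH = 700 / 11.25


def best_step(steps, losses):
    """Return the (step, loss) pair with the lowest validation loss."""
    return min(zip(steps, losses), key=lambda pair: pair[1])


def spikes(steps, losses, threshold=1.0, after_step=100):
    """Steps after `after_step` whose validation loss exceeds `threshold`."""
    return [s for s, v in zip(steps, losses) if s > after_step and v > threshold]


print(best_step(STEPS, VAL_LOSS))   # (240, 0.2972)
print(STEPS[-1], VAL_LOSS[-1])      # 700 0.2999
print(spikes(STEPS, VAL_LOSS))      # [300, 330, 340, 350, 450, 460, 470]
print(round(STEPS_PER_EPOCH, 1), round(STEPS_PER_EPOCH * 10))  # 62.2 622
```

On these numbers, the lowest validation loss (0.2972) occurs at step 240. Validation loss rises above 1.0 around steps 300 to 350 and again around steps 450 to 470, then settles back near 0.30.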


### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- PyTorch 2.0.0
- Datasets 2.15.0
- Tokenizers 0.15.0
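
When reproducing the environment above, it can help to compare the installed package versions against this list. The following is a minimal sketch using the standard-library `importlib.metadata`. The PyPI distribution names (`torch` for PyTorch, and so on) are the usual ones but are an assumption of this sketch. It compares exact version strings, so a local build tag such as `+cu118` on `torch` shows up as differing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed under "Framework versions"; keys are assumed PyPI distribution names.
EXPECTED = {
    "peft": "0.7.1",
    "transformers": "4.36.2",
    "torch": "2.0.0",
    "datasets": "2.15.0",
    "tokenizers": "0.15.0",
}


def compare_versions(expected: Dict[str, str] = EXPECTED) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (expected version, installed version or None if not installed)."""
    report: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions().items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({status})")
```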
