	---
	library_name: peft
	license: other
	base_model: deepseek-ai/deepseek-coder-1.3b-base
	tags:
	- generated_from_trainer
	model-index:
	- name: lemexp-task4-option1_small-deepseek-coder-1.3b-base-ddp-8lr
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# lemexp-task4-option1_small-deepseek-coder-1.3b-base-ddp-8lr

	This model is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.0634
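Since the front matter lists `library_name: peft`, the repository presumably holds adapter weights that are applied on top of the base model rather than a full checkpoint. The following is a minimal loading sketch under that assumption. The `your-namespace` prefix in `ADAPTER_ID` is a placeholder (the card does not state the repository owner), and `load_adapter` is an illustrative helper name, not part of any library.

```python
BASE_MODEL_ID = "deepseek-ai/deepseek-coder-1.3b-base"
# Placeholder: replace "your-namespace" with the actual owner of the adapter repository.
ADAPTER_ID = "your-namespace/lemexp-task4-option1_small-deepseek-coder-1.3b-base-ddp-8lr"


def load_adapter(adapter_id=ADAPTER_ID, base_model_id=BASE_MODEL_ID):
    """Load the base model and attach the PEFT adapter to it.

    The heavy imports are done lazily so the module can be imported
    without transformers/peft being installed.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter()` downloads both repositories on first use. The prompt format expected by the fine-tuned model is not documented in this card, so any generation call would have to be adapted to the (unknown) training data format.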

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0008
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 8
	- total_train_batch_size: 16
	- total_eval_batch_size: 16
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- num_epochs: 12
	- mixed_precision_training: Native AMP
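The totals in the list above follow from the per-device values: 2 samples per device on 8 devices gives 16 per optimizer step. The sketch below reproduces that arithmetic and the shape of a linear learning-rate schedule. Three values are assumptions rather than facts from this card: a gradient accumulation of 1, no warmup (none is listed), and roughly 3144 steps per epoch, which is estimated from the results table (step 3145 is logged at epoch 1.0003).

```python
# Values copied from the hyperparameter list.
PER_DEVICE_TRAIN_BATCH_SIZE = 2
NUM_DEVICES = 8
PEAK_LEARNING_RATE = 8e-4
NUM_EPOCHS = 12

# Assumptions, not stated in the card.
GRADIENT_ACCUMULATION_STEPS = 1  # consistent with 2 * 8 = 16
WARMUP_STEPS = 0                 # no warmup setting is listed
STEPS_PER_EPOCH = 3144           # estimated from the results table

TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH


def effective_batch_size(per_device, num_devices, grad_accum=1):
    """Number of samples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


def linear_lr(step, total_steps, peak_lr, warmup_steps=0):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return peak_lr * (remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE, NUM_DEVICES,
                               GRADIENT_ACCUMULATION_STEPS))
    for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, linear_lr(step, TOTAL_STEPS, PEAK_LEARNING_RATE, WARMUP_STEPS))
```

Under these assumptions the learning rate starts at 0.0008, is halved at the midpoint, and reaches zero at the final step. If warmup or gradient accumulation was in fact used, the constants would need to be adjusted.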

	### Training results

	| Training Loss | Epoch   | Step  | Validation Loss |
	|:-------------:|:-------:|:-----:|:---------------:|
	| 0.2003 | 0.2001 | 629 | 0.1249 |
	| 0.1241 | 0.4001 | 1258 | 0.1045 |
	| 0.1127 | 0.6002 | 1887 | 0.1060 |
	| 0.1028 | 0.8003 | 2516 | 0.0979 |
	| 0.1004 | 1.0003 | 3145 | 0.1018 |
	| 0.0966 | 1.2004 | 3774 | 0.0946 |
	| 0.0946 | 1.4004 | 4403 | 0.0901 |
	| 0.0929 | 1.6005 | 5032 | 0.0855 |
	| 0.0924 | 1.8006 | 5661 | 0.0895 |
	| 0.0888 | 2.0006 | 6290 | 0.0892 |
	| 0.0898 | 2.2007 | 6919 | 0.0878 |
	| 0.0869 | 2.4008 | 7548 | 0.0862 |
	| 0.0866 | 2.6008 | 8177 | 0.0839 |
	| 0.0854 | 2.8009 | 8806 | 0.0835 |
	| 0.0854 | 3.0010 | 9435 | 0.0808 |
	| 0.0809 | 3.2010 | 10064 | 0.0824 |
	| 0.0811 | 3.4011 | 10693 | 0.0800 |
	| 0.0799 | 3.6011 | 11322 | 0.0830 |
	| 0.0824 | 3.8012 | 11951 | 0.0813 |
	| 0.08 | 4.0013 | 12580 | 0.0796 |
	| 0.078 | 4.2013 | 13209 | 0.0776 |
	| 0.0757 | 4.4014 | 13838 | 0.0733 |
	| 0.0771 | 4.6015 | 14467 | 0.0740 |
	| 0.0761 | 4.8015 | 15096 | 0.0723 |
	| 0.0748 | 5.0016 | 15725 | 0.0774 |
	| 0.0756 | 5.2017 | 16354 | 0.0746 |
	| 0.0746 | 5.4017 | 16983 | 0.0748 |
	| 0.0722 | 5.6018 | 17612 | 0.0728 |
	| 0.0731 | 5.8018 | 18241 | 0.0748 |
	| 0.072 | 6.0019 | 18870 | 0.0716 |
	| 0.071 | 6.2020 | 19499 | 0.0710 |
	| 0.0692 | 6.4020 | 20128 | 0.0711 |
	| 0.0699 | 6.6021 | 20757 | 0.0699 |
	| 0.0689 | 6.8022 | 21386 | 0.0698 |
	| 0.0694 | 7.0022 | 22015 | 0.0683 |
	| 0.0674 | 7.2023 | 22644 | 0.0695 |
	| 0.0666 | 7.4024 | 23273 | 0.0685 |
	| 0.0675 | 7.6024 | 23902 | 0.0672 |
	| 0.0658 | 7.8025 | 24531 | 0.0672 |
	| 0.0658 | 8.0025 | 25160 | 0.0666 |
	| 0.0641 | 8.2026 | 25789 | 0.0658 |
	| 0.063 | 8.4027 | 26418 | 0.0654 |
	| 0.0642 | 8.6027 | 27047 | 0.0655 |
	| 0.0633 | 8.8028 | 27676 | 0.0668 |
	| 0.0641 | 9.0029 | 28305 | 0.0669 |
	| 0.0625 | 9.2029 | 28934 | 0.0661 |
	| 0.0615 | 9.4030 | 29563 | 0.0653 |
	| 0.0605 | 9.6031 | 30192 | 0.0660 |
	| 0.0615 | 9.8031 | 30821 | 0.0648 |
	| 0.0613 | 10.0032 | 31450 | 0.0644 |
	| 0.0591 | 10.2032 | 32079 | 0.0645 |
	| 0.0596 | 10.4033 | 32708 | 0.0638 |
	| 0.0593 | 10.6034 | 33337 | 0.0647 |
	| 0.0593 | 10.8034 | 33966 | 0.0631 |
	| 0.0599 | 11.0035 | 34595 | 0.0634 |
	| 0.0584 | 11.2036 | 35224 | 0.0634 |
	| 0.0584 | 11.4036 | 35853 | 0.0636 |
	| 0.0585 | 11.6037 | 36482 | 0.0635 |
	| 0.0574 | 11.8038 | 37111 | 0.0634 |
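The headline loss of 0.0634 matches the last logged evaluation, while the lowest validation loss in the table (0.0631) is logged earlier, at step 33966. The card does not say which checkpoint the published weights correspond to. As a rough sketch of how the best row could be picked from such a log, the snippet below parses the last seven rows of the table; `parse_row` and the record keys are illustrative names.

```python
# Last seven rows of the training results table, copied verbatim.
ROWS = """
| 0.0593 | 10.6034 | 33337 | 0.0647 |
| 0.0593 | 10.8034 | 33966 | 0.0631 |
| 0.0599 | 11.0035 | 34595 | 0.0634 |
| 0.0584 | 11.2036 | 35224 | 0.0634 |
| 0.0584 | 11.4036 | 35853 | 0.0636 |
| 0.0585 | 11.6037 | 36482 | 0.0635 |
| 0.0574 | 11.8038 | 37111 | 0.0634 |
"""


def parse_row(line):
    """Turn one markdown table row into a dict of typed values."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    train_loss, epoch, step, val_loss = cells
    return {
        "train_loss": float(train_loss),
        "epoch": float(epoch),
        "step": int(step),
        "val_loss": float(val_loss),
    }


records = [parse_row(line) for line in ROWS.strip().splitlines()]

# Row with the lowest validation loss among the parsed rows.
best = min(records, key=lambda r: r["val_loss"])
# Gap between validation and training loss at the last logged step.
final_gap = records[-1]["val_loss"] - records[-1]["train_loss"]

if __name__ == "__main__":
    print("best step:", best["step"], "val_loss:", best["val_loss"])
    print("final val/train gap:", round(final_gap, 4))
```

The validation loss changes by only a few ten-thousandths over the last epoch while the training loss keeps falling, which suggests the run had largely converged by the end.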


	### Framework versions

	- PEFT 0.14.0
	- Transformers 4.47.0
	- PyTorch 2.5.1+cu124
	- Datasets 3.2.0
	- Tokenizers 0.21.0
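To reproduce the environment, it may help to compare the installed packages against the versions listed above. The following is a small sketch using only the standard library. The distribution names (`torch` for PyTorch, and so on) are the usual PyPI names, and `find_mismatches` is an illustrative helper, not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by PyPI distribution name.
PINNED = {
    "peft": "0.14.0",
    "transformers": "4.47.0",
    "torch": "2.5.1+cu124",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def find_mismatches(pinned, lookup=version):
    """Return {name: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An exact match is probably stricter than necessary. Nearby patch releases are likely to work, and the `+cu124` suffix only applies to CUDA 12.4 builds of PyTorch.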
