phi4-lora-xaji0y6d-1742330134

This model is a LoRA adapter (trained with PEFT) for microsoft/Phi-4-mini-instruct, fine-tuned on an unspecified dataset. It achieves the following results on the evaluation set (a quick perplexity check follows the list):

  • Loss: 1.0010
  • Perplexity: 2.7209
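
For reference, the perplexity reported here is simply the exponential of the evaluation loss. A minimal check in Python (not taken from the original training code):

```python
import math

eval_loss = 1.0010                # evaluation loss reported above
perplexity = math.exp(eval_loss)
print(f"{perplexity:.4f}")        # ~2.7210; matches the reported 2.7209 up to rounding of the loss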

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 50
  • mixed_precision_training: Native AMP
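
As referenced above, these settings map onto a Hugging Face TrainingArguments object roughly as follows. This is a reconstruction from the list, not the original training script; output_dir is hypothetical:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi4-lora-output",   # hypothetical; not stated on this card
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=16,  # total train batch size: 1 x 16 = 16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.01,
    num_train_epochs=50,
    fp16=True,                       # "Native AMP" mixed precision
)
```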

Training results

Training Loss | Epoch | Step | Validation Loss | Perplexity
------------- | ----- | ---- | --------------- | ----------
5.6626        | 1.48  | 10   | 5.8212          | 337.3485
5.4363        | 2.96  | 20   | 5.4409          | 230.6381
5.2185        | 4.32  | 30   | 5.2027          | 181.7434
4.9729        | 5.8   | 40   | 4.9270          | 137.9507
4.68          | 7.16  | 50   | 4.6071          | 100.1871
4.3242        | 8.64  | 60   | 4.2787          | 72.1430
4.0147        | 10.0  | 70   | 3.9536          | 52.1171
3.7066        | 11.48 | 80   | 3.6597          | 38.8469
3.3654        | 12.96 | 90   | 3.3835          | 29.4712
3.1883        | 14.32 | 100  | 3.1183          | 22.6075
2.8444        | 15.8  | 110  | 2.8578          | 17.4224
2.6168        | 17.16 | 120  | 2.6088          | 13.5819
2.3689        | 18.64 | 130  | 2.3749          | 10.7493
2.1379        | 20.0  | 140  | 2.1532          | 8.6119
1.8909        | 21.48 | 150  | 1.9458          | 6.9986
1.7022        | 22.96 | 160  | 1.7602          | 5.8135
1.5127        | 24.32 | 170  | 1.6061          | 4.9831
1.3942        | 25.8  | 180  | 1.4847          | 4.4133
1.3053        | 27.16 | 190  | 1.3923          | 4.0240
1.2177        | 28.64 | 200  | 1.3193          | 3.7405
1.1161        | 30.0  | 210  | 1.2557          | 3.5101
1.1293        | 31.48 | 220  | 1.2023          | 3.3275
1.0622        | 32.96 | 230  | 1.1562          | 3.1778
1.015         | 34.32 | 240  | 1.1164          | 3.0536
0.9539        | 35.8  | 250  | 1.0830          | 2.9533
0.9387        | 37.16 | 260  | 1.0552          | 2.8725
0.8819        | 38.64 | 270  | 1.0340          | 2.8121
0.9162        | 40.0  | 280  | 1.0178          | 2.7670
0.8912        | 41.48 | 290  | 1.0074          | 2.7384
0.8641        | 42.96 | 300  | 1.0010          | 2.7209

Framework versions

  • PEFT 0.14.0
  • Transformers 4.48.2
  • PyTorch 2.1.0+cu118
  • Datasets 3.4.1
  • Tokenizers 0.21.1
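
Given these versions, the adapter can be loaded on top of the base model with PEFT. A minimal sketch, assuming the adapter is published under the repo id Swephoenix/phi4-lora-xaji0y6d-1742330134 (as this card's name suggests):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "microsoft/Phi-4-mini-instruct"
adapter_id = "Swephoenix/phi4-lora-xaji0y6d-1742330134"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the LoRA weights

inputs = tokenizer("Hello!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```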