flan-t5-base-vtssum

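This model is a fine-tuned version of google/flan-t5-base. The training dataset was not recorded when this card was generated. At the final evaluation logged in the training table (step 190, epoch 4.8746), it achieves the following results on the evaluation set:

Loss: 0.1286
Model Preparation Time: 0.0057
Rouge1: 47.6557
Rouge2: 41.1977
Rougel: 45.8243
Rougelsum: 46.5471
Gen Len: 15.2121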

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 5
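
The relationship between the batch-size settings and the linear schedule above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the card does not list a device count or warmup steps, so `NUM_DEVICES = 1` (which is consistent with 8 × 16 = 128) and `WARMUP_STEPS = 0` are assumptions, and `effective_batch_size` and `linear_lr` are hypothetical helper names rather than functions from any library. The total of 190 optimizer steps is taken from the last row of the training log.

```python
# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 5e-4
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 16
TOTAL_OPTIMIZER_STEPS = 190  # last step reported in the training log

# Assumptions, not stated in the card: a single device and no warmup.
NUM_DEVICES = 1
WARMUP_STEPS = 0


def effective_batch_size(per_device: int, accumulation: int, devices: int) -> int:
    """Number of examples that contribute to one optimizer update."""
    return per_device * accumulation * devices


def linear_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int = 0) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    batch = effective_batch_size(
        TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES
    )
    print(f"total_train_batch_size: {batch}")
    # Learning rate at the evaluation steps from the training log.
    for step in (0, 39, 78, 117, 156, 190):
        lr = linear_lr(step, LEARNING_RATE, TOTAL_OPTIMIZER_STEPS, WARMUP_STEPS)
        print(f"step {step:>3}: lr = {lr:.6f}")
```

Under these assumptions the learning rate starts at 0.0005 and reaches zero at step 190, so the later evaluations are taken at progressively smaller learning rates.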

Training results

Training Loss	Epoch	Step	Validation Loss	Model Preparation Time	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
No log	1.0	39	0.1855	0.0057	40.0436	32.5871	38.0798	38.6749	15.0851
No log	2.0	78	0.1625	0.0057	43.4211	36.1317	41.3592	42.1222	15.0294
No log	3.0	117	0.1447	0.0057	45.0768	38.1098	43.1718	43.8482	15.0913
No log	4.0	156	0.1316	0.0057	47.1981	40.9958	45.503	46.1512	15.0805
No log	4.8746	190	0.1286	0.0057	47.6557	41.1977	45.8243	46.5471	15.2121
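
To read the trend in the table more easily, the following sketch computes the change in each metric between consecutive evaluations. The numbers are copied from the rows above; the helper names `deltas` and `strictly_decreasing` are hypothetical and chosen only for this illustration.

```python
# (epoch, step, validation_loss, rouge1, rouge2, rougeL, rougeLsum),
# copied from the training results table in this card.
ROWS = [
    (1.0, 39, 0.1855, 40.0436, 32.5871, 38.0798, 38.6749),
    (2.0, 78, 0.1625, 43.4211, 36.1317, 41.3592, 42.1222),
    (3.0, 117, 0.1447, 45.0768, 38.1098, 43.1718, 43.8482),
    (4.0, 156, 0.1316, 47.1981, 40.9958, 45.5030, 46.1512),
    (4.8746, 190, 0.1286, 47.6557, 41.1977, 45.8243, 46.5471),
]


def deltas(values: list[float]) -> list[float]:
    """Differences between consecutive values, rounded to 4 decimals."""
    return [round(b - a, 4) for a, b in zip(values, values[1:])]


def strictly_decreasing(values: list[float]) -> bool:
    """True if every value is smaller than the one before it."""
    return all(b < a for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    losses = [row[2] for row in ROWS]
    rouge1 = [row[3] for row in ROWS]
    print("validation loss strictly decreasing:", strictly_decreasing(losses))
    print("Rouge1 change per evaluation:", deltas(rouge1))
    print("Rouge1 total gain:", round(rouge1[-1] - rouge1[0], 4))
```

The per-evaluation gains shrink toward the end of training, which suggests, without proving, that the metrics were approaching a plateau at the point training stopped.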