ByT5 Song Lyrics

This is a Seq2Seq model trained on a karaoke dataset to predict syllables with pitch and timing from song lyrics.

At the time of writing, the model has been trained on only half of the full dataset. Expect the quality to improve as training continues.

The Hugging Face demo widget seems to generate outputs with a short maximum sequence length, so the prediction shown there only covers the first two syllables.
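
To get predictions past the first couple of syllables, one option is to run the model locally and set the generation length yourself. The following is a minimal sketch, not the author's own inference code. It assumes the checkpoint loads through the standard `transformers` seq2seq classes, that PyTorch is installed, and that the truncation comes from a generation length limit, which the source does not confirm. The `model_id` argument, the `predict` and `byt5_token_count` helper names, and the `max_new_tokens=512` default are illustrative choices. ByT5 works on raw UTF-8 bytes, so a sequence costs roughly one token per byte and a small token budget runs out quickly.

```python
def byt5_token_count(text: str) -> int:
    """Tokens ByT5 sees for `text`: one per UTF-8 byte, plus one end-of-sequence token."""
    return len(text.encode("utf-8")) + 1


def predict(lyrics: str, model_id: str, max_new_tokens: int = 512) -> str:
    """Generate a prediction for `lyrics` with an explicit output length budget.

    `model_id` is the Hub repository name of this model (not filled in here).
    `max_new_tokens=512` is an assumed value; tune it to the length of the lyrics.
    """
    # Imported lazily so byt5_token_count works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Encode the lyrics as PyTorch tensors (byte-level tokens for ByT5).
    inputs = tokenizer(lyrics, return_tensors="pt")

    # Raise the output budget above the library default so generation
    # is not cut off after a few syllables.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # The decoded string is whatever text format the model was trained to emit.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `predict("some lyrics", model_id=...)` with this model's repository name downloads the weights on first use. `byt5_token_count` gives a rough sense of how many tokens a lyric line occupies on the input side, which can help when choosing `max_new_tokens`.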

Model size: 1.23B params (Safetensors, tensor type F32)