---
license: cc-by-nc-4.0
base_model:
- SWivid/F5-TTS
language:
- ami
- trv
- bnn
- pwn
- tay
- tsu
- tao
- dru
- xsy
- pyu
- szy
- ckv
- sxr
- ssf
- xnb
pipeline_tag: text-to-speech
---

# Model Card for f5-tts-hakka-finetune

## Model Details

F5-TTS fine-tuned on all Formosan data (ithuan, fb ilrdf dict, klokah), **excluding samples that consist of only one word**, with IPA as input. \
Of the ithuan data, only the ami and trv portions are included. \
Grapheme-to-phoneme (G2P) conversion uses this [repo](https://github.com/FormoSpeech/FormoG2P).

## Training Details

- learning rate: 0.00001
- batch size per gpu: 6400
- batch size type: frame
- max samples: 64
- grad accumulation steps: 1
- max grad norm: 1
- epochs: 210 (1704780 steps scheduled); this checkpoint is at step 1081600, after which the loss started to rise
- num warmup updates: 27040

### Model Sources

- **Repository:** [https://github.com/SWivid/F5-TTS](https://github.com/SWivid/F5-TTS)
- **Paper:** [https://arxiv.org/abs/2410.06885](https://arxiv.org/abs/2410.06885)

## Uses

Please refer to the source repository.

## Demo

[https://huggingface.co/spaces/ithuan/formosan-f5-tts](https://huggingface.co/spaces/ithuan/formosan-f5-tts)
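## Training schedule arithmetic

The numbers under Training Details can be cross-checked with a little arithmetic. The Python sketch below derives the number of updates per epoch and the epoch reached by the step-1081600 checkpoint. It also estimates the learning rate at that step, **assuming** a linear warmup over the 27040 warmup updates followed by a linear decay to about zero at step 1704780. That schedule shape is an assumption made for illustration; check the trainer in the F5-TTS repository for the version you actually use.

```python
# Sketch: derive schedule numbers from this card's Training Details.
# ASSUMPTION: linear warmup from ~0 to the peak learning rate, then linear
# decay to ~0 at the final scheduled update. Verify against your F5-TTS version.

PEAK_LR = 1e-5
EPOCHS = 210
TOTAL_UPDATES = 1704780      # scheduled updates for all 210 epochs
WARMUP_UPDATES = 27040
CHECKPOINT_UPDATE = 1081600  # step of this checkpoint


def lr_at(step: int) -> float:
    """Learning rate at `step` under the assumed warmup + linear decay schedule."""
    if step < WARMUP_UPDATES:
        # Linear warmup: ramps from 0 up to PEAK_LR.
        return PEAK_LR * step / WARMUP_UPDATES
    # Linear decay: from PEAK_LR at the end of warmup down to 0 at TOTAL_UPDATES.
    decay_updates = TOTAL_UPDATES - WARMUP_UPDATES
    remaining = max(TOTAL_UPDATES - step, 0)
    return PEAK_LR * remaining / decay_updates


# With grad accumulation steps = 1, one update per dataloader step.
updates_per_epoch = TOTAL_UPDATES // EPOCHS
checkpoint_epoch = CHECKPOINT_UPDATE / updates_per_epoch

if __name__ == "__main__":
    print(f"updates per epoch: {updates_per_epoch}")
    print(f"checkpoint epoch: {checkpoint_epoch:.1f}")
    print(f"lr at checkpoint: {lr_at(CHECKPOINT_UPDATE):.2e}")
```

Under these assumptions the sketch reports 8118 updates per epoch, a checkpoint at roughly epoch 133 of 210, and a learning rate of about 3.7e-06 at the checkpoint.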
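## Frame-based batching

With `batch size type: frame`, the batch size of 6400 is a budget of audio frames per GPU rather than a sample count, and `max samples: 64` caps how many utterances share one batch. The sketch below shows one way such length-bucketed batching can work. It is a hypothetical illustration, not the F5-TTS sampler itself: it **assumes** mel frames at 24 kHz with a hop length of 256, sorts samples by length, and skips any single sample longer than the frame budget.

```python
# Sketch of frame-based ("batch size type: frame") dynamic batching.
# ASSUMPTIONS (not taken from this card): lengths are mel frames at 24 kHz
# with hop length 256; samples are sorted by length; a batch is closed when
# adding the next sample would exceed either limit.
from typing import List

SAMPLE_RATE = 24_000    # assumed audio sample rate
HOP_LENGTH = 256        # assumed mel hop length
FRAMES_PER_GPU = 6400   # "batch size per gpu" from Training Details
MAX_SAMPLES = 64        # "max samples" from Training Details


def seconds_to_frames(seconds: float) -> int:
    """Convert an audio duration to a (truncated) mel frame count."""
    return int(seconds * SAMPLE_RATE / HOP_LENGTH)


def make_batches(frame_lengths: List[int]) -> List[List[int]]:
    """Group sample indices so each batch has at most FRAMES_PER_GPU total
    frames and at most MAX_SAMPLES samples. A sample longer than the whole
    frame budget is skipped (a hypothetical choice for this sketch)."""
    order = sorted(range(len(frame_lengths)), key=lambda i: frame_lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    current_frames = 0
    for idx in order:
        length = frame_lengths[idx]
        if length > FRAMES_PER_GPU:
            continue  # cannot fit in any batch
        # Close the current batch if this sample would break either limit.
        if current and (current_frames + length > FRAMES_PER_GPU
                        or len(current) == MAX_SAMPLES):
            batches.append(current)
            current, current_frames = [], 0
        current.append(idx)
        current_frames += length
    if current:
        batches.append(current)
    return batches
```

Under the same assumptions, 6400 frames correspond to about 6400 × 256 / 24000 ≈ 68.3 seconds of audio per GPU batch, so a batch holds either many short utterances (up to 64) or a few long ones.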