---
license: cc-by-nc-4.0
base_model:
  - SWivid/F5-TTS
language:
- ami
- trv
- bnn
- pwn
- tay
- tsu
- tao
- dru
- xsy
- pyu
- szy
- ckv
- sxr
- ssf
- xnb
pipeline_tag: text-to-speech
---

# Model Card for f5-tts-hakka-finetune


## Model Details
F5-TTS fine-tuned on all Formosan data (ithuan, fb ilrdf dict, klokah), **excluding samples that contain only one word**, with IPA as input. \
Of the ithuan data, only the ami and trv parts are included. \
G2P comes from this [repo](https://github.com/FormoSpeech/FormoG2P).

## Training Details
- learning rate: 0.00001
- batch size per gpu: 6400
- batch size type: frame
- max samples: 64
- grad accumulation steps: 1
- max grad norm: 1
- epochs: 210 (1704780 steps in total; the current checkpoint is at step 1081600, after which the loss rises)
- num warmup updates: 27040
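
As a rough sanity check on the numbers above, the following sketch derives steps per epoch, how far the current checkpoint is into the schedule, and the warmup share. The sample rate (24 kHz) and hop length (256) used to convert frames into seconds are assumed F5-TTS defaults and are not stated in this card, so treat the duration figure as an estimate only.

```python
# Values copied from the Training Details list above.
EPOCHS = 210
TOTAL_STEPS = 1_704_780
CURRENT_STEP = 1_081_600
WARMUP_UPDATES = 27_040
FRAMES_PER_GPU_BATCH = 6_400

# ASSUMPTION: F5-TTS default mel settings; not stated in this model card.
SAMPLE_RATE_HZ = 24_000
HOP_LENGTH = 256

# With grad accumulation steps = 1, one step is one optimizer update.
steps_per_epoch = TOTAL_STEPS // EPOCHS
epochs_done = CURRENT_STEP / steps_per_epoch
warmup_fraction = WARMUP_UPDATES / CURRENT_STEP

# Approximate audio duration packed into one per-GPU batch of frames.
seconds_per_gpu_batch = FRAMES_PER_GPU_BATCH * HOP_LENGTH / SAMPLE_RATE_HZ

print(f"steps per epoch:        {steps_per_epoch}")
print(f"epochs at current step: {epochs_done:.2f}")
print(f"warmup share of steps:  {warmup_fraction:.1%}")
print(f"audio per GPU batch:    {seconds_per_gpu_batch:.1f} s (assumed 24 kHz, hop 256)")
```

Under these assumptions, the current checkpoint sits at roughly epoch 133 of the planned 210, and warmup covers 2.5% of the steps taken so far.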

### Model Sources 
- **Repository:** [https://github.com/SWivid/F5-TTS](https://github.com/SWivid/F5-TTS)
- **Paper:** [https://arxiv.org/abs/2410.06885](https://arxiv.org/abs/2410.06885)

## Uses
Please refer to the source repository.

## Demo
[https://huggingface.co/spaces/ithuan/formosan-f5-tts](https://huggingface.co/spaces/ithuan/formosan-f5-tts)