---
title: Tortoise TTS API
emoji: 🦀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Text-to-speech using Gradio, FastAPI, and TorToise TTS
tags:
- tortoise-tts
- text-to-speech
- voice-cloning
- gradio
- fastapi
---
# Tortoise TTS with Voice Cloning
A powerful text-to-speech application with voice cloning capabilities, powered by Tortoise-TTS.
## Description
This application generates natural-sounding speech from text. You can choose the voice in one of three ways:
- Uploading your own voice sample for cloning
- Recording your voice directly in the browser
- Selecting from a variety of preset voices
The app uses Tortoise-TTS, a high-quality text-to-speech model, and runs on Hugging Face Spaces with ZeroGPU, which allocates GPU time on demand.
## How to Use
### Web Interface
1. Enter the text you want to convert to speech
2. Choose one of the following voice options:
   - Upload a voice sample audio file (WAV format recommended)
   - Record your voice using your microphone
   - Select a preset voice from the dropdown menu
3. Click "Generate Speech"
4. Listen to or download the generated audio
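
Since WAV is the recommended upload format and cloning quality depends on the length of the sample, it can help to inspect a file before uploading it. The following is a minimal sketch using Python's standard `wave` module, which reads PCM WAV files. The helper names and the `min_seconds` threshold are illustrative assumptions, not requirements stated by this app.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": wav.getnframes() / rate,
        }


def sample_warnings(path, min_seconds=5.0):
    """List reasons a sample might clone poorly.

    min_seconds is a hypothetical threshold chosen for illustration.
    """
    info = describe_wav(path)
    warnings = []
    if info["duration_s"] < min_seconds:
        warnings.append(
            f"sample is only {info['duration_s']:.1f}s long; "
            "consider a longer recording"
        )
    return warnings
```

Calling `sample_warnings("my_voice.wav")` returns an empty list when the sample passes the check. Files in other formats (MP3, FLAC) cannot be opened this way and would need converting first.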
### API Endpoints
The app also provides REST API endpoints for programmatic access:
1. **Voice File TTS** - `/api/tts_with_voice_file/`
   - POST request with:
     - `text`: Text to convert to speech (required)
     - `voice_file`: Audio file for voice cloning (optional)
     - `preset_voice`: Name of preset voice (optional, defaults to "random")
2. **Preset Voice TTS** - `/api/tts_with_preset/`
   - POST request with:
     - `text`: Text to convert to speech (required)
     - `preset_voice`: Name of preset voice (required)
### Python Example
```python
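import requests

# Using a preset voice
response = requests.post(
    "https://your-space-name.hf.space/api/tts_with_preset/",
    data={"text": "Hello, this is a test.", "preset_voice": "tom"},
    timeout=120,  # generation can take 30-60 seconds
)
response.raise_for_status()  # fail on an HTTP error instead of saving an error body

# Save the audio file
with open("output.wav", "wb") as f:
    f.write(response.content)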
```
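
The `/api/tts_with_voice_file/` endpoint can be called in a similar way. The sketch below assumes the endpoint accepts a multipart form with the field names listed under API Endpoints and returns raw audio bytes, as the preset example implies; neither detail is confirmed here. The function name, the `post` parameter (used to inject a stub for testing), and the `timeout` value are hypothetical.

```python
from pathlib import Path


def tts_with_voice_file(base_url, text, voice_path=None, preset_voice="random",
                        out_path="output.wav", post=None, timeout=120):
    """POST to /api/tts_with_voice_file/ and save the response body.

    Assumption: the response body is the generated audio.
    """
    if post is None:
        import requests  # imported lazily so a stub can be passed as `post`
        post = requests.post

    url = base_url.rstrip("/") + "/api/tts_with_voice_file/"
    data = {"text": text, "preset_voice": preset_voice}

    if voice_path is None:
        # No sample supplied: only the text and preset name are sent.
        response = post(url, data=data, timeout=timeout)
    else:
        with open(voice_path, "rb") as sample:
            # `voice_file` is the form field name given in this README.
            files = {"voice_file": (Path(voice_path).name, sample, "audio/wav")}
            response = post(url, data=data, files=files, timeout=timeout)

    response.raise_for_status()
    out = Path(out_path)
    out.write_bytes(response.content)
    return out
```

For example, `tts_with_voice_file("https://your-space-name.hf.space", "Hello, this is a test.", voice_path="my_voice.wav")` would upload `my_voice.wav` and write the result to `output.wav`.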
## Technical Details
This app leverages:
- **Tortoise-TTS**: State-of-the-art text-to-speech model
- **Gradio**: For the intuitive user interface
- **FastAPI**: For the API endpoints
- **ZeroGPU**: For on-demand GPU allocation on Hugging Face Spaces
## Limitations
- Speech generation can take 30-60 seconds, depending on text length
- Voice cloning quality depends on the clarity and length of the provided sample
- For best results, provide voice samples with clear speech and minimal background noise
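
Because generation time grows with text length, one possible workaround on the client side is to split long input into sentence-sized chunks and submit each chunk as its own request. The sketch below shows only the splitting step; the function name and the `max_chars` default are arbitrary assumptions, and nothing here reflects how the app itself handles long text.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to one of the API endpoints and the resulting audio files joined afterwards. Using a named preset rather than "random" would presumably keep the voice consistent across chunks.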
## Credits
This project uses the Tortoise-TTS model. If you use this app in your work, please consider citing:
```
@misc{tortoise-tts,
  author = {James Betker},
  title = {Tortoise-TTS: A Multi-Voice TTS System},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/neonbjb/tortoise-tts}}
}
```
## License
This project is available under the Apache-2.0 License.