---
title: Tortoise TTS API
emoji: 🦀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Text-to-speech using Gradio, FastAPI, and TorToise TTS
tags:
- tortoise-tts
- text-to-speech
- voice-cloning
- gradio
- fastapi
---

# Tortoise TTS with Voice Cloning

A text-to-speech application with voice cloning capabilities, powered by Tortoise-TTS.

## Description

This application generates natural-sounding speech from text. You can choose the voice in one of three ways:

- Uploading your own voice sample for cloning
- Recording your voice directly in the browser
- Selecting from a variety of preset voices

The app uses the Tortoise-TTS text-to-speech model and runs on Hugging Face Spaces with ZeroGPU.

## How to Use

### Web Interface

1. Enter the text you want to convert to speech
2. Choose one of the following voice options:
   - Upload a voice sample audio file (WAV format recommended)
   - Record your voice using your microphone
   - Select a preset voice from the dropdown menu
3. Click "Generate Speech"
4. Listen to or download the generated audio
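
Because WAV is the recommended upload format, it can help to inspect a sample locally before uploading it. The following is a minimal sketch using only Python's standard `wave` module, which reads uncompressed PCM WAV files; the helper name `wav_info` is hypothetical and not part of this app.

```python
import wave


def wav_info(path):
    """Return (duration_seconds, sample_rate, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return frames / rate, rate, channels
```

Calling `wav_info("my_voice.wav")` on your own file reports how long the sample is, which is one of the factors that affects cloning quality.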

### API Endpoints

The app also provides REST API endpoints for programmatic access:

1. **Voice File TTS** - `/api/tts_with_voice_file/`
   - POST request with:
     - `text`: Text to convert to speech (required)
     - `voice_file`: Audio file for voice cloning (optional)
     - `preset_voice`: Name of preset voice (optional, defaults to "random")
2. **Preset Voice TTS** - `/api/tts_with_preset/`
   - POST request with:
     - `text`: Text to convert to speech (required)
     - `preset_voice`: Name of preset voice (required)

### Python Example

```python
import requests

# Using a preset voice
response = requests.post(
    "https://your-space-name.hf.space/api/tts_with_preset/",
    data={"text": "Hello, this is a test.", "preset_voice": "tom"},
)
response.raise_for_status()  # stop on an HTTP error instead of saving an error body

# Save the audio file
with open("output.wav", "wb") as f:
    f.write(response.content)
```
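
The voice file endpoint can be called in a similar way. The sketch below assumes that `/api/tts_with_voice_file/` accepts `multipart/form-data` and returns the audio bytes in the response body, as the preset endpoint does in the example above; neither detail is confirmed here. The function name `tts_with_voice_file`, the `post` parameter, and the 300-second timeout are illustrative choices, not part of the app.

```python
from pathlib import Path

BASE_URL = "https://your-space-name.hf.space"  # placeholder, replace with your Space URL


def tts_with_voice_file(text, voice_path, output_path="output.wav",
                        base_url=BASE_URL, timeout=300, post=None):
    """POST text plus a voice sample and save the returned audio (hypothetical helper)."""
    if post is None:
        import requests  # third-party; imported here so a stub can be injected for testing
        post = requests.post

    url = base_url.rstrip("/") + "/api/tts_with_voice_file/"
    with open(voice_path, "rb") as voice:
        response = post(
            url,
            data={"text": text},
            files={"voice_file": voice},  # assumed to be a multipart file field
            timeout=timeout,
        )
    response.raise_for_status()  # raise on HTTP errors rather than saving an error body
    Path(output_path).write_bytes(response.content)
    return len(response.content)  # number of bytes written
```

A call such as `tts_with_voice_file("Hello, this is a test.", "my_voice.wav")` would then write the result to `output.wav`.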

## Technical Details

This app leverages:

- **Tortoise-TTS**: The multi-voice text-to-speech model that generates the audio
- **Gradio**: For the web user interface
- **FastAPI**: For the API endpoints
- **ZeroGPU**: For efficient GPU utilization on Hugging Face Spaces

## Limitations

- Speech generation may take some time (30-60 seconds), depending on text length
- Voice cloning quality depends on the clarity and length of the provided sample
- For best results, provide voice samples with clear speech and minimal background noise
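
Since generation time grows with text length, one client-side option is to split long input into shorter pieces and send each piece as its own request. The sketch below is only an illustration: the helper name `split_text` and the 200-character default are arbitrary assumptions, and the app itself may handle long text differently.

```python
import re


def split_text(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be posted separately with a generous request timeout; joining the resulting audio files is not shown here.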

## Credits

This project uses the Tortoise-TTS model. If you use this app in your work, please consider citing:

```
@misc{tortoise-tts,
  author = {James Betker},
  title = {Tortoise-TTS: A Multi-Voice TTS System},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/neonbjb/tortoise-tts}}
}
```

## License

This project is available under the Apache-2.0 License.