	---
	title: Tortoise TTS API
	emoji: 🦀
	colorFrom: yellow
	colorTo: purple
	sdk: gradio
	sdk_version: 5.23.1
	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: Text-to-speech using Gradio, FastAPI, and TorToise TTS
	tags:
	- tortoise-tts
	- text-to-speech
	- voice-cloning
	- gradio
	- fastapi
	---

	# Tortoise TTS with Voice Cloning

	A powerful text-to-speech application with voice cloning capabilities, powered by Tortoise-TTS.

	## Description

	This application generates natural-sounding speech from text. You can choose the voice in one of three ways:
	- Uploading your own voice sample for cloning
	- Recording your voice directly in the browser
	- Selecting from a variety of preset voices

	The app uses Tortoise-TTS, a high-quality text-to-speech model, and runs on Hugging Face Spaces with ZeroGPU, which allocates GPU resources on demand.

	## How to Use

	### Web Interface

	1. Enter the text you want to convert to speech
	2. Choose one of the following voice options:
	   - Upload a voice sample audio file (WAV format recommended)
	   - Record your voice using your microphone
	   - Select a preset voice from the dropdown menu
	3. Click "Generate Speech"
	4. Listen to or download the generated audio

	### API Endpoints

	The app also provides REST API endpoints for programmatic access:

	1. Voice File TTS - `/api/tts_with_voice_file/`
	   - POST request with:
	     - `text`: Text to convert to speech (required)
	     - `voice_file`: Audio file for voice cloning (optional)
	     - `preset_voice`: Name of preset voice (optional, defaults to "random")

	2. Preset Voice TTS - `/api/tts_with_preset/`
	   - POST request with:
	     - `text`: Text to convert to speech (required)
	     - `preset_voice`: Name of preset voice (required)

	### Python Example

	```python
	import requests

	# Using preset voice
	response = requests.post(
	    "https://your-space-name.hf.space/api/tts_with_preset/",
	    data={"text": "Hello, this is a test.", "preset_voice": "tom"},
	    timeout=300,  # generation can take 30-60 seconds or more
	)
	response.raise_for_status()  # stop here on an HTTP error

	# Save the audio file
	with open("output.wav", "wb") as f:
	    f.write(response.content)
	```
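
The voice-file endpoint can be called in much the same way. The following is a minimal sketch, not the app's reference client. It assumes the endpoint accepts ordinary multipart form data with the field names listed under API Endpoints and returns audio bytes, as the preset example suggests. The helper names, the `audio/wav` content type, the file name `my_voice.wav`, and the 300-second timeout are illustrative assumptions.

```python
from pathlib import Path

# Placeholder URL; replace "your-space-name" with your own Space.
API_URL = "https://your-space-name.hf.space/api/tts_with_voice_file/"


def build_form_fields(text, preset_voice="random"):
    """Build the non-file form fields for /api/tts_with_voice_file/."""
    if not text or not text.strip():
        raise ValueError("text is required")
    return {"text": text, "preset_voice": preset_voice}


def synthesize_with_voice_file(text, voice_path, out_path="output.wav", timeout=300):
    """POST text plus a voice sample and save the response body to out_path."""
    import requests  # imported lazily so build_form_fields has no third-party dependency

    with open(voice_path, "rb") as sample:
        response = requests.post(
            API_URL,
            data=build_form_fields(text),
            # Assumed content type; adjust if you upload a different format.
            files={"voice_file": (Path(voice_path).name, sample, "audio/wav")},
            timeout=timeout,  # generous, since generation can take 30-60 seconds
        )
    response.raise_for_status()  # surface HTTP errors instead of saving an error page
    Path(out_path).write_bytes(response.content)
    return out_path


# Example call (requires a running Space and a local sample file):
# synthesize_with_voice_file("Hello, this is a test.", "my_voice.wav")
```

Keeping the field-building step separate from the network call makes it easy to validate input before spending time on a request.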

	## Technical Details

	This app leverages:
	- Tortoise-TTS: State-of-the-art text-to-speech model
	- Gradio: For the intuitive user interface
	- FastAPI: For the API endpoints
	- ZeroGPU: For on-demand GPU allocation on Hugging Face Spaces

	## Limitations

	- Speech generation typically takes 30-60 seconds, and longer texts take more time
	- Voice cloning quality depends on the clarity and length of the provided sample
	- For best results, provide voice samples with clear speech and minimal background noise
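
Before uploading a voice sample, it can help to check its basic properties locally. The sketch below uses Python's standard `wave` module to read the duration and sample rate of an uncompressed PCM WAV file. The thresholds (5 seconds, 16 kHz) and function names are illustrative assumptions, not requirements of this app or of Tortoise-TTS, and the check cannot detect background noise.

```python
import wave


def describe_wav(path):
    """Return (duration_seconds, sample_rate_hz, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return frames / rate, rate, wav.getnchannels()


def sample_warnings(path, min_seconds=5.0, min_rate=16000):
    """Return human-readable warnings; the default thresholds are arbitrary."""
    duration, rate, _channels = describe_wav(path)
    warnings = []
    if duration < min_seconds:
        warnings.append(f"sample is only {duration:.1f}s long; consider a longer recording")
    if rate < min_rate:
        warnings.append(f"sample rate {rate} Hz is low; speech may sound muffled")
    return warnings
```

For example, `sample_warnings("my_voice.wav")` returns an empty list when neither threshold is violated.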

	## Credits

	This project uses the Tortoise-TTS model. If you use this app in your work, please consider citing:

	```
	@misc{tortoise-tts,
	  author = {James Betker},
	  title = {Tortoise-TTS: A Multi-Voice TTS System},
	  year = {2022},
	  publisher = {GitHub},
	  journal = {GitHub repository},
	  howpublished = {\url{https://github.com/neonbjb/tortoise-tts}}
	}
	```

	## License

	This project is available under the Apache-2.0 License.