---
title: Voice Conversion
emoji: 🎤
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
---

# Amphion's Vevo - Voice Conversion & TTS

This is a Gradio web interface for the Vevo voice conversion model from the Amphion toolkit. It supports:

- Voice conversion (transferring both style and timbre)
- Timbre-only conversion
- Text-to-Speech with voice cloning

## Usage

1. Select the mode you want to use (voice, timbre, or TTS)
2. Upload the required audio files:
   - Source audio (for voice and timbre modes)
   - Reference style audio (for voice and TTS modes)
   - Reference timbre audio (for all modes)
3. For TTS mode:
   - Enter the text you want to convert to speech
   - Optionally provide reference text and select languages
4. Adjust the Flow Matching Steps if needed (default: 32)
5. Click "Generate" to create the converted audio

## Models

The application uses the following models from Hugging Face:

- Content Tokenizer (vq32)
- Content-Style Tokenizer (vq8192)
- Autoregressive Transformer
- Flow Matching Transformer
- Vocoder

## Technical Requirements

- Python 3.8+
- CUDA-capable GPU recommended for faster inference
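
## Example: Required Inputs per Mode

The upload rules in the Usage steps can be summarized as a small lookup table. The sketch below is illustrative only: `REQUIRED_INPUTS`, `missing_inputs`, and the input names (`source_audio`, `ref_style_audio`, `ref_timbre_audio`, `text`) are hypothetical and are not taken from `app.py`. It shows one way to check, before generating, which inputs a given mode still needs.

```python
# Hypothetical helper: all names here are illustrative, not the app's actual API.
# The mapping mirrors the Usage steps: which inputs each mode requires.
REQUIRED_INPUTS = {
    "voice": ("source_audio", "ref_style_audio", "ref_timbre_audio"),
    "timbre": ("source_audio", "ref_timbre_audio"),
    "tts": ("ref_style_audio", "ref_timbre_audio", "text"),
}

# Default number of flow matching steps stated in the Usage section.
DEFAULT_FLOW_MATCHING_STEPS = 32


def missing_inputs(mode, **inputs):
    """Return the required input names that are absent or empty for `mode`."""
    if mode not in REQUIRED_INPUTS:
        raise ValueError(f"unknown mode: {mode!r}")
    return [name for name in REQUIRED_INPUTS[mode] if not inputs.get(name)]
```

For instance, calling `missing_inputs("timbre", source_audio="src.wav")` reports that the reference timbre audio is still missing, while the reference style audio is not requested because timbre mode does not use it.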