---
title: Voice Conversion
emoji: 🎤
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
---

# Amphion's Vevo - Voice Conversion & TTS

This Space provides a Gradio web interface for Vevo, the voice conversion model from the Amphion toolkit. It supports three modes:

- Voice conversion (transferring both style and timbre)
- Timbre-only conversion
- Text-to-Speech with voice cloning

## Usage

1. Select the mode you want to use (voice, timbre, or TTS)
2. Upload the required audio files:
   - Source audio (for voice and timbre modes)
   - Reference style audio (for voice and TTS modes)
   - Reference timbre audio (for all modes)
3. For TTS mode:
   - Enter the text you want to convert to speech
   - Optionally provide reference text and select languages
4. Adjust the Flow Matching Steps if needed (default: 32)
5. Click "Generate" to create the converted audio
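
The input requirements in step 2 and step 3 can be summarized as a small lookup table. The following is a minimal sketch, not code from `app.py`: the mode keys, the input names (`source_audio`, `style_ref_audio`, `timbre_ref_audio`, `text`), and the `missing_inputs` helper are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional

# Required inputs per mode, transcribed from the usage steps above.
# All key and field names here are illustrative assumptions.
REQUIRED_INPUTS: Dict[str, List[str]] = {
    "voice": ["source_audio", "style_ref_audio", "timbre_ref_audio"],
    "timbre": ["source_audio", "timbre_ref_audio"],
    "tts": ["style_ref_audio", "timbre_ref_audio", "text"],
}


def missing_inputs(mode: str, inputs: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of required inputs that are absent or empty for `mode`."""
    if mode not in REQUIRED_INPUTS:
        raise ValueError(f"Unknown mode: {mode!r}")
    return [name for name in REQUIRED_INPUTS[mode] if not inputs.get(name)]
```

Calling `missing_inputs("timbre", {"source_audio": "src.wav"})` would report that the reference timbre audio is still needed, which is the kind of check a front end could run before starting generation.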

## Models

The application uses the following models from Hugging Face:

- Content Tokenizer (vq32)
- Content-Style Tokenizer (vq8192)
- Autoregressive Transformer
- Flow Matching Transformer
- Vocoder

## Technical Requirements

- Python 3.8+
- CUDA-capable GPU recommended for faster inference
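
To check a machine against these requirements before launching the app, a short script along the following lines may help. It is a sketch that assumes the models run on PyTorch, which this README does not state explicitly; the `check_environment` helper and its report keys are hypothetical.

```python
import sys
from typing import Dict, Tuple


def check_environment(min_python: Tuple[int, int] = (3, 8)) -> Dict[str, bool]:
    """Report whether the Python version is sufficient and whether CUDA is usable."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": False,
        "cuda_available": False,
    }
    try:
        # Assumption: PyTorch is the inference backend.
        import torch

        report["torch_installed"] = True
        report["cuda_available"] = bool(torch.cuda.is_available())
    except ImportError:
        # Without PyTorch we cannot probe for a GPU; leave the defaults.
        pass
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as false, inference should still be possible on CPU, only slower, since the GPU is recommended rather than required.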