-
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Paper • 2402.08093 • Published • 62 -
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Paper • 2311.00945 • Published • 16 -
Matcha-TTS: A fast TTS architecture with conditional flow matching
Paper • 2309.03199 • Published • 12 -
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper • 1712.05884 • Published • 3
Collections
Discover the best community collections!
Collections including paper arxiv:2402.08093
-
metavoiceio/metavoice-1B-v0.1
Text-to-Speech • Updated • 351 • 786 -
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Paper • 2402.08093 • Published • 62 -
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 196 -
SWivid/F5-TTS
Text-to-Speech • Updated • 826k • 1.09k
-
Re3: Generating Longer Stories With Recursive Reprompting and Revision
Paper • 2210.06774 • Published • 2 -
Constitutional AI: Harmlessness from AI Feedback
Paper • 2212.08073 • Published • 2 -
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
Paper • 2402.04253 • Published -
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Paper • 2305.19118 • Published
-
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
Paper • 2211.06687 • Published • 3 -
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
Paper • 2401.17690 • Published • 5 -
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Paper • 2312.09911 • Published • 55 -
Audiobox: Unified Audio Generation with Natural Language Prompts
Paper • 2312.15821 • Published • 17
-
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Paper • 2402.08093 • Published • 62 -
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Paper • 2311.00945 • Published • 16 -
Matcha-TTS: A fast TTS architecture with conditional flow matching
Paper • 2309.03199 • Published • 12 -
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper • 1712.05884 • Published • 3
-
Re3: Generating Longer Stories With Recursive Reprompting and Revision
Paper • 2210.06774 • Published • 2 -
Constitutional AI: Harmlessness from AI Feedback
Paper • 2212.08073 • Published • 2 -
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
Paper • 2402.04253 • Published -
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Paper • 2305.19118 • Published
-
metavoiceio/metavoice-1B-v0.1
Text-to-Speech • Updated • 351 • 786 -
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Paper • 2402.08093 • Published • 62 -
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 196 -
SWivid/F5-TTS
Text-to-Speech • Updated • 826k • 1.09k
-
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
Paper • 2211.06687 • Published • 3 -
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
Paper • 2401.17690 • Published • 5 -
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Paper • 2312.09911 • Published • 55 -
Audiobox: Unified Audio Generation with Natural Language Prompts
Paper • 2312.15821 • Published • 17