Collections

Collections including paper arxiv:2402.08093

A collection of text-to-speech papers.

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Paper • 2402.08093 • Published Feb 12, 2024 • 62
E3 TTS: Easy End-to-End Diffusion-based Text to Speech

Paper • 2311.00945 • Published Nov 2, 2023 • 16
Matcha-TTS: A fast TTS architecture with conditional flow matching

Paper • 2309.03199 • Published Sep 6, 2023 • 12
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Paper • 1712.05884 • Published Dec 16, 2017 • 3

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Paper • 2402.08093 • Published Feb 12, 2024 • 62

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Paper • 2402.08093 • Published Feb 12, 2024 • 62

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Paper • 2402.08093 • Published Feb 12, 2024 • 62
rain1011/pyramid-flow-sd3

Text-to-Video • Updated Oct 30, 2024 • 831

metavoiceio/metavoice-1B-v0.1

Text-to-Speech • Updated Apr 3, 2024 • 351 • 786
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Paper • 2402.08093 • Published Feb 12, 2024 • 62
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27, 2024 • 196
SWivid/F5-TTS

Text-to-Speech • Updated Mar 21 • 826k • 1.09k

Introduction to Generative AI 2024 (生成式AI導論 2024)

https://www.youtube.com/@HungyiLeeNTU

Re3: Generating Longer Stories With Recursive Reprompting and Revision

Paper • 2210.06774 • Published Oct 13, 2022 • 2
Constitutional AI: Harmlessness from AI Feedback

Paper • 2212.08073 • Published Dec 15, 2022 • 2
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls

Paper • 2402.04253 • Published Feb 6, 2024
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

Paper • 2305.19118 • Published May 30, 2023

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Paper • 2402.08093 • Published Feb 12, 2024 • 62

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Paper • 2402.08093 • Published Feb 12, 2024 • 62

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Paper • 2402.08093 • Published Feb 12, 2024 • 62

There are many more on arXiv if you search for CLAP.

Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

Paper • 2211.06687 • Published Nov 12, 2022 • 3
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning

Paper • 2401.17690 • Published Jan 31, 2024 • 5
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Paper • 2312.09911 • Published Dec 15, 2023 • 55
Audiobox: Unified Audio Generation with Natural Language Prompts

Paper • 2312.15821 • Published Dec 25, 2023 • 17
