Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2505.07916

about 19 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

MiniMaxAI/TTS-Multilingual-Test-Set

Viewer • Updated 14 days ago • 48 • 950 • 15
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder

Paper • 2505.07916 • Published 25 days ago • 124
Running

90

90

MiniMax Speech Tech Report

🎙

Generate high-quality speech from text with MiniMax-Speech

Scaling Law for Quantization-Aware Training

Paper • 2505.14302 • Published 17 days ago • 72
Reward Reasoning Model

Paper • 2505.14674 • Published 17 days ago • 34
Qwen3 Technical Report

Paper • 2505.09388 • Published 23 days ago • 182
AdaptThink: Reasoning Models Can Learn When to Think

Paper • 2505.13417 • Published 18 days ago • 78

MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder

Paper • 2505.07916 • Published 25 days ago • 124

instruction-following

MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder

Paper • 2505.07916 • Published 25 days ago • 124
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models

Paper • 2505.07591 • Published 25 days ago • 10

about 17 hours ago

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2 • 10
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10 • 43
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14 • 84

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 156
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31 • 285
Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research

Paper • 2502.04644 • Published Feb 7 • 2
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

Paper • 2504.07956 • Published Apr 10 • 46

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Paper • 2412.15322 • Published Dec 19, 2024 • 18
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5 • 82
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

Paper • 2505.02625 • Published May 5 • 22
Fast Text-to-Audio Generation with Adversarial Post-Training

Paper • 2505.08175 • Published 25 days ago • 22

about 18 hours ago

fishaudio/fish-speech-1.4

Text-to-Speech • Updated Nov 5, 2024 • 463 • 451
stepfun-ai/GOT-OCR2_0

Image-Text-to-Text • Updated Feb 4 • 150k • 1.48k
OuteAI/OuteTTS-0.1-350M

Text-to-Speech • Updated Apr 17 • 576 • 301
hexgrad/Kokoro-82M

Text-to-Speech • Updated Apr 10 • 1.85M • • 4.47k

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Paper • 2405.18503 • Published May 28, 2024 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2405.20289 • Published May 30, 2024 • 11
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Paper • 2406.02897 • Published Jun 5, 2024 • 16
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5, 2024 • 21

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs