ATI: Any Trajectory Instruction for Controllable Video Generation Paper • 2505.22944 • Published 9 days ago • 7
MAGREF: Masked Guidance for Any-Reference Video Generation Paper • 2505.23742 • Published 8 days ago • 9
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization Paper • 2505.24862 • Published 7 days ago • 30
Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment Paper • 2505.18600 • Published 13 days ago • 45
EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance Paper • 2505.21876 • Published 10 days ago • 9
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design Paper • 2505.16175 • Published 16 days ago • 39
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation Paper • 2410.01469 • Published Oct 2, 2024 • 2
AudioX: Diffusion Transformer for Anything-to-Audio Generation Paper • 2503.10522 • Published Mar 13 • 26
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities Paper • 2503.03983 • Published Mar 6 • 24
Vid2World: Crafting Video Diffusion Models to Interactive World Models Paper • 2505.14357 • Published 17 days ago • 25
MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation Paper • 2505.10238 • Published 22 days ago • 8
Fast Text-to-Audio Generation with Adversarial Post-Training Paper • 2505.08175 • Published 25 days ago • 22
LightLab: Controlling Light Sources in Images with Diffusion Models Paper • 2505.09608 • Published 23 days ago • 31
Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis Paper • 2505.00135 • Published Apr 30 • 2
Blazingly fast whisper transcriptions with Inference Endpoints Article • By mfuntowicz and 5 others • 25 days ago • 67