ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model Paper • 2503.21144 • Published Mar 27 • 27
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs Paper • 2507.11097 • Published Jul 15 • 63
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published May 27 • 108
Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published May 20 • 133
Shifting AI Efficiency From Model-Centric to Data-Centric Compression Paper • 2505.19147 • Published May 25 • 145
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation Paper • 2504.14899 • Published Apr 21 • 21
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives Paper • 2504.10823 • Published Apr 15 • 16
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published Mar 30 • 95
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published Mar 12 • 73
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think Paper • 2502.20172 • Published Feb 27 • 28
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute Paper • 2502.20126 • Published Feb 27 • 20
UniTok: A Unified Tokenizer for Visual Generation and Understanding Paper • 2502.20321 • Published Feb 27 • 30
view article Article Mixture of Experts Explained By osanseviero and 5 others • Dec 11, 2023 • 833