UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning Paper • 2505.23380 • Published 8 days ago • 23
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published 10 days ago • 91
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data Paper • 2505.18445 • Published 14 days ago • 63
RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning Paper • 2505.17540 • Published 14 days ago • 7
Flow-GRPO: Training Flow Matching Models via Online RL Paper • 2505.05470 • Published 29 days ago • 78
BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation Paper • 2504.14538 • Published Apr 20 • 29
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published Apr 17 • 50
SkyReels-A2: Compose Anything in Video Diffusion Transformers Paper • 2504.02436 • Published Apr 3 • 37
Long-Context Autoregressive Video Modeling with Next-Frame Prediction Paper • 2503.19325 • Published Mar 25 • 73
XAttention: Block Sparse Attention with Antidiagonal Scoring Paper • 2503.16428 • Published Mar 20 • 14
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance Paper • 2503.16421 • Published Mar 20 • 10
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20 • 41
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Paper • 2503.11647 • Published Mar 14 • 141
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance Paper • 2503.10391 • Published Mar 13 • 11