GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control • Paper • 2503.03751 • Published Mar 5 • 22 • 4
DynVFX: Augmenting Real Videos with Dynamic Content • Paper • 2502.03621 • Published Feb 5 • 30 • 3
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing • Paper • 2402.15151 • Published Feb 23, 2024 • 7 • 2
AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages • Paper • 2303.12582 • Published Mar 22, 2023 • 20 • 3
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing • Paper • 2306.14435 • Published Jun 26, 2023 • 20 • 5
Agile Catching with Whole-Body MPC and Blackbox Policy Learning • Paper • 2306.08205 • Published Jun 14, 2023 • 8 • 1
RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation • Paper • 2306.11706 • Published Jun 20, 2023 • 7 • 1