A Survey on Vision-Language-Action Models for Autonomous Driving — arXiv:2506.24044, published Jun 30
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data — arXiv:2507.07095, published Jul 9
AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models — arXiv:2506.19851, published Jun 24
DualTHOR: A Dual-Arm Humanoid Simulation Platform for Contingency-Aware Planning — arXiv:2506.16012, published Jun 19
GMT: General Motion Tracking for Humanoid Whole-Body Control — arXiv:2506.14770, published Jun 17
ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies — arXiv:2506.14315, published Jun 17
EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection — arXiv:2506.09827, published Jun 11
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition — arXiv:2506.17201, published Jun 20
DreamCube: 3D Panorama Generation via Multi-plane Synchronization — arXiv:2506.17206, published Jun 20
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics — arXiv:2506.04308, published Jun 4
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs — arXiv:2506.01674, published Jun 2