Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving Paper • 2505.04528 • Published about 1 month ago • 11
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models Paper • 2505.03821 • Published May 3 • 24
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation Paper • 2505.04512 • Published about 1 month ago • 35
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play Paper • 2505.02707 • Published May 5 • 82
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6 • 169
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities Paper • 2505.02567 • Published May 5 • 74
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks Paper • 2312.14238 • Published Dec 21, 2023 • 20
Augmenting CLIP with Improved Visio-Linguistic Reasoning Paper • 2307.09233 • Published Jul 18, 2023 • 9
Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining Paper • 2308.03235 • Published Aug 7, 2023 • 2
ZeroSearch: Incentivize the Search Capability of LLMs without Searching Paper • 2505.04588 • Published about 1 month ago • 64
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning Paper • 2505.03318 • Published May 6 • 92
RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale Paper • 2505.03005 • Published May 5 • 31
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Paper • 2505.00703 • Published May 1 • 42
A Robust Deep Networks based Multi-Object MultiCamera Tracking System for City Scale Traffic Paper • 2505.00534 • Published May 1 • 2
Spatial Speech Translation: Translating Across Space With Binaural Hearables Paper • 2504.18715 • Published Apr 25 • 7
LLMs for Engineering: Teaching Models to Design High Powered Rockets Paper • 2504.19394 • Published Apr 27 • 13
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization Paper • 2504.21659 • Published Apr 30 • 12