Submitted by shizhediao 111 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models · 8 authors 3
Submitted by RunpeiDong 82 AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time · 11 authors 2
Submitted by mukul54 71 Time Blindness: Why Video-Language Models Can't See What Humans Can? · 4 authors 3
Submitted by kjunh 35 Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation · 6 authors 2
Submitted by wchengad 29 ViStoryBench: Comprehensive Benchmark Suite for Story Visualization · 15 authors 2
Submitted by vztu 23 DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models · 4 authors 3
Submitted by YaxinLuo 21 Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents · 6 authors 2
Submitted by huaijinpi 19 CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects · 4 authors 2
Submitted by yiqingliang 18 MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning · 10 authors 3
Submitted by johncliu 16 MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs · 6 authors 2
Submitted by ruskinmanku 16 EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge · 5 authors 2
Submitted by LCZZZZ 14 More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models · 8 authors 2
Submitted by huanngzh 13 UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation · 8 authors 2
Submitted by yiren98 11 EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering · 5 authors 2
Submitted by Chae0 10 Fork-Merge Decoding: Enhancing Multimodal Understanding in Audio-Visual Large Language Models · 4 authors 2
Submitted by Yif29 9 ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL · 10 authors 2
Submitted by entropyhu 9 Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning · 6 authors 3
Submitted by mengdaxu 9 DexUMI: Using Human Hand as the Universal Manipulation Interface for Dexterous Manipulation · 7 authors 2
Submitted by feltoner 7 ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents · 13 authors 2
Submitted by AdinaY 6 Evaluating and Steering Modality Preferences in Multimodal Large Language Model · 8 authors 2
Submitted by ZonglinY 5 Harnessing Large Language Models for Scientific Novelty Detection · 5 authors 2
Submitted by yinqi 5 un^2CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP · 6 authors 2
Submitted by patricebechard 5 Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows · 5 authors 2
Submitted by Xuweiyi 5 Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts · 4 authors 2
Submitted by TonyK 5 Enabling Flexible Multi-LLM Integration for Scalable Knowledge Aggregation · 13 authors 2
Submitted by mamaj92 4 Revisiting Bi-Linear State Transitions in Recurrent Neural Networks · 2 authors 2
Submitted by lizhuang144 3 TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis · 8 authors 2
Submitted by Omartificial-Intelligence-Space 3 GATE: General Arabic Text Embedding for Enhanced Semantic Textual Similarity with Matryoshka Representation Learning and Hybrid Loss Training · 6 authors 2
Submitted by Debargha 3 Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks · 10 authors 2
Submitted by Chouoftears 2 The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets · 6 authors 3
Submitted by vsahil 2 OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Modalities · 7 authors 2
Submitted by Chaeeun-Kim 2 LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation · 3 authors 1
Submitted by manu 1 Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings · 6 authors 2
Submitted by yongzx 1 The State of Multilingual LLM Safety Research: From Measuring the Language Gap to Mitigating It · 5 authors 2