Submitted by Liuff23 68 Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence · 4 authors 318 3
Submitted by AngLv 67 The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason · 5 authors 104 2
Submitted by songtingyu 56 VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos · 4 authors 3 2
Submitted by cyyang822 46 ZeroGUI: Automating Online GUI Learning at Zero Human Cost · 14 authors 84 2
Submitted by shizhediao 42 Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding · 9 authors 2
Submitted by lyx97 40 VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? · 10 authors 29 6
Submitted by maksimko123 35 cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning · 9 authors 3
Submitted by lhjiang 31 AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views · 12 authors 2
Submitted by dlaptev 25 Train Sparse Autoencoders Efficiently by Utilizing Features Correlation · 5 authors 2
Submitted by chaoscodes 24 Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering · 11 authors 2
Submitted by ydalva 23 LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers · 3 authors 3
Submitted by AliBehrouz 23 ATLAS: Learning to Optimally Memorize the Context at Test Time · 8 authors 2
Submitted by benzweijia 23 UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning · 3 authors 2
Submitted by sy1998 20 VidText: Towards Comprehensive Evaluation for Video Text Understanding · 10 authors 2
Submitted by spapi 20 FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian · 9 authors 2
Submitted by TharinduSK 18 Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation · 9 authors 2
Submitted by Jiahao004 16 DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning · 13 authors 22 2
Submitted by BryanW 15 Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model · 11 authors 79 3
Submitted by Bang-UdeM-Mila 13 System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts · 4 authors 2
Submitted by KunlunZhu 12 SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents · 9 authors 2
Submitted by Jang-Hyun 12 KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction · 6 authors 96 2
Submitted by antonio-c 12 GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control · 8 authors 50 3
Submitted by wangsssssss 12 Differentiable Solver Search for Fast Diffusion Sampling · 8 authors 21 2
Submitted by dek924 11 PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions · 8 authors 2
Submitted by jefflai 10 Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding? · 7 authors 2
Submitted by BestWishYsh 9 MAGREF: Masked Guidance for Any-Reference Video Generation · 11 authors 251 2
Submitted by Elfsong 9 Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization · 9 authors 5 2
Submitted by m-serious 8 ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind · 3 authors 14 2
Submitted by smallAI 8 Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction · 6 authors 2
Submitted by ttumyche 8 CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays · 6 authors 14 2
Submitted by angtian 7 ATI: Any Trajectory Instruction for Controllable Video Generation · 5 authors 2
Submitted by crc5577 7 Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape · 5 authors 2
Submitted by davidchan 6 Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint · 6 authors 11 2
Submitted by JRQi 6 When Models Reason in Your Language: Controlling Thinking Trace Language Comes at the Cost of Accuracy · 6 authors 2 2
Submitted by hdong51 5 To Trust Or Not To Trust Your Vision-Language Model's Prediction · 5 authors 13 2
Submitted by lyxun 5 UniTEX: Universal High Fidelity Generative Texturing for 3D Shapes · 8 authors 132 2
Submitted by kornelhowil 5 CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting · 6 authors 21 2
Submitted by JingzeShi 5 Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting · 7 authors 2
Submitted by lhmd 4 ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS · 6 authors 111 5
Submitted by StringChaos 4 GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents · 6 authors 2
Submitted by ahnpersie 4 Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates · 4 authors 5 4
Submitted by kpzhang996 4 SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model · 7 authors 2
Submitted by SuperSupermoon 4 Lunguage: A Benchmark for Structured and Sequential Chest X-ray Interpretation · 13 authors 2
Submitted by Franck-Dernoncourt 4 A Graph Perspective to Probe Structural Patterns of Knowledge in Large Language Models · 9 authors 2
Submitted by yunjae-won 3 Differential Information: An Information-Theoretic Perspective on Preference Optimization · 4 authors 2
Submitted by Aman 3 Evaluating Text Creativity across Diverse Domains: A Dataset and Large Language Model Evaluator · 6 authors 2
Submitted by Junfeng5 3 TokBench: Evaluating Your Visual Tokenizer before Visual Generation · 9 authors 121 2
Submitted by TeddyXGZ 3 Toward Reliable Biomedical Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models · 8 authors 4 2
Submitted by gsarti 2 Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement · 4 authors 1 2
Submitted by pengxiang 2 Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking · 7 authors 2
Submitted by ctma 2 Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities · 5 authors 65 2