Submitted by JC-Chen 40 Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning · 10 authors 4
Submitted by ahmedheakl 35 CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark · 6 authors 5
Submitted by tsq2000 25 MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos · 9 authors 1
Submitted by bys0318 23 SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models · 5 authors 1
Submitted by tsq2000 22 Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis · 6 authors 1
Submitted by tyhuang 18 Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation · 11 authors 1
Submitted by yuanshengni 18 VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation · 5 authors 2
Submitted by Yuanze 18 IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation · 5 authors 2
Submitted by ubowang 16 Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem · 5 authors 2
Submitted by myhong 15 Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models · 4 authors 1
Submitted by tricktreat 13 SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation · 13 authors 2
Submitted by Dazitu616 12 DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models · 8 authors 1
Submitted by Guiyang1001 11 TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence · 11 authors 1
Submitted by dongminpark 7 Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games · 16 authors 1
Submitted by weiminwang 7 TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models · 2 authors 1
Submitted by KaituoFeng 6 Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback · 7 authors 1
Submitted by EunsuKim 6 BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation · 6 authors 1
Submitted by yiren98 6 DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers · 6 authors 1
Submitted by westbrook 5 CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech · 13 authors 2
Submitted by mpatel57 4 RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions · 5 authors 2
Submitted by NPBP26 4 Improving Knowledge Distillation Under Unknown Covariate Shift Through Confidence-Guided Data Augmentation · 4 authors 1
Submitted by RanjanSapkota 3 TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems · 4 authors 1
Submitted by xiao-qi 3 HTSC-2025: A Benchmark Dataset of Ambient-Pressure High-Temperature Superconductors for AI-Driven Critical Temperature Prediction · 6 authors 2
Submitted by ChengsongHuang 3 POSS: Position Specialist Generates Better Draft for Speculative Decoding · 5 authors 1
Submitted by j-min 3 Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning · 4 authors 1
Submitted by Franck-Dernoncourt 3 Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents · 7 authors 1
Submitted by brucelyu 2 Unleashing Hour-Scale Video Training for Long Video-Language Understanding · 11 authors 1
Submitted by Mountchicken 2 Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning · 5 authors 1
Submitted by JacobYuan 2 Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective · 4 authors 1
Submitted by FabianKarl 2 CRAWLDoc: A Dataset for Robust Ranking of Bibliographic Documents · 2 authors 1
Submitted by Zhuohan 2 FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning · 17 authors 1
Submitted by gyr66 2 Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models · 5 authors 1
Submitted by xiaobinzhuang 1 Sounding that Object: Interactive Object-Aware Image to Audio Generation · 9 authors 1
Submitted by jgonsior 1 Survey of Active Learning Hyperparameters: Insights from a Large-Scale Experimental Grid · 6 authors 1
Submitted by xjcvcvxj 1 Robust Neural Rendering in the Wild with Asymmetric Dual 3D Gaussian Splatting · 5 authors 1
Submitted by JY-Young 1 RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents · 4 authors 1