LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning · 12 authors · submitted by di-zhang-fdu
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations · 7 authors · submitted by hadasor
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide · 4 authors · submitted by akhaliq
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models · 6 authors · submitted by mfarajtabar
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery · 20 authors · submitted by ysu-nlp
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents · 8 authors · submitted by ysu-nlp
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion · 8 authors · submitted by Junyi42
Presto! Distilling Steps and Layers for Accelerating Music Generation · 6 authors · submitted by Njb
TLDR: Token-Level Detective Reward Model for Large Vision Language Models · 8 authors · submitted by deqing
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs · 9 authors · submitted by demolei
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles · 8 authors · submitted by Duguce
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction · 9 authors · submitted by lilelife
SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification · 6 authors · submitted by penfever
Autonomous Character-Scene Interaction Synthesis from Text Instruction · 7 authors · submitted by thuhsy
SePPO: Semi-Policy Preference Optimization for Diffusion Alignment · 11 authors · submitted by DwanZhang
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach · 8 authors · submitted by RaphaelLiu
Grounding Language in Multi-Perspective Referential Communication · 3 authors · submitted by ZinengTang
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation · 4 authors · submitted by zheweiyao