Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute Paper • 2503.23803 • Published Mar 31 • 8
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation Paper • 2503.19622 • Published Mar 25 • 31