view article Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • 26 days ago • 418
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks Paper • 2505.00234 • Published May 1 • 26
WebThinker: Empowering Large Reasoning Models with Deep Research Capability Paper • 2504.21776 • Published Apr 30 • 56
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published Apr 21 • 74
Iterative Self-Training for Code Generation via Reinforced Re-Ranking Paper • 2504.09643 • Published Apr 13 • 34
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Paper • 2504.06958 • Published Apr 9 • 11
Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published Apr 10 • 28
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published Apr 2 • 85
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step Paper • 2504.01956 • Published Apr 2 • 40
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation Paper • 2504.02782 • Published Apr 3 • 57
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31 • 285