InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 280
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published May 27 • 108
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8 • 184
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction Paper • 2503.15661 • Published Mar 19 • 2
Long-Context Autoregressive Video Modeling with Next-Frame Prediction Paper • 2503.19325 • Published Mar 25 • 73
Feather-SQL: A Lightweight NL2SQL Framework with Dual-Model Collaboration Paradigm for Small Language Models Paper • 2503.17811 • Published Mar 22 • 13
view article Article Open-source DeepResearch – Freeing our search agents By m-ric and 4 others • Feb 4 • 1.28k
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments Paper • 2501.10893 • Published Jan 18 • 26
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models Paper • 2503.01774 • Published Mar 3 • 45
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models Paper • 2502.16614 • Published Feb 23 • 27
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 193
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework Paper • 2502.13759 • Published Feb 19 • 4
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Paper • 2502.14397 • Published Feb 20 • 42
LOVA3: Learning to Visual Question Answering, Asking and Assessment Paper • 2405.14974 • Published May 23, 2024 • 1
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback Paper • 2502.15027 • Published Feb 20 • 7
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 122
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation Paper • 2502.08047 • Published Feb 12 • 28