MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation • Paper • 2505.17613 • Published 15 days ago • 8
Reinforcement Learning for Reasoning in Large Language Models with One Training Example • Paper • 2504.20571 • Published Apr 29 • 94
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees • Paper • 2503.08893 • Published Mar 11 • 5