view changelog Changelog Organization and User profiles now include repository listing pages Jun 20 • 124
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents Paper • 2506.11763 • Published Jun 13 • 69
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published Jun 5 • 129
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning Paper • 2506.10521 • Published Jun 12 • 74
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Paper • 2506.09513 • Published Jun 11 • 98
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 261
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training Paper • 2505.11594 • Published May 16 • 76
Distilling LLM Agent into Small Models with Retrieval and Code Tools Paper • 2505.17612 • Published May 23 • 81
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Paper • 2505.17667 • Published May 23 • 89
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents Paper • 2505.20411 • Published May 26 • 88