MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization Paper • 2503.16874 • Published Mar 21
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning Paper • 2505.23380 • Published 8 days ago
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning Paper • 2505.23754 • Published 8 days ago
Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence Paper • 2505.20325 • Published 14 days ago