Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities. arXiv:2507.13158 (published Jul 17, 2025).
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models. arXiv:2506.18945 (published Jun 23, 2025).
AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models. arXiv:2505.22662 (published May 28, 2025).
MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning. arXiv:2505.24846 (published May 30, 2025).
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models. arXiv:2505.24864 (published May 30, 2025).
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time. arXiv:2505.24863 (published May 30, 2025).
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL. arXiv:2505.02391 (published May 5, 2025).
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training. arXiv:2504.13161 (published Apr 17, 2025).
AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning. arXiv:2503.18769 (published Mar 24, 2025).
Predictive Data Selection: The Data That Predicts Is the Data That Teaches. arXiv:2503.00808 (published Mar 2, 2025).
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts. arXiv:2502.20395 (published Feb 27, 2025).
Rethinking Diverse Human Preference Learning through Principal Component Analysis. arXiv:2502.13131 (published Feb 18, 2025).
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents. arXiv:2502.09560 (published Feb 13, 2025).
Teaching Language Models to Critique via Reinforcement Learning. arXiv:2502.03492 (published Feb 5, 2025).