Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published 4 days ago • 127
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published 7 days ago • 112
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning Paper • 2505.16400 • Published 15 days ago • 30
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping Paper • 2505.15612 • Published 16 days ago • 32
General-Reasoner: Advancing LLM Reasoning Across All Domains Paper • 2505.14652 • Published 17 days ago • 22
Group-in-Group Policy Optimization for LLM Agent Training Paper • 2505.10978 • Published 21 days ago • 3