PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models Paper • 2503.02324 • Published Mar 4
How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study Paper • 2504.00829 • Published Apr 1
GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning Paper • 2504.02546 • Published Apr 3
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning Paper • 2505.14140 • Published 17 days ago
SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM Paper • 2504.14286 • Published Apr 19
General-Reasoner: Advancing LLM Reasoning Across All Domains Paper • 2505.14652 • Published 17 days ago