-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 5
Collections
Discover the best community collections!
Collections including paper arxiv:2506.01939
-
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
Paper • 2506.07044 • Published • 112 -
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
Paper • 2506.09513 • Published • 98 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Paper • 2506.09113 • Published • 101
-
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
Paper • 2505.21115 • Published • 139 -
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 177 -
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 135
-
Yume: An Interactive World Generation Model
Paper • 2507.17744 • Published • 85 -
SSRL: Self-Search Reinforcement Learning
Paper • 2508.10874 • Published • 80 -
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Paper • 2506.06941 • Published • 14 -
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 177
-
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 177 -
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 135 -
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Paper • 2505.22617 • Published • 129
-
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 255 -
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper • 2506.06395 • Published • 129 -
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Paper • 2506.05176 • Published • 68 -
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 269
-
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 269 -
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 177 -
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 135 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 5
-
Yume: An Interactive World Generation Model
Paper • 2507.17744 • Published • 85 -
SSRL: Self-Search Reinforcement Learning
Paper • 2508.10874 • Published • 80 -
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Paper • 2506.06941 • Published • 14 -
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 177
-
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
Paper • 2506.07044 • Published • 112 -
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
Paper • 2506.09513 • Published • 98 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Paper • 2506.09113 • Published • 101
-
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 177 -
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 135 -
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Paper • 2505.22617 • Published • 129
-
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
Paper • 2505.21115 • Published • 139 -
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 177 -
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 135
-
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 255 -
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper • 2506.06395 • Published • 129 -
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Paper • 2506.05176 • Published • 68 -
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 269
-
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 269 -
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 177 -
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 135 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97