- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 127
- Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
  Paper • 2505.23183 • Published • 2
- Improved Representation Steering for Language Models
  Paper • 2505.20809 • Published • 1
- SAEs Are Good for Steering -- If You Select the Right Features
  Paper • 2505.20063 • Published • 1
Collections
Collections including paper arxiv:2506.01939
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
  Paper • 2501.18585 • Published • 61
- RWKV-7 "Goose" with Expressive Dynamic State Evolution
  Paper • 2503.14456 • Published • 149
- DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
  Paper • 2503.15265 • Published • 47
- Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
  Paper • 2503.15558 • Published • 49
- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 30
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 122
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 5
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 165
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 127
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 112
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 87
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 127
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 74
- Taming LLMs by Scaling Learning Rates with Gradient Grouping
  Paper • 2506.01049 • Published • 35
- ARIA: Training Language Agents with Intention-Driven Reward Aggregation
  Paper • 2506.00539 • Published • 26
- s3: You Don't Need That Much Data to Train a Search Agent via RL
  Paper • 2505.14146 • Published • 17
- Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
  Paper • 2505.19443 • Published • 15
- ARM: Adaptive Reasoning Model
  Paper • 2505.20258 • Published • 43
- Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
  Paper • 2505.19914 • Published • 40