REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback Paper • 2505.06548 • Published 28 days ago • 30 • 6
REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback Paper • 2505.06548 • Published 28 days ago • 30 • 6
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 94 • 15
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 94 • 15
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18 • 127 • 21
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18 • 127 • 21
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18 • 127 • 21
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18 • 127 • 21
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18 • 127 • 21
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning Paper • 2504.11456 • Published Apr 15 • 13 • 6
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning Paper • 2504.11456 • Published Apr 15 • 13 • 6
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models Paper • 2504.06214 • Published Apr 8 • 1
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging Paper • 2503.20641 • Published Mar 26 • 8 • 5
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging Paper • 2503.20641 • Published Mar 26 • 8 • 5