Collections
Collections including paper arxiv:2502.20396
- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 40
- Token-Budget-Aware LLM Reasoning
  Paper • 2412.18547 • Published • 47
- Efficiently Serving LLM Reasoning Programs with Certaindex
  Paper • 2412.20993 • Published • 38
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Paper • 2412.17256 • Published • 48

- Foundation Models in Robotics: Applications, Challenges, and the Future
  Paper • 2312.07843 • Published • 18
- Neural Fields in Robotics: A Survey
  Paper • 2410.20220 • Published • 5
- Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset
  Paper • 2410.22325 • Published • 10
- Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning
  Paper • 2410.21845 • Published • 16

- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
  Paper • 2502.19328 • Published • 22
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
  Paper • 2502.18449 • Published • 75
- Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids
  Paper • 2502.20396 • Published • 16
- Visual-RFT: Visual Reinforcement Fine-Tuning
  Paper • 2503.01785 • Published • 82

- Training Software Engineering Agents and Verifiers with SWE-Gym
  Paper • 2412.21139 • Published • 24
- Evaluating Language Models as Synthetic Data Generators
  Paper • 2412.03679 • Published • 49
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 152
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 118