- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 127
- Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
  Paper • 2505.23183 • Published • 2
- Improved Representation Steering for Language Models
  Paper • 2505.20809 • Published • 1
- SAEs Are Good for Steering -- If You Select the Right Features
  Paper • 2505.20063 • Published • 1
Collections
Collections including paper arxiv:2506.01939
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
  Paper • 2501.18585 • Published • 61
- RWKV-7 "Goose" with Expressive Dynamic State Evolution
  Paper • 2503.14456 • Published • 149
- DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
  Paper • 2503.15265 • Published • 47
- Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
  Paper • 2503.15558 • Published • 49
- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 30
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 122
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 5
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 165
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 127
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 112
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 87
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 127
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 74
- Taming LLMs by Scaling Learning Rates with Gradient Grouping
  Paper • 2506.01049 • Published • 35
- ARIA: Training Language Agents with Intention-Driven Reward Aggregation
  Paper • 2506.00539 • Published • 26
- s3: You Don't Need That Much Data to Train a Search Agent via RL
  Paper • 2505.14146 • Published • 17
- Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
  Paper • 2505.19443 • Published • 15
- ARM: Adaptive Reasoning Model
  Paper • 2505.20258 • Published • 43
- Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
  Paper • 2505.19914 • Published • 40