bfuzzy1's Collections

RL

updated Jan 31

  • RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response

    Paper • 2412.14922 • Published Dec 19, 2024 • 89

  • B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

    Paper • 2412.17256 • Published Dec 23, 2024 • 48

  • Deliberation in Latent Space via Differentiable Cache Augmentation

    Paper • 2412.17747 • Published Dec 23, 2024 • 33

  • Outcome-Refining Process Supervision for Code Generation

    Paper • 2412.15118 • Published Dec 19, 2024 • 19

  • REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

    Paper • 2501.03262 • Published Jan 4 • 99

  • Evolving Deeper LLM Thinking

    Paper • 2501.09891 • Published Jan 17 • 115

  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    Paper • 2501.12948 • Published Jan 22 • 400

  • Kimi k1.5: Scaling Reinforcement Learning with LLMs

    Paper • 2501.12599 • Published Jan 22 • 118

  • Towards General-Purpose Model-Free Reinforcement Learning

    Paper • 2501.16142 • Published Jan 27 • 30

  • Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

    Paper • 2501.17703 • Published Jan 29 • 59