Legion-96's Collections
Fine Tuning

updated 4 days ago

  • Fine-Tuning Language Models from Human Preferences

    Paper • 1909.08593 • Published Sep 18, 2019 • 3

  • PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models

    Paper • 2503.02324 • Published Mar 4

  • How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study

    Paper • 2504.00829 • Published Apr 1

  • GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning

    Paper • 2504.02546 • Published Apr 3 • 1

  • RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning

    Paper • 2505.14140 • Published 17 days ago • 1

  • SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM

    Paper • 2504.14286 • Published Apr 19

  • General-Reasoner: Advancing LLM Reasoning Across All Domains

    Paper • 2505.14652 • Published 17 days ago • 22