MinakamiYuki's Collections
LLM paper

updated Dec 29, 2024

  • Training Language Models to Self-Correct via Reinforcement Learning

    Paper • 2409.12917 • Published Sep 19, 2024 • 141

  • Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models

    Paper • 2409.18943 • Published Sep 27, 2024 • 30

  • From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

    Paper • 2411.16594 • Published Nov 25, 2024 • 42

  • Offline Reinforcement Learning for LLM Multi-Step Reasoning

    Paper • 2412.16145 • Published Dec 20, 2024 • 39

  • Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

    Paper • 2412.18319 • Published Dec 24, 2024 • 40