OpenAssistant/reward-model-deberta-v3-large-v2

Tags: Text Classification · Transformers · PyTorch · English · deberta-v2 · reward-model · reward_model · RLHF
Community · 11 discussions
What score counts as high quality?

#11 opened 6 months ago by
aj666

Hyperparameter training settings

#10 opened over 1 year ago by
hyuk199

How to pre-process the synthetic-instruct-gptj-pairwise pairwise data into training data

2
#9 opened over 1 year ago by
chaochaoli

How to fine-tune this model with the Trainer API?

๐Ÿ‘ 1
1
#8 opened over 1 year ago by
duzm

How to score a <instruction, input, output> pair?

#7 opened over 1 year ago by
qldu

Validation split indices?

๐Ÿ‘ 2
1
#6 opened almost 2 years ago by
cmglaze

np.int deprecation issue

โค๏ธ 1
#5 opened almost 2 years ago by
whiteg671

Question about evaluating this reward model on Anthropic/hh-rlhf

1
#4 opened about 2 years ago by
songff

Adding `safetensors` variant of this model

#3 opened about 2 years ago by
SFconvertbot

How to optimize loss function?

1
#1 opened over 2 years ago by
nidong
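
Several threads above (#1 on the loss function, #8 on fine-tuning, #9 on pairwise data) concern the pairwise ranking objective commonly used to train reward models of this kind: the model assigns a scalar score to a chosen and a rejected answer for the same prompt, and the loss is -log σ(r_chosen − r_rejected). A minimal stdlib-only sketch of that objective (the function name and example scores are illustrative, not taken from the model card):

```python
import math

def pairwise_ranking_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-sigmoid of the score margin between the preferred
    (chosen) and dispreferred (rejected) responses."""
    margin = r_chosen - r_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), written with log1p
    return math.log1p(math.exp(-margin))

# The loss shrinks as the chosen answer outscores the rejected one.
print(pairwise_ranking_loss(2.0, -1.0))  # ≈ 0.0486, ranking clearly correct
print(pairwise_ranking_loss(-1.0, 2.0))  # ≈ 3.0486, ranking inverted
```

Minimizing this loss pushes the model to give preferred answers higher scores than rejected ones; at a zero margin the loss is log 2, and it decays toward zero as the margin grows.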