---
datasets:
- natolambert/skywork-preferences-80k-v0.1-cleaned
- allenai/preference-test-sets
---

# MetaMetrics-RM-v1.0 (ICLR 2025)

+ **Authors**: [Genta Indra Winata](https://gentawinata.com/), [David Anugraha](https://davidanugraha.github.io/), [Lucky Susanto](https://luckysusanto.github.io/), [Garry Kuwanto](https://gkuwanto.github.io/), [Derry Tanti Wijaya](https://derrywijaya.github.io/)
+ **arXiv Paper**: https://arxiv.org/abs/2410.02381
+ **ICLR Paper**: https://openreview.net/forum?id=slO3xTt4CG
+ **Model**: [meta-metrics/MetaMetrics-RM-v1.0](https://huggingface.co/meta-metrics/MetaMetrics-RM-v1.0)
+ **Datasets**:
  - [natolambert/skywork-preferences-80k-v0.1-cleaned](https://huggingface.co/datasets/natolambert/skywork-preferences-80k-v0.1-cleaned)
  - [allenai/preference-test-sets](https://huggingface.co/datasets/allenai/preference-test-sets)
+ **Code Repository**: https://github.com/meta-metrics/metametrics

## RewardBench Leaderboard

| Model | Score | Chat | Chat Hard | Safety | Reasoning |
|:-------|:------|:-----|:----------|:-------|:----------|
| nvidia/Llama-3.1-Nemotron-70B-Reward | **94.1** | 97.5 | 85.7 | **95.1** | 98.1 |
| meta-metrics/MetaMetrics-RM-v1.0 | 93.5 | **98.9** | 86.2 | 90.7 | **98.2** |
| SF-Foundation/TextEval-Llama3.1-70B | 93.5 | 94.1 | **90.1** | 93.2 | 96.4 |
| RLHFlow/ArmoRM-Llama3-8B-v0.1 | 90.4 | 96.9 | 76.8 | 90.5 | 97.3 |

## Citation

If you find this work useful for your research, please consider citing:

```
@inproceedings{
winata2025metametrics,
title={MetaMetrics: Calibrating Metrics for Generation Tasks Using Human Preferences},
author={Genta Indra Winata and David Anugraha and Lucky Susanto and Garry Kuwanto and Derry Tanti Wijaya},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=slO3xTt4CG}
}
```
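
## RewardBench Scoring (Sketch)

The RewardBench columns above report pairwise preference accuracy per category: a reward model is credited for a pair when it assigns the chosen response a higher score than the rejected one. The sketch below only illustrates that accuracy computation; `score_fn` and the toy data are hypothetical stand-ins for a real reward model (such as the MetaMetrics-RM checkpoint linked above) and a real preference test set, not an API from the metametrics repository.

```python
# Minimal sketch of RewardBench-style pairwise accuracy.
# `score_fn` (hypothetical) returns a scalar reward for a (prompt, response) pair.

def pairwise_accuracy(examples, score_fn):
    """Fraction of preference pairs where the chosen response outscores the rejected one."""
    correct = 0
    for prompt, chosen, rejected in examples:
        if score_fn(prompt, chosen) > score_fn(prompt, rejected):
            correct += 1
    return correct / len(examples)

if __name__ == "__main__":
    # Toy example with a stand-in scoring function, only to show the interface.
    toy_examples = [
        ("What is 2 + 2?", "2 + 2 = 4.", "2 + 2 = 5."),
    ]
    toy_score_fn = lambda prompt, response: float("4" in response)  # stand-in for a reward model
    print(pairwise_accuracy(toy_examples, toy_score_fn))  # 1.0
```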