|
--- |
|
license: apache-2.0 |
|
--- |
|
|
|
<div style="text-align:center;"> |
|
<strong>Safety classifier for Detoxifying Large Language Models via Knowledge Editing</strong> |
|
</div> |
|
|
|
# 💻 Usage
|
|
|
```python
|
from transformers import RobertaForSequenceClassification, RobertaTokenizer |
|
safety_classifier_dir = 'zjunlp/SafeEdit-Safety-Classifier' |
|
safety_classifier_model = RobertaForSequenceClassification.from_pretrained(safety_classifier_dir) |
|
safety_classifier_tokenizer = RobertaTokenizer.from_pretrained(safety_classifier_dir) |
|
``` |
|
You can also download the DINM-Safety-Classifier manually and set `safety_classifier_dir` to your local path.
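Once loaded, the classifier can be run like any `RobertaForSequenceClassification` model. A minimal inference sketch follows; note that the exact input format (e.g. how the question and model response are concatenated) and the mapping from label index to safe/unsafe are assumptions here, so check the paper and the model's `config.json` before relying on them.

```python
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

safety_classifier_dir = 'zjunlp/SafeEdit-Safety-Classifier'
model = RobertaForSequenceClassification.from_pretrained(safety_classifier_dir)
tokenizer = RobertaTokenizer.from_pretrained(safety_classifier_dir)
model.eval()

# Hypothetical input: the text of an LLM response to be checked for safety.
text = "I'm sorry, but I can't help with that request."

inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits

# Predicted class index; which index means "safe" vs. "unsafe" depends on the
# model's label mapping (see model.config.id2label) -- do not assume it.
pred = logits.argmax(dim=-1).item()
print(pred, model.config.id2label.get(pred, pred))
```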
|
|
|
|
|
# 📖 Citation
|
|
|
If you use our work, please cite our paper: |
|
|
|
```bibtex |
|
@misc{wang2024SafeEdit,
      title={Detoxifying Large Language Models via Knowledge Editing},
      author={Mengru Wang and Ningyu Zhang and Ziwen Xu and Zekun Xi and Shumin Deng and Yunzhi Yao and Qishen Zhang and Linyi Yang and Jindong Wang and Huajun Chen},
      year={2024},
      eprint={2403.14472},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2403.14472},
}
|
``` |
|
|