---
base_model: HuggingFaceH4/zephyr-7b-beta
library_name: peft
license: apache-2.0
---

# INSAIT-Institute/Zephyr-7B-MixAT

![](./assets/mixat_illustration.png)
This is a model adapter for [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta), fine-tuned using the MixAT method. MixAT is an adversarial training approach that combines continuous and discrete adversarial training to enhance model robustness against adversarial attacks, contributing to the development of more trustworthy and reliable Large Language Models (LLMs). For details, see our paper [MixAT: Combining Continuous and Discrete Adversarial Training for LLMs](https://arxiv.org/abs/2505.16947). Training and evaluation code is available in the [MixAT GitHub repository](https://github.com/insait-institute/MixAT).

## Use in 🤗 PEFT and Transformers (Quantized)

First, install the required libraries:

```bash
pip install transformers peft bitsandbytes accelerate
```

Then, load the base model (4-bit quantized) with 🤗 Transformers and apply the adapter with 🤗 PEFT:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Configure 4-bit NF4 quantization for the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model
base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=bnb_config,
)

# Apply the MixAT adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "INSAIT-Institute/Zephyr-7B-MixAT")
```
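
Once the adapter is loaded, the model can be used like any other causal LM in 🤗 Transformers. The snippet below is a minimal inference sketch, not part of the official MixAT code; the prompt and generation settings are only illustrative, and it reuses the `model` object created above together with the base model's chat template:

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

# Example prompt, formatted with Zephyr's chat template
messages = [
    {"role": "user", "content": "Explain adversarial training for LLMs in one sentence."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

# Greedy decoding of a short completion with the adapted model
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Print only the newly generated tokens
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```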

## Results

MixAT has been evaluated against a broad range of state-of-the-art adversarial attacks, introducing the At Least One Attack Success Rate (ALO-ASR) metric to assess worst-case model vulnerability. Our results show that MixAT achieves significantly improved robustness (ALO-ASR < 20%) compared to prior defenses (ALO-ASR > 50%), while maintaining good utility scores and a runtime comparable to continuous relaxation-based methods.
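
To make the metric concrete, here is a small illustrative sketch (not code from the MixAT repository) of how ALO-ASR can be computed from a hypothetical boolean matrix recording, for each prompt and each attack, whether the attack succeeded:

```python
import numpy as np

def alo_asr(success: np.ndarray) -> float:
    """At Least One Attack Success Rate: fraction of prompts broken by at least one attack.

    success[i, j] is True if attack j succeeded on prompt i.
    """
    return float(success.any(axis=1).mean())

# Hypothetical results: 4 prompts evaluated under 3 different attacks
results = np.array([
    [False, True,  False],  # prompt 0: broken by attack 1
    [False, False, False],  # prompt 1: robust to all attacks
    [True,  False, True ],  # prompt 2: broken by attacks 0 and 2
    [False, False, False],  # prompt 3: robust to all attacks
])
print(alo_asr(results))  # 0.5
```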

![ALO-ASR results table](./assets/alo_table.png)

## Model Sources

- Repository: https://github.com/insait-institute/MixAT
- Paper: https://arxiv.org/abs/2505.16947

## Summary

- Base model: [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
- Contact: dimitar.iliev.dimitrov@insait.ai and dekanycsaba23@gmail.com
- License: Distributed under the [Apache License, Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md)

## Citation

```bibtex
@article{dekany2025mixat,
  title={MixAT: Combining Continuous and Discrete Adversarial Training for LLMs},
  author={D{\'e}k{\'a}ny, Csaba and Balauca, Stefan and Staab, Robin and Dimitrov, Dimitar I and Vechev, Martin},
  journal={arXiv preprint arXiv:2505.16947},
  year={2025}
}
```
|