PEFT
Safetensors
Zephyr-7B-MixAT / README.md
dimitadi's picture
Update README.md
c8115fc verified
---
base_model: HuggingFaceH4/zephyr-7b-beta
library_name: peft
license: apache-2.0
---
# INSAIT-Institute/Zephyr-7B-MixAT
![INSAIT logo](./assets/images/insait.png)
This is a model adapter for [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta), fine-tuned using the MixAT method. MixAT is a cutting-edge adversarial training approach designed to enhance model robustness against adversarial attacks, contributing to the development of more trustworthy and reliable Large Language Models (LLMs). For details, see our paper [MixAT: Combining Continuous and Discrete Adversarial Training for LLMs](https://arxiv.org/abs/2505.16947). Training and evaluation code is available in the [MixAT Github repository](https://github.com/insait-institute/MixAT).
## Use in 🤗 PEFT and Transformers (Quantized)
First, install the required libraries:
```bash
pip install transformers peft bitsandbytes
```
Then, load the base model (4bit quantized) using transformers and apply the adapter using peft:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=False,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype="bfloat16"
)
base_model = AutoModelForCausalLM.from_pretrained(
"HuggingFaceH4/zephyr-7b-beta",
torch_dtype=torch.bfloat16,
device_map="auto",
quantization_config=bnb_config
)
model = PeftModel.from_pretrained(base_model, "INSAIT-Institute/Zephyr-7B-MixAT")
```
## Results
MixAT has been evaluated against a broad range of state-of-the-art adversarial attacks, introducing the At Least One Attack Success Rate (ALO-ASR) metric to assess worst-case model vulnerability. Our results show that MixAT achieves significantly improved robustness (ALO-ASR < 20%) compared to prior defenses (ALO-ASR > 50%), while maintaining good utility scores and a runtime comparable to continuous relaxation-based methods.
![MixAT results](./assets/images/main_table.png)
## Model Sources
- Repository: https://github.com/insait-institute/MixAT
- Paper: https://arxiv.org/abs/2505.16947
## Summary
- Base model: [HuggingFaceH4/zephyr-7b-beta](HuggingFaceH4/zephyr-7b-beta)
- Contact: dimitar.iliev.dimitrov@insait.ai and dekanycsaba23@gmail.com
- License: Distributed under [Apache License Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md)
## Citation
```bibtex
@article{dekany2025mixat,
title={MixAT: Combining Continuous and Discrete Adversarial Training for LLMs},
author={D{\'e}k{\'a}ny, Csaba and Balauca, Stefan and Staab, Robin and Dimitrov, Dimitar I and Vechev, Martin},
journal={arXiv preprint arXiv:2505.16947},
year={2025}
}
```