---
base_model: HuggingFaceH4/zephyr-7b-beta
library_name: peft
license: apache-2.0
---

# INSAIT-Institute/Zephyr-7B-MixAT

![INSAIT logo](./assets/images/insait.png)

This is a model adapter for [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta), fine-tuned using the MixAT method. MixAT is an adversarial training approach that combines continuous and discrete adversarial training to strengthen model robustness against adversarial attacks, contributing to more trustworthy and reliable Large Language Models (LLMs).

For details, see our paper [MixAT: Combining Continuous and Discrete Adversarial Training for LLMs](https://arxiv.org/abs/2505.16947). Training and evaluation code is available in the [MixAT GitHub repository](https://github.com/insait-institute/MixAT).

## Use in 🤗 PEFT and Transformers (Quantized)

First, install the required libraries:

```bash
pip install transformers peft bitsandbytes
```

Then load the base model (4-bit quantized) with `transformers` and apply the adapter with `peft`:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16"
)

base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=bnb_config
)

model = PeftModel.from_pretrained(base_model, "INSAIT-Institute/Zephyr-7B-MixAT")
```

A minimal generation example is sketched at the end of this card.

## Results

We evaluated MixAT against a broad range of state-of-the-art adversarial attacks and introduce the At Least One Attack Success Rate (ALO-ASR) metric to assess worst-case model vulnerability. Our results show that MixAT achieves significantly improved robustness (ALO-ASR < 20%) compared to prior defenses (ALO-ASR > 50%), while maintaining good utility scores and a runtime comparable to continuous relaxation-based methods.

![MixAT results](./assets/images/main_table.png)

## Model Sources

- Repository: https://github.com/insait-institute/MixAT
- Paper: https://arxiv.org/abs/2505.16947

## Summary

- Base model: [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
- Contact: dimitar.iliev.dimitrov@insait.ai and dekanycsaba23@gmail.com
- License: Distributed under the [Apache License Version 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md)

## Citation

```bibtex
@article{dekany2025mixat,
  title={MixAT: Combining Continuous and Discrete Adversarial Training for LLMs},
  author={D{\'e}k{\'a}ny, Csaba and Balauca, Stefan and Staab, Robin and Dimitrov, Dimitar I and Vechev, Martin},
  journal={arXiv preprint arXiv:2505.16947},
  year={2025}
}
```
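
## Example Inference

A minimal inference sketch continuing from the loading code in the usage section above. It assumes `model` and `torch` from that snippet are already in scope, and it loads the base model's tokenizer (the adapter does not change the tokenizer or chat template). The prompt and generation settings are illustrative only, not the configuration used in the paper.

```python
from transformers import AutoTokenizer

# The adapter reuses the base model's tokenizer and chat template.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

# Illustrative prompt; replace with your own conversation.
messages = [
    {"role": "user", "content": "Explain adversarial training for LLMs in two sentences."},
]

# Build the prompt with the chat template and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)

# Strip the prompt tokens and print only the generated continuation.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```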