ashkan-software2 committed · Commit c98a500 · verified · 1 Parent(s): df430ad

Add model card

Files changed (1)
  1. README.md (+51 -3)
README.md CHANGED
 
---
license: apache-2.0
library_name: transformers
tags: []
---

## Model Description

This Llama3-based model is fine-tuned using the "Representation Bending" (REPBEND) approach described in [Representation Bending for Large Language Model Safety](https://arxiv.org/abs/2504.01550). REPBEND modifies the model’s internal representations to reduce harmful or unsafe responses while preserving overall capabilities. The result is a model that is robust to various forms of adversarial jailbreak attacks, out-of-distribution harmful prompts, and fine-tuning exploits, all while maintaining useful and informative responses to benign requests.
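At a high level, REPBEND-style training operates on hidden states rather than only on output tokens. The snippet below is a hypothetical, simplified sketch of such a representation-level objective (the function name, arguments, the cosine/L2 terms, and the `alpha`/`beta` weights are illustrative assumptions, not the exact loss from the paper): it pushes the fine-tuned model's hidden states on harmful prompts away from the frozen base model's, while keeping hidden states on benign prompts anchored to them.

```python
# Illustrative sketch only -- not the exact RepBend objective; see the paper and repo for details.
import torch
import torch.nn.functional as F

def toy_representation_bending_loss(
    h_harmful: torch.Tensor,      # fine-tuned model hidden states on harmful prompts, (batch, seq, dim)
    h_harmful_ref: torch.Tensor,  # frozen base-model hidden states on the same harmful prompts
    h_benign: torch.Tensor,       # fine-tuned model hidden states on benign prompts
    h_benign_ref: torch.Tensor,   # frozen base-model hidden states on the same benign prompts
    alpha: float = 1.0,
    beta: float = 1.0,
) -> torch.Tensor:
    # "Bend" harmful representations: minimizing cosine similarity pushes the
    # fine-tuned hidden states away from where harmful continuations used to live.
    bend_term = F.cosine_similarity(h_harmful, h_harmful_ref, dim=-1).mean()
    # Retain capability: keep benign representations close to the base model's.
    retain_term = (h_benign - h_benign_ref).norm(dim=-1).mean()
    return alpha * bend_term + beta * retain_term
```

This repository ships only the resulting LoRA adapter weights on top of `meta-llama/Meta-Llama-3-8B-Instruct`; the actual objectives and training setup are documented in the paper and in the GitHub repository linked below.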
## Uses

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Base model and the RepBend LoRA adapter on top of it
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "AIM-Intelligence/RepBend_Llama3_8B_LoRA"

# Tokenizer is loaded from the adapter repository
tokenizer = AutoTokenizer.from_pretrained(adapter_id, use_fast=False)

# Load the base model and attach the LoRA adapter
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id, adapter_name="default")

# Format the request with the Llama 3 instruct chat template
input_text = "Who are you?"
template = "<|start_header_id|>user<|end_header_id|>\n\n{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
prompt = template.format(instruction=input_text)

# Generate and decode the response
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generated_text)
```
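For deployment without a `peft` dependency at inference time, the adapter can be merged into the base weights. A minimal sketch continuing from the snippet above (the output directory name is just an example):

```python
# Merge the LoRA adapter into the base weights and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("RepBend_Llama3_8B_merged")     # example output path
tokenizer.save_pretrained("RepBend_Llama3_8B_merged")  # keep the tokenizer alongside it
```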
## Code

Please refer to the [RepBend GitHub repository](https://github.com/AIM-Intelligence/RepBend/tree/main?tab=readme-ov-file) for the code.
## Citation

```bibtex
@article{repbend,
  title={Representation Bending for Large Language Model Safety},
  author={Yousefpour, Ashkan and Kim, Taeheon and Kwon, Ryan S and Lee, Seungbeen and Jeung, Wonje and Han, Seungju and Wan, Alvin and Ngan, Harrison and Yu, Youngjae and Choi, Jonghyun},
  journal={arXiv preprint arXiv:2504.01550},
  year={2025}
}
```