---
base_model: unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit
library_name: transformers
model_name: Vulnerability-Analyst-Qwen2.5-1.5B-Instruct
tags:
- question-answering
- chat
- text-generation
- unsloth
- trl
- sft
license: mit
datasets:
- Mackerel2/cybernative_code_vulnerability_cot
---

# Introduction

This model is a fine-tuned version of [unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit), trained with [TRL](https://github.com/huggingface/trl). It is fine-tuned to detect vulnerabilities in code using the Chain-of-Thought method.

Dataset used: [Mackerel2/cybernative_code_vulnerability_cot](https://huggingface.co/datasets/Mackerel2/cybernative_code_vulnerability_cot)

## Use Cases

- Code vulnerability analysis
- General code-related question answering (use without the chat template below)

## Use the model with the chat template for a Chain-of-Thought response

````python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel

# Define model IDs
base_model_id = "Qwen/Qwen2.5-1.5B-Instruct"
finetuned_model_id = "navodPeiris/Vulnerability-Analyst-Qwen2.5-1.5B-Instruct"

# Load tokenizer (trust remote code for Qwen models)
tokenizer = AutoTokenizer.from_pretrained(finetuned_model_id, trust_remote_code=True)

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True
)

# Apply LoRA weights
model = PeftModel.from_pretrained(model, finetuned_model_id)

# (Optional) Merge LoRA adapters into the base weights for faster inference
model = model.merge_and_unload()

# Prompt construction
system_prompt = (
    "You are an expert coder with a strong code vulnerability detection and reasoning ability. "
    "You first think through the reasoning process step-by-step in your mind and then provide the user with the answer."
)

user_prompt = (
    "Below is a question that describes a coding related problem. Write a response that appropriately answers the question. "
    "Show your reasoning in tags. And return the final response in tags.\n"
    "###Question###:\n{question}\n"
    "###Response###:\n"
)

# Example question (illustrative PHP snippet; replace it with the code you want analyzed)
question = """Find vulnerabilities in the following PHP code:
```php
<?php
$id = $_GET['id'];
$sql = "SELECT * FROM users WHERE id = " . $id;
foreach ($db->query($sql) as $row) {
    print_r($row);
}
?>
```"""

# Apply the tokenizer's chat template
prompt = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt.format(question=question)},
    ],
    tokenize=False,
    add_generation_prompt=True,
)

# Run inference using the transformers pipeline (the model is already placed on devices)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
output = pipe(prompt, max_new_tokens=1024, return_full_text=False)[0]["generated_text"]

print("\n" + output)
````

## Training procedure

This model was trained with the `SFTTrainer` from the TRL library, using PEFT/LoRA for efficient fine-tuning. I leveraged Unsloth's `FastLanguageModel` with 4-bit quantization and Unsloth gradient checkpointing so that training fits on consumer GPUs. I designed prompts in which the reasoning is enclosed in one pair of tags and the final answer in another, which guides the model to reason step-by-step before answering. Training used a LoRA adapter, an 8-bit AdamW optimizer, and cosine learning-rate scheduling, with evaluation every 50 steps.
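Below is a minimal sketch of that setup (Unsloth's `FastLanguageModel` plus TRL's `SFTTrainer`), assembled from the description and the hyperparameter table on this card. It is not the exact training script: the train/eval split, seed, output directory, and prompt-formatting step are assumptions.

```python
# Minimal training sketch (assumptions: dataset split handling, seed, output_dir)
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load the 4-bit Unsloth base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Attach LoRA adapters with the settings from the hyperparameter table
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# The exact eval split is not documented on this card; a 90/10 split is assumed here.
# Formatting each example into the "prompt" column (with the reasoning/answer tags
# described above) is omitted from this sketch.
dataset = load_dataset("Mackerel2/cybernative_code_vulnerability_cot", split="train")
dataset = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    args=SFTConfig(
        output_dir="outputs",
        dataset_text_field="prompt",
        max_seq_length=1024,
        per_device_train_batch_size=8,
        gradient_accumulation_steps=4,
        per_device_eval_batch_size=16,
        num_train_epochs=2,
        learning_rate=3e-5,
        warmup_ratio=0.03,
        weight_decay=0.1,
        lr_scheduler_type="cosine",
        optim="adamw_8bit",
        fp16=True,
        logging_steps=50,
        eval_strategy="steps",
        eval_steps=50,
    ),
)
trainer.train()
```

The full set of hyperparameters is listed below.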
| Parameter | Value |
|----------------------------|-------------------------------------|
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 4 |
| `per_device_eval_batch_size` | 16 |
| `logging_steps` | 50 |
| `eval_steps` | 50 |
| `num_train_epochs` | 2 |
| `warmup_ratio` | 0.03 |
| `learning_rate` | 3e-5 |
| `fp16` | True |
| `optim` | adamw_8bit |
| `weight_decay` | 0.1 |
| `lr_scheduler_type` | cosine |
| `dataset_text_field` | prompt |
| `max_seq_length` | 1024 |
| `lora_rank (r)` | 16 |
| `target_modules` | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| `lora_alpha` | 32 |
| `use_gradient_checkpointing` | unsloth |

### Framework versions

- TRL: 0.18.1
- Transformers: 4.52.4
- Pytorch: 2.6.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```