Introduction
This model is a fine-tuned version of unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit, trained with TRL.
It is fine-tuned to detect vulnerabilities in code using Chain-of-Thought reasoning.
Dataset Used: Mackerel2/cybernative_code_vulnerability_cot
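The dataset can be pulled with the datasets library for a quick look. This is a minimal sketch; the split name and column layout are assumptions, so check the dataset card for the exact schema.
```python
from datasets import load_dataset

# Assumed: the dataset exposes a "train" split; column names may differ from what is printed here.
ds = load_dataset("Mackerel2/cybernative_code_vulnerability_cot", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # one raw example
```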
Use Cases
- Use for code vulnerability analysis
- Use for general code-related question answering (use the model without the chat template; a short sketch follows the main example below)
Use the model with the chat template below to get a Chain-of-Thought response:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel
# Define model IDs
base_model_id = "Qwen/Qwen2.5-1.5B-Instruct"
finetuned_model_id = "navodPeiris/Vulnerability-Analyst-Qwen2.5-1.5B-Instruct"
# Load tokenizer (trust remote code for Qwen models)
tokenizer = AutoTokenizer.from_pretrained(finetuned_model_id, trust_remote_code=True)
# Load base model
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True
)
# Apply LoRA weights
model = PeftModel.from_pretrained(model, finetuned_model_id)
# (Optional) Merge LoRA adapters for faster inference
model = model.merge_and_unload()
# Prompt construction
system_prompt = (
    "You are an expert coder with a strong code vulnerability detection and reasoning ability. "
    "You first think through the reasoning process step-by-step in your mind and then provide the user with the answer."
)
user_prompt = (
    "Below is a question that describes a coding related problem. Write a response that appropriately answers the question. "
    "Show your reasoning in <think> </think> tags. And return the final response in <answer> </answer> tags.\n"
    "###Question###:\n{question}\n"
    "###Response###:\n<think>"
)
# Example question
question = """Find vulnerabilities in the following PHP code:
```php
<?php
$db = new PDO('mysql:host=localhost;dbname=test', $user, $pass);
$username = $_GET['username'];
$password = $_GET['password'];
$sql = "SELECT * FROM users WHERE username = '$username' AND password = '$password'";
foreach ($db->query($sql) as $row) {
    print_r($row);
}
?>
```"""
# Apply tokenizer's chat template
prompt = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt.format(question=question)},
    ],
    tokenize=False,
    add_generation_prompt=True,
)
# Run inference using transformers pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device_map="auto")
output = pipe(prompt, max_new_tokens=1024, return_full_text=False)[0]["generated_text"]
print("<think>\n" + output)
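For general code-related question answering, the model can also be prompted directly, without applying the Chain-of-Thought chat template. The snippet below is a minimal sketch that reuses the `pipe` object from the example above; the question is only illustrative.
```python
# Plain prompt without the chat template: the model answers directly,
# without the <think>/<answer> structure used for vulnerability analysis.
plain_prompt = "Explain what a prepared statement is in PHP PDO and why it helps prevent SQL injection."
plain_output = pipe(plain_prompt, max_new_tokens=256, return_full_text=False)[0]["generated_text"]
print(plain_output)
```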
Training procedure
This model was trained with the SFTTrainer from the trl library. I leveraged Unsloth's FastLanguageModel with 4-bit quantization and Unsloth gradient checkpointing to fit training on consumer GPUs. Prompts are designed so that reasoning is enclosed in <think>...</think> and the final answer in <answer>...</answer>, which guides the model to reason step-by-step before answering. Training used SFTTrainer from Hugging Face TRL with PEFT/LoRA, an 8-bit AdamW optimizer, and cosine learning-rate scheduling; evaluation runs every 50 steps. The hyperparameters are listed below, followed by an illustrative sketch of the corresponding trainer setup.
| Parameter | Value |
|---|---|
| per_device_train_batch_size | 8 |
| gradient_accumulation_steps | 4 |
| per_device_eval_batch_size | 16 |
| logging_steps | 50 |
| eval_steps | 50 |
| num_train_epochs | 2 |
| warmup_ratio | 0.03 |
| learning_rate | 3e-5 |
| fp16 | True |
| optim | adamw_8bit |
| weight_decay | 0.1 |
| lr_scheduler_type | cosine |
| dataset_text_field | prompt |
| max_seq_length | 1024 |
| lora_rank (r) | 16 |
| target_modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| lora_alpha | 32 |
| use_gradient_checkpointing | unsloth |
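The hyperparameters above map roughly onto the following Unsloth + TRL setup. This is an illustrative sketch rather than the exact training script: the dataset split handling, `eval_strategy`, and `output_dir` are assumptions, the step that builds the formatted `prompt` column is omitted, and `max_seq_length` may be named `max_length` in newer TRL releases.
```python
from datasets import load_dataset
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

# Base model in 4-bit via Unsloth (values taken from the table above).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Attach LoRA adapters with Unsloth's gradient checkpointing.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Assumed: the data already contains a "prompt" column holding the full
# <think>/<answer> formatted text; the train/eval split here is illustrative.
dataset = load_dataset("Mackerel2/cybernative_code_vulnerability_cot", split="train")
dataset = dataset.train_test_split(test_size=0.05, seed=42)

config = SFTConfig(
    output_dir="outputs",            # assumption
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=16,
    logging_steps=50,
    eval_strategy="steps",           # assumption: enables evaluation every eval_steps
    eval_steps=50,
    num_train_epochs=2,
    warmup_ratio=0.03,
    learning_rate=3e-5,
    fp16=True,
    optim="adamw_8bit",
    weight_decay=0.1,
    lr_scheduler_type="cosine",
    dataset_text_field="prompt",
    max_seq_length=1024,             # renamed to max_length in newer TRL versions
)

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    args=config,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```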
Framework versions
- TRL: 0.18.1
- Transformers: 4.52.4
- PyTorch: 2.6.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1
Citations
Cite TRL as:
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}