This project fine-tunes the deepseek-ai/DeepSeek-R1-0528-Qwen3-8B model using a medical reasoning dataset (mamachang/medical-reasoning) with 4-bit quantization for memory-efficient training.
Install the required packages:
pip install -U datasets accelerate peft trl bitsandbytes
pip install -U transformers==4.52.1
pip install huggingface_hub
Make sure your Hugging Face token is stored in an environment variable:
export HF_TOKEN=your_huggingface_token
The notebook will automatically log you in using this token.
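In code, that login step can look like this (a minimal sketch, assuming only the HF_TOKEN variable set above):

import os
from huggingface_hub import login

# Read the token from the environment and authenticate with the Hugging Face Hub
login(token=os.environ["HF_TOKEN"])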
Load the Model and Tokenizer
The script downloads the DeepSeek-R1-0528-Qwen3-8B model and applies 4-bit quantization with BitsAndBytesConfig for efficient memory usage.
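A minimal sketch of that loading step, reusing the same 4-bit settings that appear in the inference script further down:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"

# NF4 4-bit quantization keeps the 8B model within a single-GPU memory budget
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)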
Prepare the Dataset
The training data comes from the mamachang/medical-reasoning dataset on the Hugging Face Hub; each example is rendered into the prompt template shown further down.
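Loading and inspecting the data is straightforward with the datasets library (a minimal sketch; it only assumes the dataset has a train split):

from datasets import load_dataset

# Download the medical reasoning dataset from the Hugging Face Hub
dataset = load_dataset("mamachang/medical-reasoning", split="train")

# Check the column names and one raw example before formatting
print(dataset.column_names)
print(dataset[0])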
Fine-tuning
A LoRA adapter is trained on top of the quantized base model using peft and trl.
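A rough sketch of what that training step can look like with peft and trl; the hyperparameters and target modules below are illustrative rather than the notebook's exact settings, and argument names can vary between trl releases:

from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Illustrative LoRA settings (not necessarily the notebook's values)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Illustrative training arguments
training_args = SFTConfig(
    output_dir="DeepSeek-R1-0528-Qwen3-8B-Medical-Reasoning",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-4,
    logging_steps=10,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,              # the 4-bit base model loaded earlier
    train_dataset=dataset,    # expects formatted text; see the prompt-template sketch below
    peft_config=peft_config,
    args=training_args,
)
trainer.train()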
Push Fine-tuned Model
After training, the LoRA adapter and tokenizer are pushed to the Hugging Face Hub so they can be reloaded for inference.
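Pushing the result to the Hub takes two calls (a minimal sketch; the repository name matches the adapter used in the inference example below):

# Upload the trained LoRA adapter and the tokenizer to the Hugging Face Hub
trainer.model.push_to_hub("kingabzpro/DeepSeek-R1-0528-Qwen3-8B-Medical-Reasoning")
tokenizer.push_to_hub("kingabzpro/DeepSeek-R1-0528-Qwen3-8B-Medical-Reasoning")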
🧑‍💻 Here is the training notebook: Fine_tuning_DeepSeek-R1-0528-Qwen3-8B
Base model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
Before training, confirm that a GPU is available by running nvidia-smi (a GPU check is included in the notebook).
Each training example is formatted with the following prompt template:
Please answer with one of the options in the bracket. Write reasoning in between <analysis></analysis>. Write the answer in between <answer></answer>.
### Question:
{}
### Response:
{}
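One way to render the dataset into this template before training; the input and output field names are assumptions here, so check them against dataset.column_names first:

prompt_template = """Please answer with one of the options in the bracket. Write reasoning in between <analysis></analysis>. Write the answer in between <answer></answer>.
### Question:
{}
### Response:
{}"""

def format_example(example):
    # "input" and "output" are assumed field names; adjust to the dataset's actual schema
    return {"text": prompt_template.format(example["input"], example["output"])}

# Map every example into a single "text" field used by the trainer
dataset = dataset.map(format_example)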
Load and Use the Fine-tuned Model
The script below loads the base model in 4-bit, attaches the published LoRA adapter, and runs a sample inference:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch
# Base model
base_model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
# Your fine-tuned LoRA adapter repository
lora_adapter_id = "kingabzpro/DeepSeek-R1-0528-Qwen3-8B-Medical-Reasoning"
# Load the model in 4-bit
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=False,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
)
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
base_model_id,
device_map="auto",
torch_dtype=torch.bfloat16,
quantization_config=bnb_config,
trust_remote_code=True,
)
# Attach the LoRA adapter
model = PeftModel.from_pretrained(
base_model,
lora_adapter_id,
device_map="auto",
trust_remote_code=True,
)
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
# Inference example
prompt = """
Please answer with one of the options in the bracket. Write reasoning in between <analysis></analysis>. Write the answer in between <answer></answer>.
### Question:
A research group wants to assess the relationship between childhood diet and cardiovascular disease in adulthood.
A prospective cohort study of 500 children between 10 to 15 years of age is conducted in which the participants' diets are recorded for 1 year and then the patients are assessed 20 years later for the presence of cardiovascular disease.
A statistically significant association is found between childhood consumption of vegetables and decreased risk of hyperlipidemia and exercise tolerance.
When these findings are submitted to a scientific journal, a peer reviewer comments that the researchers did not discuss the study's validity.
Which of the following additional analyses would most likely address the concerns about this study's design?
{'A': 'Blinding', 'B': 'Crossover', 'C': 'Matching', 'D': 'Stratification', 'E': 'Randomization'},
### Response:
<analysis>
"""
inputs = tokenizer(
[prompt + tokenizer.eos_token],
return_tensors="pt"
).to("cuda")
outputs = model.generate(
input_ids=inputs.input_ids,
attention_mask=inputs.attention_mask,
max_new_tokens=1200,
eos_token_id=tokenizer.eos_token_id,
use_cache=True,
)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response[0].split("### Response:")[1])
Output:
<analysis>
This is a question about evaluating the validity of a prospective cohort study design. The study looked at childhood diet and cardiovascular disease in adulthood. The peer reviewer was concerned about the study's validity.
To address concerns about validity in a prospective cohort study, we need to consider potential confounding factors. The choices given are different statistical methods that can help control for confounding.
Blinding and crossover designs are not applicable to a prospective cohort study. Matching and stratification can help control for confounding by balancing the distribution of confounders between groups. Randomization is the best way to minimize confounding by randomly assigning participants to different exposure groups.
</analysis>
<answer>
E: Randomization
</answer>