Model Card for HealthGPT-TinyLlama

This model is a fine-tuned version of TinyLlama-1.1B-Chat-v1.0 on a custom medical dataset. It was developed to serve as a lightweight, domain-specific assistant capable of answering medical questions fluently and coherently.

Model Details

Model Description

HealthGPT-TinyLlama is a 1.1B-parameter causal language model fine-tuned with LoRA adapters for medical question answering. The base model is TinyLlama-1.1B-Chat-v1.0, a compact Llama-style transformer designed for efficient training and inference on limited hardware.

  • Developed by: Selina Zarzour
  • Shared by: selinazarzour
  • Model type: Causal Language Model
  • Language(s): English
  • License: apache-2.0
  • Finetuned from model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

Model Sources

  • Repository: https://huggingface.co/selinazarzour/healthgpt-tinyllama
  • Base model: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0

Uses

Direct Use

  • Designed to answer general medical questions.
  • Intended for educational and experimental use.

Out-of-Scope Use

  • Not suitable for clinical decision-making or professional diagnosis.
  • Should not be relied on for life-critical use cases.

Bias, Risks, and Limitations

  • The model may hallucinate or provide medically inaccurate information.
  • It has not been validated against real-world clinical data.
  • Biases present in the training dataset may persist.

Recommendations

  • Always verify model outputs with qualified professionals.
  • Do not use in scenarios where safety is critical.

How to Get Started with the Model

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned (merged) model; the tokenizer comes from the base checkpoint.
model = AutoModelForCausalLM.from_pretrained("selinazarzour/healthgpt-tinyllama")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# A GPU is recommended; the card notes the model was tested in GPU environments.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Example prompt (see the note on prompt format below).
prompt = "### Question:\nWhat are the symptoms of diabetes?\n\n### Answer:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
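
The tokenizer is loaded from the base TinyLlama repository because fine-tuning changed only the model weights. Keeping the "### Question: / ### Answer:" prompt structure shown above should give the most reliable completions, since it appears to match the question-answer format used for fine-tuning; free-form prompts may produce lower-quality output.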

Training Details

Training Data

  • Fine-tuned on a synthetic dataset of medical question-answer pairs derived from reliable medical knowledge sources.

Training Procedure

  • LoRA adapter training using Hugging Face PEFT and transformers
  • Adapter weights merged into the base model after training (a merge sketch follows below)
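
As a rough sketch of that merge step (not the exact training script; the adapter path below is a placeholder), the merged checkpoint can be produced with PEFT along these lines:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the trained LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.float16
)
# "path/to/lora-adapter" is a placeholder for the adapter directory saved after training.
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")

# Fold the adapter weights into the base weights and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("healthgpt-tinyllama-merged")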

Training Hyperparameters

  • Precision: float16 mixed precision
  • Epochs: 3
  • Optimizer: AdamW
  • Batch size: 4
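
For reference, these hyperparameters map onto a transformers TrainingArguments object roughly as follows; the learning rate and logging settings are assumed placeholders, not values recorded in this card:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="healthgpt-tinyllama-lora",
    num_train_epochs=3,                 # Epochs: 3
    per_device_train_batch_size=4,      # Batch size: 4
    optim="adamw_torch",                # Optimizer: AdamW
    fp16=True,                          # float16 mixed precision
    learning_rate=2e-4,                 # assumed value, not documented in this card
    logging_steps=10,                   # assumed value
)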

Evaluation

Testing Data, Factors & Metrics

  • Testing done manually by querying the model with unseen questions.
  • Sample outputs evaluated for relevance, grammar, and factual accuracy.

Results

  • The model produces relevant and coherent answers in most cases.
  • It performs best on short, fact-based questions.

Model Examination

Screenshot of local Gradio app interface:

Note: The model was not deployed publicly because it requires a GPU, but it runs successfully in local environments with GPU access.


Environmental Impact

  • Hardware Type: Google Colab GPU (T4/A100)
  • Hours used: ~3
  • Cloud Provider: Google Cloud via Colab
  • Compute Region: US (exact zone unknown)
  • Carbon Emitted: Unknown

Technical Specifications

Model Architecture and Objective

  • LlamaForCausalLM with 22 layers, 32 attention heads, and a hidden size of 2048
  • LoRA fine-tuning applied to the attention layers only (see the configuration sketch below)
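
A PEFT LoraConfig consistent with this attention-only setup might look like the sketch below; the module names are the standard Llama attention projections, and the rank, alpha, and dropout values are assumptions rather than the recorded settings:

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,              # assumed rank
    lora_alpha=32,     # assumed scaling factor
    lora_dropout=0.05, # assumed dropout
    bias="none",
    task_type="CAUSAL_LM",
    # Attention projections only; MLP layers are left untouched.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)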

Compute Infrastructure

  • Hardware: Colab GPU

  • Software:

    • transformers 4.39+
    • peft
    • bitsandbytes (for initial quantized training; see the loading sketch below)
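
A quantized-training setup of this kind would typically load the base model in 4-bit before attaching the LoRA adapter. The sketch below assumes NF4 quantization with float16 compute; the actual quantization settings are not documented in this card:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed 4-bit configuration for the initial quantized training phase.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
)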

Citation

APA: Zarzour, S. (2025). HealthGPT-TinyLlama: A fine-tuned 1.1B LLM for medical Q&A.

Model Card Contact

  • Contact: Selina Zarzour via Hugging Face (@selinazarzour)

Note: This model is a prototype and not intended for clinical use.
