Model Card for HealthGPT-TinyLlama

This model is a fine-tuned version of TinyLlama-1.1B-Chat-v1.0 on a custom medical dataset. It was developed to serve as a lightweight, domain-specific assistant capable of answering medical questions fluently and coherently.

Model Details

Model Description

HealthGPT-TinyLlama is a 1.1B-parameter causal language model fine-tuned with LoRA adapters for medical question answering. The base model is TinyLlama-1.1B-Chat-v1.0, a compact Llama-style transformer designed for efficient training and inference on limited hardware.

  • Developed by: Selina Zarzour
  • Shared by: selinazarzour
  • Model type: Causal Language Model
  • Language(s): English
  • License: apache-2.0
  • Finetuned from model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

Model Sources

  • Repository: https://huggingface.co/selinazarzour/healthgpt-tinyllama
  • Base model: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0

Uses

Direct Use

  • Designed to answer general medical questions.
  • Intended for educational and experimental use.

Out-of-Scope Use

  • Not suitable for clinical decision-making or professional diagnosis.
  • Should not be relied on for life-critical use cases.

Bias, Risks, and Limitations

  • The model may hallucinate or provide medically inaccurate information.
  • It has not been validated against real-world clinical data.
  • Biases present in the training dataset may persist.

Recommendations

  • Always verify model outputs with qualified professionals.
  • Do not use in scenarios where safety is critical.

How to Get Started with the Model

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned (merged) model; the tokenizer comes from the base checkpoint.
model = AutoModelForCausalLM.from_pretrained("selinazarzour/healthgpt-tinyllama")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# A GPU is recommended; the card notes the model was tested in GPU environments.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Example prompt (see the note on prompt format below).
prompt = "### Question:\nWhat are the symptoms of diabetes?\n\n### Answer:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
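
The tokenizer is loaded from the base TinyLlama repository because fine-tuning changed only the model weights. Keeping the "### Question: / ### Answer:" prompt structure shown above should give the most reliable completions, since it appears to match the question-answer format used for fine-tuning; free-form prompts may produce lower-quality output.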

Training Details

Training Data

  • Fine-tuned on a synthetic dataset of medical question-answer pairs derived from reliable medical knowledge sources.

Training Procedure

  • LoRA adapter training using Hugging Face PEFT and transformers
  • Adapter weights merged into the base model after training (a merge sketch follows below)
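
As a rough sketch of that merge step (not the exact training script; the adapter path below is a placeholder), the merged checkpoint can be produced with PEFT along these lines:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the trained LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.float16
)
# "path/to/lora-adapter" is a placeholder for the adapter directory saved after training.
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")

# Fold the adapter weights into the base weights and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("healthgpt-tinyllama-merged")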

Training Hyperparameters

  • Precision: float16 mixed precision
  • Epochs: 3
  • Optimizer: AdamW
  • Batch size: 4
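
For reference, these hyperparameters map onto a transformers TrainingArguments object roughly as follows; the learning rate and logging settings are assumed placeholders, not values recorded in this card:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="healthgpt-tinyllama-lora",
    num_train_epochs=3,                 # Epochs: 3
    per_device_train_batch_size=4,      # Batch size: 4
    optim="adamw_torch",                # Optimizer: AdamW
    fp16=True,                          # float16 mixed precision
    learning_rate=2e-4,                 # assumed value, not documented in this card
    logging_steps=10,                   # assumed value
)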

Evaluation

Testing Data, Factors & Metrics

  • Testing done manually by querying the model with unseen questions.
  • Sample outputs evaluated for relevance, grammar, and factual accuracy.

Results

  • The model produces relevant and coherent answers in most cases.
  • It performs best on short, fact-based questions.

Model Examination

Screenshot of local Gradio app interface:

Note: The model was not deployed publicly because it requires a GPU, but it runs successfully in local environments with GPU access.


Environmental Impact

  • Hardware Type: Google Colab GPU (T4/A100)
  • Hours used: ~3
  • Cloud Provider: Google Cloud via Colab
  • Compute Region: US (exact zone unknown)
  • Carbon Emitted: Unknown

Technical Specifications

Model Architecture and Objective

  • LlamaForCausalLM with 22 layers, 32 attention heads, and a hidden size of 2048
  • LoRA fine-tuning applied to the attention layers only (see the configuration sketch below)
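
A PEFT LoraConfig consistent with this attention-only setup might look like the sketch below; the module names are the standard Llama attention projections, and the rank, alpha, and dropout values are assumptions rather than the recorded settings:

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,              # assumed rank
    lora_alpha=32,     # assumed scaling factor
    lora_dropout=0.05, # assumed dropout
    bias="none",
    task_type="CAUSAL_LM",
    # Attention projections only; MLP layers are left untouched.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)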

Compute Infrastructure

  • Hardware: Colab GPU

  • Software:

    • transformers 4.39+
    • peft
    • bitsandbytes (for initial quantized training; see the loading sketch below)
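
A quantized-training setup of this kind would typically load the base model in 4-bit before attaching the LoRA adapter. The sketch below assumes NF4 quantization with float16 compute; the actual quantization settings are not documented in this card:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed 4-bit configuration for the initial quantized training phase.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
)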

Citation

APA: Zarzour, S. (2025). HealthGPT-TinyLlama: A fine-tuned 1.1B LLM for medical Q&A.

Model Card Contact

  • Contact: Selina Zarzour via Hugging Face (@selinazarzour)

Note: This model is a prototype and not intended for clinical use.
