# Model Card for HealthGPT-TinyLlama
This model is a fine-tuned version of TinyLlama-1.1B-Chat-v1.0 on a custom medical dataset. It was developed to serve as a lightweight, domain-specific assistant capable of answering medical questions fluently and coherently.
## Model Details

### Model Description

HealthGPT-TinyLlama is a 1.1B-parameter model fine-tuned with LoRA adapters for medical question answering. The base model is TinyLlama-1.1B-Chat-v1.0, a compact Llama-architecture model designed for efficient training and inference on limited hardware.
- Developed by: Selina Zarzour
- Shared by: selinazarzour
- Model type: Causal Language Model
- Language(s): English
- License: apache-2.0
- Finetuned from model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
### Model Sources

- Repository: https://huggingface.co/selinazarzour/healthgpt-tinyllama
- Demo (local only): a Gradio app tested locally on GPU; not deployed to Spaces because the model requires a GPU and is not CPU-compatible
## Uses

### Direct Use
- Designed to answer general medical questions.
- Intended for educational and experimental use.
### Out-of-Scope Use
- Not suitable for clinical decision-making or professional diagnosis.
- Should not be relied on for life-critical use cases.
## Bias, Risks, and Limitations
- The model may hallucinate or provide medically inaccurate information.
- It has not been validated against real-world clinical data.
- Biases present in the training dataset may persist.
### Recommendations
- Always verify model outputs with qualified professionals.
- Do not use in scenarios where safety is critical.
## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned (merged) model and the base model's tokenizer.
model = AutoModelForCausalLM.from_pretrained("selinazarzour/healthgpt-tinyllama")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Use the question/answer prompt format the model was fine-tuned on.
prompt = "### Question:\nWhat are the symptoms of diabetes?\n\n### Answer:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
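If only the generated answer is wanted (without the echoed prompt), one simple approach, assuming the prompt format above, is to split on the `### Answer:` marker:

```python
# Keep only the text after the "### Answer:" marker from the prompt template.
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
answer = text.split("### Answer:")[-1].strip()
print(answer)
```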
## Training Details

### Training Data

- Fine-tuned on a synthetic dataset of medical question–answer pairs derived from reliable medical knowledge sources.
### Training Procedure

- LoRA adapter training using Hugging Face PEFT and `transformers` (a minimal sketch follows this list)
- LoRA adapters merged into the base model weights after training
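The exact training script is not included in this card. The following is a minimal sketch of the adapter-training and merge flow with PEFT and `transformers`, using the hyperparameters listed below; the LoRA rank, alpha, dropout, target-module names, output paths, and dataset handling are illustrative assumptions, not the recorded configuration.

```python
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA on the attention projections only; r/alpha/dropout are assumed values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

args = TrainingArguments(
    output_dir="healthgpt-tinyllama",
    num_train_epochs=3,              # epochs: 3
    per_device_train_batch_size=4,   # batch size: 4
    fp16=True,                       # float16 mixed precision
    optim="adamw_torch",             # AdamW
    logging_steps=10,
)

# `train_dataset` stands in for the tokenized medical Q&A dataset (not released).
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Merge the LoRA adapters back into the base weights and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("healthgpt-tinyllama-merged")
```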
#### Training Hyperparameters
- Precision: float16 mixed precision
- Epochs: 3
- Optimizer: AdamW
- Batch size: 4
## Evaluation

### Testing Data, Factors & Metrics

- Testing was done manually by querying the model with unseen questions (see the sketch below).
- Sample outputs were evaluated for relevance, grammar, and factual accuracy.
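A sketch of this kind of manual spot-check, reusing the model and tokenizer loaded above; the held-out questions here are hypothetical examples, not the actual test set:

```python
# Hypothetical unseen questions for manual inspection of model outputs.
test_questions = [
    "What are common symptoms of iron deficiency?",
    "How is high blood pressure usually managed?",
]

for question in test_questions:
    prompt = f"### Question:\n{question}\n\n### Answer:\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=150)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    print("-" * 60)
```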
### Results
- The model produces relevant and coherent answers in most cases.
- It performs best on short, fact-based questions.
## Model Examination

Screenshot of the local Gradio app interface (image not reproduced here).

Note: the model was not deployed publicly because it requires a GPU, but it runs successfully in local environments with GPU access.
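The demo app itself is not part of this repository. Below is a minimal sketch of what such a local Gradio interface could look like, assuming a local GPU; the layout and generation settings are assumptions, not the exact app in the screenshot.

```python
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("selinazarzour/healthgpt-tinyllama").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

def answer(question):
    # Wrap the user question in the fine-tuning prompt format and generate.
    prompt = f"### Question:\n{question}\n\n### Answer:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=150)
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return text.split("### Answer:")[-1].strip()

demo = gr.Interface(
    fn=answer,
    inputs=gr.Textbox(label="Medical question"),
    outputs=gr.Textbox(label="Answer"),
    title="HealthGPT-TinyLlama (local demo)",
)
demo.launch()
```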
## Environmental Impact
- Hardware Type: Google Colab GPU (T4/A100)
- Hours used: ~3 hours
- Cloud Provider: Google Cloud via Colab
- Compute Region: US (unknown exact zone)
- Carbon Emitted: Unknown
## Technical Specifications

### Model Architecture and Objective

- LlamaForCausalLM with 22 layers, 32 attention heads, and a hidden size of 2048 (see the configuration check below)
- LoRA fine-tuning applied to the attention layers only
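These dimensions can be confirmed directly from the base model's configuration:

```python
from transformers import AutoConfig

# Read the architecture dimensions from the TinyLlama base config.
config = AutoConfig.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
print(config.num_hidden_layers)    # 22
print(config.num_attention_heads)  # 32
print(config.hidden_size)          # 2048
```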
### Compute Infrastructure
Hardware: Colab GPU
Software:
- transformers 4.39+
- peft
- bitsandbytes (for initial quantized training)
## Citation
APA: Zarzour, S. (2025). HealthGPT-TinyLlama: A fine-tuned 1.1B LLM for medical Q&A.
## Model Card Contact
- Contact: Selina Zarzour via Hugging Face (@selinazarzour)
Note: This model is a prototype and not intended for clinical use.