---
license: llama3.1
language:
- en
metrics:
- accuracy
base_model:
- meta-llama/Llama-3.1-8B-Instruct
tags:
- medical
- clinical
model-index:
- name: J1-8B
  results:
  - task:
      type: text-generation
    dataset:
      name: MedQA
      type: MedQA
    metrics:
    - name: Accuracy
      type: accuracy
      value: 79.34
  - task:
      type: text-generation
    dataset:
      name: PubMedQA
      type: PubMedQA
    metrics:
    - name: Accuracy
      type: accuracy
      value: 81.00
---
# Juvoly J1
## Model Details
- **Base Model**: Llama 3.1-8B-Instruct
- **Training**: 2,688 NVIDIA B200 GPU-hours
- **Dataset**: 100B tokens of synthetic data derived from CC-BY-licensed PubMed articles
- **Intended Use**: Experimental
- **Repository**: [GitHub](https://github.com/juvoly/j1)
## Description
Juvoly J1 is an experimental clinical reasoning model designed for testing in limited healthcare environments. The model has been fine-tuned on a carefully curated dataset of synthetic PubMed articles to enhance its medical reasoning capabilities while maintaining the general knowledge from its Llama 3.1-8B-Instruct base.
This model represents our initial effort toward creating accessible, efficient clinical reasoning tools that can operate within modest computational constraints. With a parameter count of only 8B, J1 demonstrates strong performance on medical benchmarks while requiring significantly fewer resources than larger alternatives.
## Performance
### Comparable size ~8B parameters
| Model | MedQA | PubMedQA | Average Tokens per Question |
|-------|-------|----------|----------------------------|
| **Juvoly J1 (this repo)** | **79.34%** | **81.00%** | MedQA: **2012**, PubMedQA: 1214 |
| Qwen3-8B ([link](https://huggingface.co/Qwen/Qwen3-8B)) | 75.81% | 79.50% | MedQA: 2608, PubMedQA: **894** |
| HuatuoGPT-o1-8B ([link](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-8B)) | 72.6% | 79.2% | - |
| Delphyr M1 ([link](https://www.delphyr.ai/blog/delphyr-m1-best-in-class-medical-model)) | 64.7% | 76.8% | - |
| Llama 3.1-8B-Instruct ([link](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)) | 58.7% | 75.2% | - |
### Benchmarks on larger models
| Model | MedQA | PubMedQA | Average Tokens per Question |
|-------|-------|----------|----------------------------|
| GPT-4o | 88% | - | - |
| HuatuoGPT-o1-70B ([link](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-70B)) | 83.3% | 80.6% | - |
| DeepSeek V3 | 80.9% | - | - |
*Note: The tables are sorted by MedQA performance in descending order.*
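To illustrate the efficiency trade-off that the token column hints at, the snippet below derives an accuracy-per-kilotoken figure from the MedQA numbers in the first table. The metric and helper name are ours, for illustration only; lower token usage at comparable accuracy yields a higher score.

```python
# MedQA accuracy (%) and mean response length (tokens), taken from the table above.
RESULTS = {
    "Juvoly J1": (79.34, 2012),
    "Qwen3-8B": (75.81, 2608),
}

def accuracy_per_kilotoken(accuracy_pct: float, tokens: int) -> float:
    """Accuracy percentage points earned per 1,000 generated tokens."""
    return accuracy_pct / (tokens / 1000)

for name, (acc, tok) in RESULTS.items():
    print(f"{name}: {accuracy_per_kilotoken(acc, tok):.2f} pts / 1k tokens")
```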
## Intended Use
This model is intended for **experimental use only**. It may be helpful for:
- Testing medical reasoning capabilities in controlled environments
- Research on smaller-scale medical language models
- Exploring the balance between model size and clinical reasoning performance
- Prototyping healthcare applications with reduced computational requirements
The model should not be used for actual medical decision-making or clinical applications without proper validation, oversight, and compliance with relevant regulations.
## Roadmap
We're actively developing our model ecosystem with several key initiatives underway:
- **Multilingual Support**: Expanding capabilities to support multiple languages for broader global access
- **Specialized Domain Variants**: Creating focused versions for specific medical specialties
- **Enhanced Reasoning**: Improving the model's ability to follow complex chains of medical logic
- **Reduced Token Usage**: Optimizing response generation for more efficient inference
## Evaluation
Steps to reproduce our evaluation results can be found in our [GitHub repository](https://github.com/juvoly/j1).
The experiments were run using:
```bash
python -m sglang.launch_server --port 8000 --model-path Juvoly/J1-Llama-8B-exp --tp-size 8 --mem-fraction-static 0.8
```
## Limitations
- As an experimental model, J1 may produce incorrect or incomplete medical information
- The model has not undergone clinical validation or regulatory approval
- Performance varies across different medical domains and question types
- The model inherits limitations from the base Llama 3.1-8B-Instruct architecture
## Citation
If you use this model in your research, please cite our repository:
```bibtex
@software{juvoly_j1_2025,
  author = {Juvoly Team},
  title  = {J1: An Experimental Clinical Reasoning Model},
  year   = {2025},
  url    = {https://github.com/juvoly/j1}
}
```
## License
This model follows the license terms of the Llama 3.1-8B-Instruct base model, with additional terms for our fine-tuning process. Please refer to our repository for complete licensing information.