---
license: llama3.1
language:
  - en
metrics:
  - accuracy
base_model:
  - meta-llama/Llama-3.1-8B-Instruct
tags:
  - medical
  - clinical
model-index:
  - name: J1-8B
    results:
      - task:
          type: text-generation
        dataset:
          name: MedQA
          type: MedQA
        metrics:
          - name: Accuracy
            type: accuracy
            value: 79.34
      - task:
          type: text-generation
        dataset:
          name: PubMedQA
          type: PubMedQA
        metrics:
          - name: Accuracy
            type: accuracy
            value: 81.00
---

# Juvoly J1

## Model Details

- **Base Model:** Llama 3.1-8B-Instruct
- **Training:** 2,688 NVIDIA B200 GPU-hours
- **Dataset:** 100B tokens of synthetic data derived from CC-BY-licensed PubMed articles
- **Intended Use:** Experimental
- **Repository:** GitHub

## Description

Juvoly J1 is an experimental clinical reasoning model designed for testing in limited healthcare environments. The model has been fine-tuned on a carefully curated dataset of synthetic PubMed articles to enhance its medical reasoning capabilities while maintaining the general knowledge from its Llama 3.1-8B-Instruct base.

This model represents our initial effort toward creating accessible, efficient clinical reasoning tools that can operate within modest computational constraints. At only 8B parameters, J1 delivers strong performance on medical benchmarks while requiring far fewer resources than larger alternatives.

## Performance

### Models of comparable size (~8B parameters)

| Model | MedQA | PubMedQA | Average Tokens per Question |
|---|---|---|---|
| Juvoly J1 (this repo) | 79.34% | 81.00% | MedQA: 2012, PubMedQA: 1214 |
| Qwen3-8B (link) | 75.81% | 79.50% | MedQA: 2608, PubMedQA: 894 |
| HuatuoGPT-o1-8B (link) | 72.6% | 79.2% | - |
| Delphyr M1 (link) | 64.7% | 76.8% | - |
| Llama 3.1-8B-Instruct (link) | 58.7% | 75.2% | - |

### Benchmarks on larger models

| Model | MedQA | PubMedQA | Average Tokens per Question |
|---|---|---|---|
| GPT-4o | 88% | - | - |
| HuatuoGPT-o1-70B (link) | 83.3% | 80.6% | - |
| Deepseek V3 | 80.9% | - | - |

Note: The tables are sorted by MedQA performance in descending order.

## Intended Use

This model is intended for experimental use only. It may be helpful for:

- Testing medical reasoning capabilities in controlled environments
- Research on smaller-scale medical language models
- Exploring the balance between model size and clinical reasoning performance
- Prototyping healthcare applications with reduced computational requirements

The model should not be used for actual medical decision-making or clinical applications without proper validation, oversight, and compliance with relevant regulations.

## Roadmap

We're actively developing our model ecosystem with several key initiatives underway:

- **Multilingual Support:** Expanding capabilities to support multiple languages for broader global access
- **Specialized Domain Variants:** Creating focused versions for specific medical specialties
- **Enhanced Reasoning:** Improving the model's ability to follow complex chains of medical logic
- **Reduced Token Usage:** Optimizing response generation for more efficient inference

## Evaluation

Steps to reproduce our evaluation results can be found in our GitHub repository.

The experiments were run using:

```shell
python -m sglang.launch_server --port 8000 --model-path Juvoly/J1-Llama-8B-exp --tp-size 8 --mem-fraction-static 0.8
```
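Once the sglang server is running, it can be queried through its OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below shows one way to do this with only the standard library; the prompt wording and helper names are illustrative assumptions, not part of the official evaluation harness.

```python
# Minimal sketch of querying the sglang server launched above via its
# OpenAI-compatible chat-completions endpoint. Prompt text and function
# names are assumptions for illustration.
import json
import urllib.request

def build_request(question: str, model: str = "Juvoly/J1-Llama-8B-exp") -> dict:
    """Assemble a chat-completions payload for one benchmark question."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a careful clinical reasoning assistant."},
            {"role": "user", "content": question},
        ],
        "temperature": 0.0,  # deterministic decoding for benchmarking
    }

def ask(question: str, base_url: str = "http://localhost:8000") -> str:
    """POST one question to the local server and return the model's reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_request(question)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Inspect the payload without needing a running server.
    print(build_request("Which electrolyte abnormality is typical of tumor lysis syndrome?"))
```

Setting `temperature` to 0.0 keeps decoding greedy, which makes benchmark runs reproducible across invocations.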

## Limitations

- As an experimental model, J1 may produce incorrect or incomplete medical information
- The model has not undergone clinical validation or regulatory approval
- Performance varies across different medical domains and question types
- The model inherits limitations from the base Llama 3.1-8B-Instruct architecture

## Citation

If you use this model in your research, please cite our repository:

```bibtex
@software{juvoly_j1_2025,
  author = {Juvoly Team},
  title = {J1: An Experimental Clinical Reasoning Model},
  year = {2025},
  url = {https://github.com/juvoly/j1}
}
```

## License

This model follows the license terms of the Llama 3.1-8B-Instruct base model, with additional terms for our fine-tuning process. Please refer to our repository for complete licensing information.