
Juvoly J1

Model Details

  • Base Model: Llama 3.1-8B-Instruct
  • Training: 2688 B200 hours
  • Dataset: 100B tokens of synthetic data based on CC-BY articles from PubMed.
  • Intended Use: Experimental
  • Repository: GitHub

Description

Juvoly J1 is an experimental clinical reasoning model designed for testing in limited healthcare environments. The model has been fine-tuned on a carefully curated dataset of synthetic PubMed articles to enhance its medical reasoning capabilities while maintaining the general knowledge from its Llama 3.1-8B-Instruct base.

This model represents our initial effort toward creating accessible, efficient clinical reasoning tools that can operate within modest computational constraints. With a parameter count of only 8B, J1 demonstrates strong performance on medical benchmarks while requiring significantly fewer resources than larger alternatives.

Performance

Models of comparable size (~8B parameters)

| Model | MedQA | PubMedQA | Average Tokens per Question |
|---|---|---|---|
| Juvoly J1 (this repo) | 79.34% | 81% | MedQA: 2012, PubMedQA: 1214 |
| Qwen3-8B (link) | 75.81% | 79.50% | MedQA: 2608, PubMedQA: 894 |
| HuatuoGPT-o1-8B (link) | 72.6% | 79.2% | - |
| Delphyr M1 (link) | 64.7% | 76.8% | - |
| Llama 3.1-8B-Instruct (link) | 58.7% | 75.2% | - |

Benchmarks on larger models

| Model | MedQA | PubMedQA | Average Tokens per Question |
|---|---|---|---|
| GPT-4o | 88% | - | - |
| HuatuoGPT-o1-70B (link) | 83.3% | 80.6% | - |
| DeepSeek V3 | 80.9% | - | - |

Note: The tables are sorted by MedQA performance in descending order.

Intended Use

This model is intended for experimental use only. It may be helpful for:

  • Testing medical reasoning capabilities in controlled environments
  • Research on smaller-scale medical language models
  • Exploring the balance between model size and clinical reasoning performance
  • Prototyping healthcare applications with reduced computational requirements

The model should not be used for actual medical decision-making or clinical applications without proper validation, oversight, and compliance with relevant regulations.

Roadmap

We're actively developing our model ecosystem with several key initiatives underway:

  • Multilingual Support: Expanding capabilities to support multiple languages for broader global access
  • Specialized Domain Variants: Creating focused versions for specific medical specialties
  • Enhanced Reasoning: Improving the model's ability to follow complex chains of medical logic
  • Reduced Token Usage: Optimizing response generation for more efficient inference

Evaluation

Steps to reproduce our evaluation results can be found in our GitHub repository.

The experiments were run using:

```shell
python -m sglang.launch_server --port 8000 --model-path Juvoly/J1-Llama-8B-exp --tp-size 8 --mem-fraction-static 0.8
```
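Once launched, the sglang server exposes an OpenAI-compatible `/v1/chat/completions` endpoint on the chosen port. The sketch below builds a request payload for a single benchmark-style question; the system prompt, sampling parameters, and question text are illustrative assumptions, not the exact settings behind the reported scores.

```python
import json

def build_request(question: str, model: str = "Juvoly/J1-Llama-8B-exp") -> dict:
    """Build an OpenAI-style chat request for the local sglang server.

    The prompt and decoding parameters here are placeholders for
    experimentation, not the evaluation configuration.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a clinical reasoning assistant."},
            {"role": "user", "content": question},
        ],
        "temperature": 0.0,   # deterministic decoding, typical for benchmarking
        "max_tokens": 2048,   # headroom for long answers (J1 averages ~2012 tokens on MedQA)
    }

payload = build_request(
    "A 54-year-old presents with acute chest pain. What is the next step in management?"
)
print(json.dumps(payload, indent=2))
```

The resulting JSON can be POSTed to `http://localhost:8000/v1/chat/completions` with any HTTP client (e.g. `curl` or the `openai` Python package pointed at that base URL).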

Limitations

  • As an experimental model, J1 may produce incorrect or incomplete medical information
  • The model has not undergone clinical validation or regulatory approval
  • Performance varies across different medical domains and question types
  • The model inherits limitations from the base Llama 3.1-8B-Instruct architecture

Citation

If you use this model in your research, please cite our repository:

```bibtex
@software{juvoly_j1_2025,
  author = {Juvoly Team},
  title = {J1: An Experimental Clinical Reasoning Model},
  year = {2025},
  url = {https://github.com/juvoly/j1}
}
```

License

This model follows the license terms of the Llama 3.1-8B-Instruct base model, with additional terms for our fine-tuning process. Please refer to our repository for complete licensing information.
