---
license: other
license_name: health-ai-developer-foundations
license_link: https://developers.google.com/health-ai-developer-foundations/terms
pipeline_tag: text-generation
extra_gated_heading: Access MedGemma on Hugging Face
extra_gated_prompt: >-
  To access MedGemma on Hugging Face, you're required to review and agree to
  [Health AI Developer Foundation's terms of use](https://developers.google.com/health-ai-developer-foundations/terms).
  To do this, please ensure you're logged in to Hugging Face and click below.
  Requests are processed immediately.
extra_gated_button_content: Acknowledge license
base_model: google/medgemma-27b-text-it
tags:
- medical
- clinical-reasoning
- thinking
---

# litert-community/MedGemma-27B-IT

This model provides a few variants of [google/medgemma-27b-text-it](https://huggingface.co/google/medgemma-27b-text-it) that are ready for deployment on the Web using the [MediaPipe LLM Inference API](https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference).

### Web

* Build and run our [sample web app](https://github.com/google-ai-edge/mediapipe-samples/blob/main/examples/llm_inference/js/README.md). To add the model to your web app, follow the instructions in our [documentation](https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/web_js).

## Performance

### Web

Note that all benchmark stats were measured on a 2024 MacBook Pro (Apple M4 Max chip) running Chrome, with a 1280-token KV cache, 1024 prefill tokens, and 256 decode tokens.
| Activation precision | Weight precision | Backend | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | GPU Memory | CPU Memory | Model size |
|---|---|---|---|---|---|---|---|---|
| F16 | int8 | GPU | 167 | 8 | 14.9 | 27.0 GB | 1.5 GB | 27.05 GB |
| F32 | int8 | GPU | 97 | 8 | 15.0 | 28.0 GB | 1.5 GB | 27.05 GB |
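
The web deployment described above can be sketched with the MediaPipe LLM Inference API. This is a minimal browser-side example, not the official sample app; the model file name and local path are assumptions, and `maxTokens` is chosen to match the 1280 KV cache size used in the benchmarks.

```javascript
// Minimal sketch: run a MedGemma LiteRT variant in the browser with the
// MediaPipe LLM Inference API. Requires the @mediapipe/tasks-genai package.
import {FilesetResolver, LlmInference} from '@mediapipe/tasks-genai';

// Load the GenAI WASM runtime from a CDN.
const genaiFileset = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm');

// Create the inference task from a downloaded model bundle.
const llmInference = await LlmInference.createFromOptions(genaiFileset, {
  baseOptions: {
    // Hypothetical path: point this at the model file downloaded from this repo.
    modelAssetPath: '/models/medgemma-27b-it.task',
  },
  maxTokens: 1280,  // total prompt + response tokens; matches the benchmark KV cache size
  topK: 40,
  temperature: 0.8,
});

// Generate a single response for a prompt.
const response = await llmInference.generateResponse(
    'List common contraindications of ibuprofen.');
console.log(response);
```

Because the 27B model weighs roughly 27 GB, expect a long first load while the browser fetches and caches the model, and note the GPU memory figures in the table above when choosing target hardware.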