---
license: other
license_name: health-ai-developer-foundations
license_link: https://developers.google.com/health-ai-developer-foundations/terms
pipeline_tag: text-generation
extra_gated_heading: Access MedGemma on Hugging Face
extra_gated_prompt: >-
  To access MedGemma on Hugging Face, you're required to review and agree to
  [Health AI Developer Foundation's terms of use](https://developers.google.com/health-ai-developer-foundations/terms).
  To do this, please ensure you're logged in to Hugging Face and click below.
  Requests are processed immediately.
extra_gated_button_content: Acknowledge license
base_model: google/medgemma-27b-text-it
tags:
- medical
- clinical-reasoning
- thinking
---

# litert-community/MedGemma-27B-IT

This model provides a few variants of [google/medgemma-27b-text-it](https://huggingface.co/google/medgemma-27b-text-it) that are ready for deployment on the Web using the [MediaPipe LLM Inference API](https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference).

### Web

* Build and run our [sample web app](https://github.com/google-ai-edge/mediapipe-samples/blob/main/examples/llm_inference/js/README.md).

To add the model to your web app, please follow the instructions in our [documentation](https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/web_js).

## Performance

### Web

All benchmark stats were measured on a MacBook Pro 2024 (Apple M4 Max chip) running Chrome, with a 1280-token KV cache, 1024 prefill tokens, and 256 decode tokens.
| Precision | Backend | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | GPU Memory | CPU Memory | Model size | Model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| F16 / int8 | GPU | 167 | 8 | 14.9 | 27.0 GB | 1.5 GB | 27.05 GB | 🔗 |
| F32 / int8 | GPU | 97 | 8 | 15.0 | 28.0 GB | 1.5 GB | 27.05 GB | 🔗 |

* Model size: measured by the size of the .tflite flatbuffer (the serialization format for LiteRT models).
* int8: quantized model with int8 weights and float activations.
* GPU memory: measured as the "GPU Process" memory for all of Chrome while running. Chrome used 340–350 MB before any model loading took place.
* CPU memory: measured for the entire tab while running. The tab used 60–70 MB before any model loading took place.
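As a minimal sketch of the web integration described above, the snippet below loads a model with the MediaPipe Tasks GenAI JavaScript API and streams a response. The WASM asset URL follows the MediaPipe documentation; the model file path and the generation parameters are placeholder assumptions, not values shipped with this repo.

```javascript
// Sketch: run MedGemma in the browser with the MediaPipe LLM Inference API.
// Assumes the model file from this repo is served alongside the app
// (the path below is a placeholder).
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

// Resolve the WASM assets for the GenAI tasks.
const genai = await FilesetResolver.forGenAiTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
);

// Create the inference task from the downloaded model file.
const llmInference = await LlmInference.createFromOptions(genai, {
  baseOptions: {
    modelAssetPath: '/assets/medgemma-27b-it.task', // placeholder path
  },
  maxTokens: 1280, // matches the KV cache size used in the benchmarks above
  topK: 40,        // assumed sampling settings
  temperature: 0.8,
});

// Streaming generation: partial results arrive through the callback.
llmInference.generateResponse(
  'Summarize the common symptoms of iron deficiency.',
  (partialResult, done) => {
    document.getElementById('output').textContent += partialResult;
    if (done) {
      console.log('generation complete');
    }
  }
);
```

Streaming via the progress callback keeps the page responsive during decode, which matters at the ~8 tokens/sec decode rates reported above for a 27B model.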