---
license: other
license_name: health-ai-developer-foundations
license_link: https://developers.google.com/health-ai-developer-foundations/terms
pipeline_tag: text-generation
extra_gated_heading: Access MedGemma on Hugging Face
extra_gated_prompt: >-
  To access MedGemma on Hugging Face, you're required to review and agree to
  [Health AI Developer Foundation's terms of use](https://developers.google.com/health-ai-developer-foundations/terms).
  To do this, please ensure you're logged in to Hugging Face and click below.
  Requests are processed immediately.
extra_gated_button_content: Acknowledge license
base_model: google/medgemma-27b-text-it
tags:
- medical
- clinical-reasoning
- thinking
---

# litert-community/MedGemma-27B-IT

This model provides a few variants of [google/medgemma-27b-text-it](https://huggingface.co/google/medgemma-27b-text-it) that are ready for deployment on the Web using the [MediaPipe LLM Inference API](https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference).

### Web

* Build and run our [sample web app](https://github.com/google-ai-edge/mediapipe-samples/blob/main/examples/llm_inference/js/README.md).

To add the model to your web app, please follow the instructions in our [documentation](https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/web_js).

## Performance

### Web

All benchmark stats were measured on a MacBook Pro 2024 (Apple M4 Max chip) running Chrome, with a 1280-token KV cache, 1024 prefill tokens, and 256 decode tokens.
| Precision | Backend | Prefill (tokens/sec) | Decode (tokens/sec) | Time-to-first-token (sec) | GPU Memory | CPU Memory | Model size | Model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| F16 / int8 | GPU | 167 | 8 | 14.9 | 27.0 GB | 1.5 GB | 27.05 GB | 🔗 |
| F32 / int8 | GPU | 97 | 8 | 15.0 | 28.0 GB | 1.5 GB | 27.05 GB | 🔗 |

* Model size: measured by the size of the .tflite flatbuffer (the serialization format for LiteRT models).
* int8: quantized model with int8 weights and float activations.
* GPU memory: measured as the "GPU Process" memory for all of Chrome while running. Chrome used 340–350 MB before any model loading took place.
* CPU memory: measured for the entire tab while running. The tab used 60–70 MB before any model loading took place.
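As a minimal sketch of the web integration described above, the snippet below loads a model with the MediaPipe Tasks GenAI JavaScript API and streams a response. The WASM asset URL follows the MediaPipe documentation; the model file path and the generation parameters are placeholder assumptions, not values shipped with this repo.

```javascript
// Sketch: run MedGemma in the browser with the MediaPipe LLM Inference API.
// Assumes the model file from this repo is served alongside the app
// (the path below is a placeholder).
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

// Resolve the WASM assets for the GenAI tasks.
const genai = await FilesetResolver.forGenAiTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
);

// Create the inference task from the downloaded model file.
const llmInference = await LlmInference.createFromOptions(genai, {
  baseOptions: {
    modelAssetPath: '/assets/medgemma-27b-it.task', // placeholder path
  },
  maxTokens: 1280, // matches the KV cache size used in the benchmarks above
  topK: 40,        // assumed sampling settings
  temperature: 0.8,
});

// Streaming generation: partial results arrive through the callback.
llmInference.generateResponse(
  'Summarize the common symptoms of iron deficiency.',
  (partialResult, done) => {
    document.getElementById('output').textContent += partialResult;
    if (done) {
      console.log('generation complete');
    }
  }
);
```

Streaming via the progress callback keeps the page responsive during decode, which matters at the ~8 tokens/sec decode rates reported above for a 27B model.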