Update README.md
README.md
CHANGED
@@ -1,13 +1,11 @@
|
|
1 |
---
|
|
|
2 |
license: gemma
|
3 |
-
|
|
|
|
|
|
|
4 |
pipeline_tag: image-text-to-text
|
5 |
-
extra_gated_heading: Access Gemma on Hugging Face
|
6 |
-
extra_gated_prompt: To access Gemma on Hugging Face, you’re required to review and
|
7 |
-
agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging
|
8 |
-
Face and click below. Requests are processed immediately.
|
9 |
-
extra_gated_button_content: Acknowledge license
|
10 |
-
base_model: google/gemma-3-4b-it
|
11 |
---
|
12 |
|
13 |
# Gemma 3 4B Instruction-tuned QAT compressed-tensors
|
@@ -26,6 +24,16 @@ Below is the original model card.
|
|
26 |
|
27 |
**Model Page**: [Gemma](https://ai.google.dev/gemma/docs/core)
|
28 |
|
29 |
**Resources and Technical Documentation**:
|
30 |
|
31 |
* [Gemma 3 Technical Report][g3-tech-report]
|
@@ -72,106 +80,29 @@ for everyone.
|
|
72 |
|
73 |
### Usage
|
74 |
|
75 |
-
Below, there are some code snippets on how to get quickly started with running the model.
|
|
|
|
|
76 |
|
77 |
```sh
|
78 |
-
|
79 |
```
|
80 |
|
81 |
-
|
82 |
-
|
83 |
-
#### Running with the `pipeline` API
|
84 |
|
85 |
-
|
86 |
-
|
87 |
-
|
88 |
-
from transformers import pipeline
|
89 |
-
import torch
|
90 |
-
|
91 |
-
pipe = pipeline(
|
92 |
-
"image-text-to-text",
|
93 |
-
model="google/gemma-3-4b-it",
|
94 |
-
device="cuda",
|
95 |
-
torch_dtype=torch.bfloat16
|
96 |
-
)
|
97 |
-
```
|
98 |
-
|
99 |
-
With instruction-tuned models, you need to use chat templates to process your inputs first. Then, you can pass them to the pipeline.
|
100 |
-
|
101 |
-
```python
|
102 |
-
messages = [
|
103 |
-
{
|
104 |
-
"role": "system",
|
105 |
-
"content": [{"type": "text", "text": "You are a helpful assistant."}]
|
106 |
-
},
|
107 |
-
{
|
108 |
-
"role": "user",
|
109 |
-
"content": [
|
110 |
-
{"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
|
111 |
-
{"type": "text", "text": "What animal is on the candy?"}
|
112 |
-
]
|
113 |
-
}
|
114 |
-
]
|
115 |
-
|
116 |
-
output = pipe(text=messages, max_new_tokens=200)
|
117 |
-
print(output[0]["generated_text"][-1]["content"])
|
118 |
-
# Okay, let's take a look!
|
119 |
-
# Based on the image, the animal on the candy is a **turtle**.
|
120 |
-
# You can see the shell shape and the head and legs.
|
121 |
```
|
122 |
|
123 |
-
|
124 |
-
|
125 |
-
```python
|
126 |
-
# pip install accelerate
|
127 |
|
128 |
-
|
129 |
-
from PIL import Image
|
130 |
-
import requests
|
131 |
-
import torch
|
132 |
|
133 |
-
|
134 |
-
|
135 |
-
model = Gemma3ForConditionalGeneration.from_pretrained(
|
136 |
-
model_id, device_map="auto"
|
137 |
-
).eval()
|
138 |
-
|
139 |
-
processor = AutoProcessor.from_pretrained(model_id)
|
140 |
-
|
141 |
-
messages = [
|
142 |
-
{
|
143 |
-
"role": "system",
|
144 |
-
"content": [{"type": "text", "text": "You are a helpful assistant."}]
|
145 |
-
},
|
146 |
-
{
|
147 |
-
"role": "user",
|
148 |
-
"content": [
|
149 |
-
{"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
|
150 |
-
{"type": "text", "text": "Describe this image in detail."}
|
151 |
-
]
|
152 |
-
}
|
153 |
-
]
|
154 |
-
|
155 |
-
inputs = processor.apply_chat_template(
|
156 |
-
messages, add_generation_prompt=True, tokenize=True,
|
157 |
-
return_dict=True, return_tensors="pt"
|
158 |
-
).to(model.device, dtype=torch.bfloat16)
|
159 |
-
|
160 |
-
input_len = inputs["input_ids"].shape[-1]
|
161 |
-
|
162 |
-
with torch.inference_mode():
|
163 |
-
generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
|
164 |
-
generation = generation[0][input_len:]
|
165 |
-
|
166 |
-
decoded = processor.decode(generation, skip_special_tokens=True)
|
167 |
-
print(decoded)
|
168 |
-
|
169 |
-
# **Overall Impression:** The image is a close-up shot of a vibrant garden scene,
|
170 |
-
# focusing on a cluster of pink cosmos flowers and a busy bumblebee.
|
171 |
-
# It has a slightly soft, natural feel, likely captured in daylight.
|
172 |
```
|
173 |
|
174 |
-
|
175 |
### Citation
|
176 |
|
177 |
```none
|
@@ -270,6 +201,10 @@ development workflow."*
|
|
270 |
|
271 |
## Evaluation
|
272 |
|
|
|
|
|
|
|
|
|
273 |
Model evaluation metrics and results.
|
274 |
|
275 |
### Benchmark Results
|
|
|
1 |
---
|
2 |
+
base_model: google/gemma-3-4b-it
|
3 |
license: gemma
|
4 |
+
tags:
|
5 |
+
- gemma3
|
6 |
+
- gemma
|
7 |
+
- google
|
8 |
pipeline_tag: image-text-to-text
|
|
|
|
|
|
|
|
|
|
|
|
|
9 |
---
|
10 |
|
11 |
# Gemma 3 4B Instruction-tuned QAT compressed-tensors
|
|
|
24 |
|
25 |
**Model Page**: [Gemma](https://ai.google.dev/gemma/docs/core)
|
26 |
|
27 |
+
> [!Note]
|
28 |
+
> This repository corresponds to the 4B **instruction-tuned** version of the Gemma 3 model in GGUF format using Quantization Aware Training (QAT).
|
29 |
+
> The GGUF corresponds to Q4_0 quantization.
|
30 |
+
>
|
31 |
+
> Thanks to QAT, the model preserves quality similar to `bfloat16` while significantly reducing the memory requirements
|
32 |
+
> to load the model.
|
33 |
+
>
|
34 |
+
> You can find the half-precision version [here](https://huggingface.co/google/gemma-3-4b-it).
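
To give a rough sense of the memory reduction mentioned in the note above, here is a back-of-the-envelope sketch. It assumes the standard Q4_0 block layout (32 weights stored as one 2-byte fp16 scale plus 16 bytes of packed 4-bit values) and a nominal parameter count of 4 billion, which is an illustrative assumption rather than the exact count. It covers weights only and ignores the KV cache, activations, and any tensors a real GGUF file keeps at higher precision.

```python
# Rough weight-only memory estimate: bfloat16 vs. Q4_0.
# Assumption: every weight is quantized; a real GGUF file may differ.

Q4_0_BLOCK_WEIGHTS = 32  # weights per Q4_0 block
Q4_0_BLOCK_BYTES = 18    # 2-byte fp16 scale + 16 bytes of packed 4-bit values


def bf16_bytes(n_params: int) -> int:
    """bfloat16 stores each weight in 2 bytes."""
    return n_params * 2


def q4_0_bytes(n_params: int) -> int:
    """Q4_0 stores weights in fixed-size blocks (about 4.5 bits per weight)."""
    blocks = -(-n_params // Q4_0_BLOCK_WEIGHTS)  # ceiling division
    return blocks * Q4_0_BLOCK_BYTES


if __name__ == "__main__":
    n_params = 4_000_000_000  # assumption: nominal "4B", not the exact count
    bf16 = bf16_bytes(n_params)
    q4 = q4_0_bytes(n_params)
    print(f"bfloat16: {bf16 / 1e9:.2f} GB")
    print(f"Q4_0:     {q4 / 1e9:.2f} GB")
    print(f"ratio:    {bf16 / q4:.2f}x")
```

Under these assumptions the weights shrink by roughly 3.5x; actual file sizes and runtime memory use will vary.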
|
35 |
+
|
36 |
+
|
37 |
**Resources and Technical Documentation**:
|
38 |
|
39 |
* [Gemma 3 Technical Report][g3-tech-report]
|
|
|
80 |
|
81 |
### Usage
|
82 |
|
83 |
+
The code snippets below show how to get started quickly with running the model.
|
84 |
+
|
85 |
+
**llama.cpp (text-only)**
|
86 |
|
87 |
```sh
|
88 |
+
./llama-cli -hf google/gemma-3-4b-it-qat-q4_0-gguf -p "Write a poem about the Kraken."
|
89 |
```
|
90 |
|
91 |
+
**llama.cpp (image input)**
|
|
|
|
|
92 |
|
93 |
+
```sh
|
94 |
+
wget "https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true" -O ~/Downloads/surprise.png
|
95 |
+
./llama-gemma3-cli -hf google/gemma-3-4b-it-qat-q4_0-gguf -p "Describe this image." --image ~/Downloads/surprise.png
|
96 |
```
|
97 |
|
98 |
+
**ollama (text only)**
|
|
|
|
|
|
|
99 |
|
100 |
+
Running GGUFs from Hugging Face through Ollama does not currently support image inputs. For gated repositories, see the [docs on running gated repositories](https://huggingface.co/docs/hub/en/ollama#run-private-ggufs-from-the-hugging-face-hub).
|
|
|
|
|
|
|
101 |
|
102 |
+
```sh
|
103 |
+
ollama run hf.co/google/gemma-3-4b-it-qat-q4_0-gguf
|
104 |
```
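
Once the model has been pulled with the command above, it can also be queried programmatically. The following is a minimal sketch, not an official client: it assumes an Ollama server is running locally on its default port (11434) and uses Ollama's `/api/generate` REST endpoint with streaming disabled. The helper names `build_request` and `generate` are hypothetical and exist only for this example.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "hf.co/google/gemma-3-4b-it-qat-q4_0-gguf"


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming text generation request (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Write a poem about the Kraken.")` returns the model's reply as a string, provided the server is running and the model is available locally.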
|
105 |
|
|
|
106 |
### Citation
|
107 |
|
108 |
```none
|
|
|
201 |
|
202 |
## Evaluation
|
203 |
|
204 |
+
> [!Note]
|
205 |
+
> The evaluations in this section correspond to the original checkpoint, not the QAT checkpoint.
|
206 |
+
>
|
207 |
+
|
208 |
Model evaluation metrics and results.
|
209 |
|
210 |
### Benchmark Results
|