gaunernst committed on
Commit 0bcfd35 · verified · 1 Parent(s): ad6960f

Update README.md

Files changed (1)
  1. README.md +31 -96
README.md CHANGED
@@ -1,13 +1,11 @@
---
+ base_model: google/gemma-3-4b-it
license: gemma
- library_name: transformers
+ tags:
+ - gemma3
+ - gemma
+ - google
pipeline_tag: image-text-to-text
- extra_gated_heading: Access Gemma on Hugging Face
- extra_gated_prompt: To access Gemma on Hugging Face, you’re required to review and
-   agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging
-   Face and click below. Requests are processed immediately.
- extra_gated_button_content: Acknowledge license
- base_model: google/gemma-3-4b-it
---

# Gemma 3 4B Instruction-tuned QAT compressed-tensors
@@ -26,6 +24,16 @@ Below is the original model card.

**Model Page**: [Gemma](https://ai.google.dev/gemma/docs/core)

+ > [!Note]
+ > This repository corresponds to the 4B **instruction-tuned** version of the Gemma 3 model in GGUF format using Quantization Aware Training (QAT).
+ > The GGUF corresponds to Q4_0 quantization.
+ >
+ > Thanks to QAT, the model is able to preserve quality similar to `bfloat16` while significantly reducing the memory requirements
+ > to load the model.
+ >
+ > You can find the half-precision version [here](https://huggingface.co/google/gemma-3-4b-it).
+
+
**Resources and Technical Documentation**:

* [Gemma 3 Technical Report][g3-tech-report]
@@ -72,106 +80,29 @@ for everyone.

### Usage

- Below, there are some code snippets on how to get quickly started with running the model. First, install the Transformers library. Gemma 3 is supported starting from transformers 4.50.0.
+ Below are some code snippets to help you quickly get started with running the model.
+
+ **llama.cpp (text-only)**

```sh
- $ pip install -U transformers
+ ./llama-cli -hf google/gemma-3-4b-it-qat-q4_0-gguf -p "Write a poem about the Kraken."
```

- Then, copy the snippet from the section that is relevant for your use case.
-
- #### Running with the `pipeline` API
+ **llama.cpp (image input)**

- You can initialize the model and processor for inference with `pipeline` as follows.
-
- ```python
- from transformers import pipeline
- import torch
-
- pipe = pipeline(
-     "image-text-to-text",
-     model="google/gemma-3-4b-it",
-     device="cuda",
-     torch_dtype=torch.bfloat16
- )
- ```
-
- With instruction-tuned models, you need to use chat templates to process our inputs first. Then, you can pass it to the pipeline.
-
- ```python
- messages = [
-     {
-         "role": "system",
-         "content": [{"type": "text", "text": "You are a helpful assistant."}]
-     },
-     {
-         "role": "user",
-         "content": [
-             {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
-             {"type": "text", "text": "What animal is on the candy?"}
-         ]
-     }
- ]
-
- output = pipe(text=messages, max_new_tokens=200)
- print(output[0]["generated_text"][-1]["content"])
- # Okay, let's take a look!
- # Based on the image, the animal on the candy is a **turtle**.
- # You can see the shell shape and the head and legs.
+ ```sh
+ wget https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true -O ~/Downloads/surprise.png
+ ./llama-gemma3-cli -hf google/gemma-3-4b-it-qat-q4_0-gguf -p "Describe this image." --image ~/Downloads/surprise.png
```

- #### Running the model on a single/multi GPU
-
- ```python
- # pip install accelerate
+ **ollama (text only)**

- from transformers import AutoProcessor, Gemma3ForConditionalGeneration
- from PIL import Image
- import requests
- import torch
+ Using GGUFs with Ollama via Hugging Face does not support image inputs at the moment. Please check the [docs on running gated repositories](https://huggingface.co/docs/hub/en/ollama#run-private-ggufs-from-the-hugging-face-hub).

- model_id = "google/gemma-3-4b-it"
-
- model = Gemma3ForConditionalGeneration.from_pretrained(
-     model_id, device_map="auto"
- ).eval()
-
- processor = AutoProcessor.from_pretrained(model_id)
-
- messages = [
-     {
-         "role": "system",
-         "content": [{"type": "text", "text": "You are a helpful assistant."}]
-     },
-     {
-         "role": "user",
-         "content": [
-             {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
-             {"type": "text", "text": "Describe this image in detail."}
-         ]
-     }
- ]
-
- inputs = processor.apply_chat_template(
-     messages, add_generation_prompt=True, tokenize=True,
-     return_dict=True, return_tensors="pt"
- ).to(model.device, dtype=torch.bfloat16)
-
- input_len = inputs["input_ids"].shape[-1]
-
- with torch.inference_mode():
-     generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
-     generation = generation[0][input_len:]
-
- decoded = processor.decode(generation, skip_special_tokens=True)
- print(decoded)
-
- # **Overall Impression:** The image is a close-up shot of a vibrant garden scene,
- # focusing on a cluster of pink cosmos flowers and a busy bumblebee.
- # It has a slightly soft, natural feel, likely captured in daylight.
+ ```sh
+ ollama run hf.co/google/gemma-3-4b-it-qat-q4_0-gguf
```

-
### Citation

```none
@@ -270,6 +201,10 @@ development workflow."*

## Evaluation

+ > [!Note]
+ > The evaluation in this section corresponds to the original checkpoint, not the QAT checkpoint.
+ >
+
Model evaluation metrics and results.

### Benchmark Results
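
As a rough sanity check on the memory-saving claim in the added QAT note, here is a minimal back-of-envelope sketch. It is not part of the commit: the ~4B parameter count and the standard Q4_0 layout (4-bit weights plus one fp16 scale per 32-weight block, about 4.5 bits per weight) are assumptions, and the figures ignore file metadata, the KV cache, and activations.

```python
# Back-of-envelope weight-storage estimate: bfloat16 vs. Q4_0.
# Assumptions (not from the model card): ~4B parameters, Q4_0 stored as
# 4-bit weights plus one fp16 scale per 32-weight block (~4.5 bits/weight).

N_PARAMS = 4e9                       # assumed parameter count, order of magnitude only
BF16_BITS_PER_WEIGHT = 16            # bfloat16: 16 bits per weight
Q4_0_BITS_PER_WEIGHT = 4 + 16 / 32   # 4-bit quant + fp16 scale per 32-weight block

def weight_gib(bits_per_weight: float, n_params: float = N_PARAMS) -> float:
    """GiB needed to store n_params weights at the given bits per weight."""
    return n_params * bits_per_weight / 8 / 2**30

print(f"bf16 : ~{weight_gib(BF16_BITS_PER_WEIGHT):.1f} GiB")  # ~7.5 GiB
print(f"Q4_0 : ~{weight_gib(Q4_0_BITS_PER_WEIGHT):.1f} GiB")  # ~2.1 GiB
```

Actual GGUF file sizes will differ somewhat, since some tensors (for example, embeddings) may be stored at higher precision and the file also carries metadata.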