schmidt-sebastian committed · Commit 0a7b470 · verified · Parent: 3bf97ca

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -35,3 +35,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv1280.task filter=lfs diff=lfs merge=lfs -text
 Qwen2.5-1.5B-Instruct_multi-prefill-seq_f32_ekv1280.task filter=lfs diff=lfs merge=lfs -text
+Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv4096.task filter=lfs diff=lfs merge=lfs -text
+Qwen2.5-1.5B-Instruct_seq128_q8_ekv4096.task filter=lfs diff=lfs merge=lfs -text
+Qwen2.5-1.5B-Instruct_seq128_q8_ekv1280.task filter=lfs diff=lfs merge=lfs -text
+Qwen2.5-1.5B-Instruct_multi-prefill-seq_f32_ekv4096.task filter=lfs diff=lfs merge=lfs -text
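Each `.task` entry above is tracked by Git LFS, so the repository stores only a small text pointer (the `version` / `oid sha256:…` / `size` stanzas in the file diffs below) while the multi-gigabyte blob lives in LFS storage. A minimal sketch, assuming a pointer string and a locally downloaded blob (the helper names and paths are illustrative, not part of this repo), for checking that a download matches its pointer:

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs v1 pointer: 'version', 'oid sha256:<hex>', 'size' lines."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "oid": fields["oid"].removeprefix("sha256:"),
        "size": int(fields["size"]),
    }


def verify_blob(pointer_text: str, blob_path: Path) -> bool:
    """True iff the local file matches the pointer's size and sha256 digest."""
    want = parse_lfs_pointer(pointer_text)
    if blob_path.stat().st_size != want["size"]:
        return False  # cheap size check first; avoids hashing a wrong-size file
    digest = hashlib.sha256()
    with blob_path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest() == want["oid"]
```

Size is compared before hashing because a truncated download fails immediately without reading gigabytes.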
Qwen2.5-1.5B-Instruct_multi-prefill-seq_f32_ekv1280.task CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0a4a95c9bfd3132201f306e29d3d3b134469349a74c692ae2f5206e191480e1c
-size 6190659644
+oid sha256:edec401a8dd8de5409a5a1618e73ade59385d7a275b383b5c664e99882949c6d
+size 6181747452
Qwen2.5-1.5B-Instruct_multi-prefill-seq_f32_ekv4096.task ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6a6bf44f952218b22a76cc1b94ac46a5db8dc43c7f31b7e635f2ca608e3f35ed
+size 4296491008
Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv1280.task CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:63f1f319ae8c0db7616217537c9b0b603f11aebe3ef61917184e1c74e1be217a
-size 1625493432
+oid sha256:8d867a7c93a6acf2892f08e0174e2f6f351ad256b7e3cfb6d6cd9c89794b42e0
+size 1597913616
Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv4096.task ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2a662d1af6ca2c750bc047738d6f48441d4b6968062c24ef52fcff5d693291f6
+size 1597913616
Qwen2.5-1.5B-Instruct_seq128_q8_ekv1280.task ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8771564e61c908c2199bcaa28b0ff9c5f55afb2ae73fbe263142a067113968df
+size 1567364648
Qwen2.5-1.5B-Instruct_seq128_q8_ekv4096.task ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:98c289e1c43cc592ac535594d5de4bdde449e8dc012ac66909064b6880f8b717
+size 1567364648
README.md CHANGED
@@ -29,55 +29,102 @@ on Colab could be much worse than on a local device.*
 ### Android
 
 * Download and install
-  [the apk](https://github.com/google-ai-edge/mediapipe-samples/releases/latest/download/llm_inference-debug.apk).
+  [the apk](https://github.com/google-ai-edge/gallery/releases/latest/download/ai-edge-gallery.apk).
 * Follow the instructions in the app.
 
-To build the demo app from source, please follow the
-[instructions](https://github.com/google-ai-edge/mediapipe-samples/blob/main/examples/llm_inference/android/README.md)
+To build the demo app from source, please follow the [instructions](https://github.com/google-ai-edge/gallery/blob/main/README.md)
 from the GitHub repository.
 
+### iOS
+
+* Clone the [MediaPipe samples](https://github.com/google-ai-edge/mediapipe-samples)
+  repository and follow the [instructions](https://github.com/google-ai-edge/mediapipe-samples/tree/main/examples/llm_inference/ios/README.md)
+  to build the LLM Inference iOS Sample App using Xcode.
+* Run the app via the iOS simulator or deploy to an iOS device.
+
 ## Performance
 
 ### Android
 
-Note that all benchmark stats are from a Samsung S24 Ultra with
-1280 KV cache size with multiple prefill signatures enabled.
+Note that all benchmark stats are from a Samsung S24 Ultra with multiple prefill signatures enabled.
 
 <table border="1">
   <tr>
-    <th></th>
-    <th>Backend</th>
-    <th>Prefill (tokens/sec)</th>
-    <th>Decode (tokens/sec)</th>
-    <th>Time-to-first-token (sec)</th>
-    <th>Memory (RSS in MB)</th>
-    <th>Model size (MB)</th>
+    <th style="text-align: left">Backend</th>
+    <th style="text-align: left">Quantization scheme</th>
+    <th style="text-align: left">Context length</th>
+    <th style="text-align: left">Prefill (tokens/sec)</th>
+    <th style="text-align: left">Decode (tokens/sec)</th>
+    <th style="text-align: left">Time-to-first-token (sec)</th>
+    <th style="text-align: left">CPU Memory (RSS in MB)</th>
+    <th style="text-align: left">GPU Memory (RSS in MB)</th>
+    <th style="text-align: left">Model size (MB)</th>
+    <th></th>
   </tr>
   <tr>
-    <td>fp32 (baseline)</td>
-    <td>cpu</td>
-    <td><p style="text-align: right">37.21 tk/s</p></td>
-    <td><p style="text-align: right">5.22 tk/s</p></td>
-    <td><p style="text-align: right">16.85 s</p></td>
-    <td><p style="text-align: right">6,662 MB</p></td>
-    <td><p style="text-align: right">5,903 MB</p></td>
+    <td rowspan="3"><p style="text-align: left">CPU</p></td>
+    <td><p style="text-align: left">fp32 (baseline)</p></td>
+    <td><p style="text-align: right">1280</p></td>
+    <td><p style="text-align: right">27 tk/s</p></td>
+    <td><p style="text-align: right">6 tk/s</p></td>
+    <td><p style="text-align: right">9.88 s</p></td>
+    <td><p style="text-align: right">6,144 MB</p></td>
+    <td><p style="text-align: right"></p></td>
+    <td><p style="text-align: right">5,895 MB</p></td>
+    <td><p style="text-align: left"><a style="text-decoration: none" href="https://huggingface.co/litert-community/Qwen2.5-1.5B-Instruct/resolve/main/Qwen2.5-1.5B-Instruct_multi-prefill-seq_f32_ekv1280.task">&#128279;</a></p></td>
   </tr>
   <tr>
-    <td>dynamic_int8</td>
-    <td>cpu</td>
-    <td><p style="text-align: right">113.68 tk/s</p></td>
-    <td><p style="text-align: right">16.41 tk/s</p></td>
-    <td><p style="text-align: right">5.79 s</p></td>
-    <td><p style="text-align: right">3,593 MB</p></td>
-    <td><p style="text-align: right">1,550 MB</p></td>
+    <td rowspan="4"><p style="text-align: left">dynamic_int8</p></td>
+    <td><p style="text-align: right">1280</p></td>
+    <td><p style="text-align: right">106 tk/s</p></td>
+    <td><p style="text-align: right">23 tk/s</p></td>
+    <td><p style="text-align: right">2.74 s</p></td>
+    <td><p style="text-align: right">1,820 MB</p></td>
+    <td><p style="text-align: right"></p></td>
+    <td><p style="text-align: right">1,523 MB</p></td>
+    <td><p style="text-align: left"><a style="text-decoration: none" href="https://huggingface.co/litert-community/Qwen2.5-1.5B-Instruct/resolve/main/Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv1280.task">&#128279;</a></p></td>
+  </tr>
+  <tr>
+    <td><p style="text-align: right">4096</p></td>
+    <td><p style="text-align: right">63 tk/s</p></td>
+    <td><p style="text-align: right">20 tk/s</p></td>
+    <td><p style="text-align: right">4.40 s</p></td>
+    <td><p style="text-align: right">2,042 MB</p></td>
+    <td><p style="text-align: right"></p></td>
+    <td><p style="text-align: right">1,523 MB</p></td>
+    <td><p style="text-align: left"><a style="text-decoration: none" href="https://huggingface.co/litert-community/Qwen2.5-1.5B-Instruct/resolve/main/Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv4096.task">&#128279;</a></p></td>
+  </tr>
+  <tr>
+    <td rowspan="2"><p style="text-align: left">GPU</p></td>
+    <td><p style="text-align: right">1280</p></td>
+    <td><p style="text-align: right">706 tk/s</p></td>
+    <td><p style="text-align: right">24 tk/s</p></td>
+    <td><p style="text-align: right">6.94 s</p></td>
+    <td><p style="text-align: right">3,175 MB</p></td>
+    <td><p style="text-align: right">1,504 MB</p></td>
+    <td><p style="text-align: right">1,523 MB</p></td>
+    <td><p style="text-align: left"><a style="text-decoration: none" href="https://huggingface.co/litert-community/Qwen2.5-1.5B-Instruct/resolve/main/Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv1280.task">&#128279;</a></p></td>
+  </tr>
+  <tr>
+    <td><p style="text-align: right">4096</p></td>
+    <td><p style="text-align: right">417 tk/s</p></td>
+    <td><p style="text-align: right">22 tk/s</p></td>
+    <td><p style="text-align: right">7.93 s</p></td>
+    <td><p style="text-align: right">3,176 MB</p></td>
+    <td><p style="text-align: right">1,875 MB</p></td>
+    <td><p style="text-align: right">1,523 MB</p></td>
+    <td><p style="text-align: left"><a style="text-decoration: none" href="https://huggingface.co/litert-community/Qwen2.5-1.5B-Instruct/resolve/main/Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv4096.task">&#128279;</a></p></td>
   </tr>
 
 </table>
 
+* For the list of supported quantization schemes, see
+  [supported-schemes](https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative/quantize#supported-schemes).
+  For these models, we are using prefill signature lengths of 32, 128, 512 and 1280.
 * Model Size: measured by the size of the .tflite flatbuffer (serialization
   format for LiteRT models)
 * Memory: indicator of peak RAM usage
 * The inference on CPU is accelerated via the LiteRT
   [XNNPACK](https://github.com/google/XNNPACK) delegate with 4 threads
-* Benchmark is done assuming XNNPACK cache is enabled
-* dynamic_int8: quantized model with int8 weights and float activations.
+* Benchmark is run with cache enabled and initialized. During the first run,
+  the time to first token may differ.
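The bundles listed in the README's table follow a single naming scheme (model, variant, quantization, KV-cache size). As a convenience sketch (the helper name is ours, not part of this repo), the filename for a given combination can be assembled and then fetched with `huggingface_hub`:

```python
def task_filename(variant: str, quant: str, ekv: int) -> str:
    """Build a bundle filename following this repo's naming scheme,
    e.g. ('multi-prefill-seq', 'q8', 4096) -> the int8 ekv4096 bundle."""
    return f"Qwen2.5-1.5B-Instruct_{variant}_{quant}_ekv{ekv}.task"


# Downloading requires network access and the huggingface_hub package:
# from huggingface_hub import hf_hub_download
# path = hf_hub_download(
#     repo_id="litert-community/Qwen2.5-1.5B-Instruct",
#     filename=task_filename("multi-prefill-seq", "q8", 4096),
# )
```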
notebook.ipynb CHANGED
@@ -11,1043 +11,13 @@
   },
   "language_info": {
     "name": "python"
-  },
-  "widgets": {
-    "application/vnd.jupyter.widget-state+json": { … }
761
- "max": 7031660,
762
- "min": 0,
763
- "orientation": "horizontal",
764
- "style": "IPY_MODEL_f3605ab95cbf4ebda9a678a0788e9682",
765
- "value": 7031660
766
- }
767
- },
768
- "b977fb3e42a14fe1bec47426ae1efded": {
769
- "model_module": "@jupyter-widgets/controls",
770
- "model_name": "HTMLModel",
771
- "model_module_version": "1.5.0",
772
- "state": {
773
- "_dom_classes": [],
774
- "_model_module": "@jupyter-widgets/controls",
775
- "_model_module_version": "1.5.0",
776
- "_model_name": "HTMLModel",
777
- "_view_count": null,
778
- "_view_module": "@jupyter-widgets/controls",
779
- "_view_module_version": "1.5.0",
780
- "_view_name": "HTMLView",
781
- "description": "",
782
- "description_tooltip": null,
783
- "layout": "IPY_MODEL_7d2023b2a9054a3991983a30fdc6555b",
784
- "placeholder": "​",
785
- "style": "IPY_MODEL_17d028b387724317ae9994819a97a3a4",
786
- "value": " 7.03M/7.03M [00:00\u0026lt;00:00, 28.7MB/s]"
787
- }
788
- },
789
- "a063adb2cc1c44438d5f631fb16297ae": {
790
- "model_module": "@jupyter-widgets/base",
791
- "model_name": "LayoutModel",
792
- "model_module_version": "1.2.0",
793
- "state": {
794
- "_model_module": "@jupyter-widgets/base",
795
- "_model_module_version": "1.2.0",
796
- "_model_name": "LayoutModel",
797
- "_view_count": null,
798
- "_view_module": "@jupyter-widgets/base",
799
- "_view_module_version": "1.2.0",
800
- "_view_name": "LayoutView",
801
- "align_content": null,
802
- "align_items": null,
803
- "align_self": null,
804
- "border": null,
805
- "bottom": null,
806
- "display": null,
807
- "flex": null,
808
- "flex_flow": null,
809
- "grid_area": null,
810
- "grid_auto_columns": null,
811
- "grid_auto_flow": null,
812
- "grid_auto_rows": null,
813
- "grid_column": null,
814
- "grid_gap": null,
815
- "grid_row": null,
816
- "grid_template_areas": null,
817
- "grid_template_columns": null,
818
- "grid_template_rows": null,
819
- "height": null,
820
- "justify_content": null,
821
- "justify_items": null,
822
- "left": null,
823
- "margin": null,
824
- "max_height": null,
825
- "max_width": null,
826
- "min_height": null,
827
- "min_width": null,
828
- "object_fit": null,
829
- "object_position": null,
830
- "order": null,
831
- "overflow": null,
832
- "overflow_x": null,
833
- "overflow_y": null,
834
- "padding": null,
835
- "right": null,
836
- "top": null,
837
- "visibility": null,
838
- "width": null
839
- }
840
- },
841
- "50f86e2ac8444d1986d8d9afe9fcee37": {
842
- "model_module": "@jupyter-widgets/base",
843
- "model_name": "LayoutModel",
844
- "model_module_version": "1.2.0",
845
- "state": {
846
- "_model_module": "@jupyter-widgets/base",
847
- "_model_module_version": "1.2.0",
848
- "_model_name": "LayoutModel",
849
- "_view_count": null,
850
- "_view_module": "@jupyter-widgets/base",
851
- "_view_module_version": "1.2.0",
852
- "_view_name": "LayoutView",
853
- "align_content": null,
854
- "align_items": null,
855
- "align_self": null,
856
- "border": null,
857
- "bottom": null,
858
- "display": null,
859
- "flex": null,
860
- "flex_flow": null,
861
- "grid_area": null,
862
- "grid_auto_columns": null,
863
- "grid_auto_flow": null,
864
- "grid_auto_rows": null,
865
- "grid_column": null,
866
- "grid_gap": null,
867
- "grid_row": null,
868
- "grid_template_areas": null,
869
- "grid_template_columns": null,
870
- "grid_template_rows": null,
871
- "height": null,
872
- "justify_content": null,
873
- "justify_items": null,
874
- "left": null,
875
- "margin": null,
876
- "max_height": null,
877
- "max_width": null,
878
- "min_height": null,
879
- "min_width": null,
880
- "object_fit": null,
881
- "object_position": null,
882
- "order": null,
883
- "overflow": null,
884
- "overflow_x": null,
885
- "overflow_y": null,
886
- "padding": null,
887
- "right": null,
888
- "top": null,
889
- "visibility": null,
890
- "width": null
891
- }
892
- },
893
- "da323d8a744a43d8901f19c48b1e1223": {
894
- "model_module": "@jupyter-widgets/controls",
895
- "model_name": "DescriptionStyleModel",
896
- "model_module_version": "1.5.0",
897
- "state": {
898
- "_model_module": "@jupyter-widgets/controls",
899
- "_model_module_version": "1.5.0",
900
- "_model_name": "DescriptionStyleModel",
901
- "_view_count": null,
902
- "_view_module": "@jupyter-widgets/base",
903
- "_view_module_version": "1.2.0",
904
- "_view_name": "StyleView",
905
- "description_width": ""
906
- }
907
- },
908
- "69afe592335b4d73b51b63e4c56407fc": {
909
- "model_module": "@jupyter-widgets/base",
910
- "model_name": "LayoutModel",
911
- "model_module_version": "1.2.0",
912
- "state": {
913
- "_model_module": "@jupyter-widgets/base",
914
- "_model_module_version": "1.2.0",
915
- "_model_name": "LayoutModel",
916
- "_view_count": null,
917
- "_view_module": "@jupyter-widgets/base",
918
- "_view_module_version": "1.2.0",
919
- "_view_name": "LayoutView",
920
- "align_content": null,
921
- "align_items": null,
922
- "align_self": null,
923
- "border": null,
924
- "bottom": null,
925
- "display": null,
926
- "flex": null,
927
- "flex_flow": null,
928
- "grid_area": null,
929
- "grid_auto_columns": null,
930
- "grid_auto_flow": null,
931
- "grid_auto_rows": null,
932
- "grid_column": null,
933
- "grid_gap": null,
934
- "grid_row": null,
935
- "grid_template_areas": null,
936
- "grid_template_columns": null,
937
- "grid_template_rows": null,
938
- "height": null,
939
- "justify_content": null,
940
- "justify_items": null,
941
- "left": null,
942
- "margin": null,
943
- "max_height": null,
944
- "max_width": null,
945
- "min_height": null,
946
- "min_width": null,
947
- "object_fit": null,
948
- "object_position": null,
949
- "order": null,
950
- "overflow": null,
951
- "overflow_x": null,
952
- "overflow_y": null,
953
- "padding": null,
954
- "right": null,
955
- "top": null,
956
- "visibility": null,
957
- "width": null
958
- }
959
- },
960
- "f3605ab95cbf4ebda9a678a0788e9682": {
961
- "model_module": "@jupyter-widgets/controls",
962
- "model_name": "ProgressStyleModel",
963
- "model_module_version": "1.5.0",
964
- "state": {
965
- "_model_module": "@jupyter-widgets/controls",
966
- "_model_module_version": "1.5.0",
967
- "_model_name": "ProgressStyleModel",
968
- "_view_count": null,
969
- "_view_module": "@jupyter-widgets/base",
970
- "_view_module_version": "1.2.0",
971
- "_view_name": "StyleView",
972
- "bar_color": null,
973
- "description_width": ""
974
- }
975
- },
976
- "7d2023b2a9054a3991983a30fdc6555b": {
977
- "model_module": "@jupyter-widgets/base",
978
- "model_name": "LayoutModel",
979
- "model_module_version": "1.2.0",
980
- "state": {
981
- "_model_module": "@jupyter-widgets/base",
982
- "_model_module_version": "1.2.0",
983
- "_model_name": "LayoutModel",
984
- "_view_count": null,
985
- "_view_module": "@jupyter-widgets/base",
986
- "_view_module_version": "1.2.0",
987
- "_view_name": "LayoutView",
988
- "align_content": null,
989
- "align_items": null,
990
- "align_self": null,
991
- "border": null,
992
- "bottom": null,
993
- "display": null,
994
- "flex": null,
995
- "flex_flow": null,
996
- "grid_area": null,
997
- "grid_auto_columns": null,
998
- "grid_auto_flow": null,
999
- "grid_auto_rows": null,
1000
- "grid_column": null,
1001
- "grid_gap": null,
1002
- "grid_row": null,
1003
- "grid_template_areas": null,
1004
- "grid_template_columns": null,
1005
- "grid_template_rows": null,
1006
- "height": null,
1007
- "justify_content": null,
1008
- "justify_items": null,
1009
- "left": null,
1010
- "margin": null,
1011
- "max_height": null,
1012
- "max_width": null,
1013
- "min_height": null,
1014
- "min_width": null,
1015
- "object_fit": null,
1016
- "object_position": null,
1017
- "order": null,
1018
- "overflow": null,
1019
- "overflow_x": null,
1020
- "overflow_y": null,
1021
- "padding": null,
1022
- "right": null,
1023
- "top": null,
1024
- "visibility": null,
1025
- "width": null
1026
- }
1027
- },
1028
- "17d028b387724317ae9994819a97a3a4": {
1029
- "model_module": "@jupyter-widgets/controls",
1030
- "model_name": "DescriptionStyleModel",
1031
- "model_module_version": "1.5.0",
1032
- "state": {
1033
- "_model_module": "@jupyter-widgets/controls",
1034
- "_model_module_version": "1.5.0",
1035
- "_model_name": "DescriptionStyleModel",
1036
- "_view_count": null,
1037
- "_view_module": "@jupyter-widgets/base",
1038
- "_view_module_version": "1.2.0",
1039
- "_view_name": "StyleView",
1040
- "description_width": ""
1041
- }
1042
- }
1043
- }
1044
  }
1045
  },
1046
  "cells": [
1047
  {
1048
  "cell_type": "markdown",
1049
  "source": [
1050
- "#Install dependencies"
1051
  ],
1052
  "metadata": {
1053
  "id": "39AMoCOa1ckc"
@@ -1057,373 +27,53 @@
  "metadata": {
  "id": "VoHxuLPu7s37"
  },
- "cell_type": "code",
- "source": [],
- "outputs": [],
- "execution_count": null
- },
- {
- "cell_type": "code",
- "source": [
- "!pip install ai-edge-litert"
- ],
- "metadata": {
- "id": "43tAeO0AZ7zp",
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "outputId": "76cd0d1b-7de2-4519-c0ae-1b9e6ee37653"
- },
- "execution_count": 1,
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": []
- }
- ]
- },
- {
- "cell_type": "code",
- "source": [
- "from collections.abc import Sequence\n",
- "import sys\n",
- "from ai_edge_litert import interpreter as interpreter_lib\n",
- "import numpy as np\n",
- "from transformers import AutoTokenizer"
- ],
- "metadata": {
- "id": "i6PMkMVBPr1p"
- },
- "execution_count": 2,
- "outputs": []
- },
- {
- "cell_type": "markdown",
- "source": [
- "# Download model files"
- ],
- "metadata": {
- "id": "K5okZCTgYpUd"
- }
- },
- {
  "cell_type": "code",
  "source": [
- "from huggingface_hub import hf_hub_download\n",
- "\n",
- "model_path = hf_hub_download(\n",
- " repo_id=\"litert-community/Qwen2.5-1.5B-Instruct\",\n",
- " filename=\"Qwen2.5-1.5B-Instruct_seq128_q8_ekv1280.tflite\",\n",
- ")"
  ],
- "metadata": {
- "id": "3t47HAG2tvc3",
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 49,
- "referenced_widgets": [
- "47cd47140dbb4e28a4f31d5632bfe82d",
- "7c0ddb1e0e3145f08ccb0c32b02c562f",
- "85c490db972b4d659caad513359a6700",
- "d61e96ae08d84414a638dd592f13fb18",
- "9e7f4734aa034e4aa5207b8a2498ee02",
- "df08ba8056fb47cb969e132087987e68",
- "470febc3af8348ef8611255e88401229",
- "39cedca11f574c01808acdc1be9aa68d",
- "62bd6d393ca74193bded59a8ebd0a749",
- "475c5c4fc6eb404180d7b69d75f797ea",
- "b815fc17c9ee4913b5cb452653ff1af9"
- ]
- },
- "outputId": "d1d8ed1a-5ec6-4121-9d3c-fada487fc8ed"
- },
- "execution_count": 3,
- "outputs": []
  },
  {
  "cell_type": "markdown",
  "source": [
- "# Create LiteRT interpreter and tokenizer"
  ],
  "metadata": {
- "id": "n5Xa4s6XhWqk"
  }
  },
  {
  "cell_type": "code",
  "source": [
- "interpreter = interpreter_lib.InterpreterWithCustomOps(\n",
- " custom_op_registerers=[\"pywrap_genai_ops.GenAIOpsRegisterer\"],\n",
- " model_path=model_path,\n",
- " num_threads=2,\n",
- " experimental_default_delegate_latest_features=True,\n",
- ")\n",
- "tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-1.5B-Instruct\")"
  ],
  "metadata": {
- "id": "Rvdn3EIZhaQn",
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 81,
- "referenced_widgets": [
- "8cac4d03da1044d6adb8b62752ed6775",
- "a201091e2f9b4f6c8a7d780dde854134",
- "16e2c22fb42e41e8b810c4e659091d37",
- "a1f5e814104646cbac5db19fdbcfccb2",
- "3186fb1553884a7da72a387f1e00eca5",
- "875fbcb976bf486092d3c6f483b9e042",
- "e2a24c0c90b149508715998b1cf301f7",
- "c730ecd68ae547b1822039b86bd22322",
- "0cd73c61a5e04ae1854eb1f1c4d92317",
- "c46a9a3e8c7d4560ae71226920e17acd",
- "2303aed14ff44e178ed20edf1f2e5359",
- "072e1baca7d64766807df5454dc9e3cc",
- "6da37a13974c4c3890c7676d194021bc",
- "2f5b6f1af091405287c35c53ad169354",
- "b977fb3e42a14fe1bec47426ae1efded",
- "a063adb2cc1c44438d5f631fb16297ae",
- "50f86e2ac8444d1986d8d9afe9fcee37",
- "da323d8a744a43d8901f19c48b1e1223",
- "69afe592335b4d73b51b63e4c56407fc",
- "f3605ab95cbf4ebda9a678a0788e9682",
- "7d2023b2a9054a3991983a30fdc6555b",
- "17d028b387724317ae9994819a97a3a4"
- ]
- },
- "outputId": "e05a5944-5312-41c4-e38e-7e26a921e63c"
  },
- "execution_count": 4,
  "outputs": []
  },
  {
  "cell_type": "markdown",
  "source": [
- "# Create pipeline with LiteRT models"
  ],
  "metadata": {
- "id": "AM6rDABTXt2F"
  }
  },
  {
  "cell_type": "code",
  "source": [
- "class LiteRTLlmPipeline:\n",
- "\n",
- " def __init__(self, interpreter, tokenizer):\n",
- " \"\"\"Initializes the pipeline.\"\"\"\n",
- " self._interpreter = interpreter\n",
- " self._tokenizer = tokenizer\n",
- "\n",
- " self._prefill_runner = None\n",
- " self._decode_runner = self._interpreter.get_signature_runner(\"decode\")\n",
- "\n",
- " def _init_prefill_runner(self, num_input_tokens: int):\n",
- " \"\"\"Initializes all the variables related to the prefill runner.\n",
- "\n",
- " This method initializes the following variables:\n",
- " - self._prefill_runner: The prefill runner based on the input size.\n",
- " - self._max_seq_len: The maximum sequence length supported by the model.\n",
- " - self._max_kv_cache_seq_len: The maximum sequence length supported by the\n",
- " KV cache.\n",
- "\n",
- " Args:\n",
- " num_input_tokens: The number of input tokens.\n",
- " \"\"\"\n",
- " if not self._interpreter:\n",
- " raise ValueError(\"Interpreter is not initialized.\")\n",
- "\n",
- " # Prefill runner related variables will be initialized in `predict_text` and\n",
- " # `compute_log_likelihood`.\n",
- " self._prefill_runner = self._get_prefill_runner(num_input_tokens)\n",
- " # input_token_shape has shape (batch, max_seq_len)\n",
- " input_token_shape = self._prefill_runner.get_input_details()[\"tokens\"][\n",
- " \"shape\"\n",
- " ]\n",
- " if len(input_token_shape) == 1:\n",
- " self._max_seq_len = input_token_shape[0]\n",
- " else:\n",
- " self._max_seq_len = input_token_shape[1]\n",
- "\n",
- " # kv cache input has shape [batch=1, seq_len, num_heads, dim].\n",
- " kv_cache_shape = self._prefill_runner.get_input_details()[\"kv_cache_k_0\"][\n",
- " \"shape\"\n",
- " ]\n",
- " self._max_kv_cache_seq_len = kv_cache_shape[1]\n",
- "\n",
- " def _init_kv_cache(self) -\u003e dict[str, np.ndarray]:\n",
- " if self._prefill_runner is None:\n",
- " raise ValueError(\"Prefill runner is not initialized.\")\n",
- " kv_cache = {}\n",
- " for input_key in self._prefill_runner.get_input_details().keys():\n",
- " if \"kv_cache\" in input_key:\n",
- " kv_cache[input_key] = np.zeros(\n",
- " self._prefill_runner.get_input_details()[input_key][\"shape\"],\n",
- " dtype=np.float32,\n",
- " )\n",
- " kv_cache[input_key] = np.zeros(\n",
- " self._prefill_runner.get_input_details()[input_key][\"shape\"],\n",
- " dtype=np.float32,\n",
- " )\n",
- " return kv_cache\n",
- "\n",
- " def _get_prefill_runner(self, num_input_tokens: int):\n",
- " \"\"\"Gets the prefill runner with the best suitable input size.\n",
- "\n",
- " Args:\n",
- " num_input_tokens: The number of input tokens.\n",
- "\n",
- " Returns:\n",
- " The prefill runner with the smallest input size.\n",
- " \"\"\"\n",
- " best_signature = None\n",
- " delta = sys.maxsize\n",
- " max_prefill_len = -1\n",
- " for key in self._interpreter.get_signature_list().keys():\n",
- " if \"prefill\" not in key:\n",
- " continue\n",
- " input_pos = self._interpreter.get_signature_runner(\n",
- " key\n",
- " ).get_input_details()[\"input_pos\"]\n",
- " # input_pos[\"shape\"] has shape (max_seq_len, )\n",
- " seq_size = input_pos[\"shape\"][0]\n",
- " max_prefill_len = max(max_prefill_len, seq_size)\n",
- " if num_input_tokens \u003c= seq_size and seq_size - num_input_tokens \u003c delta:\n",
- " delta = seq_size - num_input_tokens\n",
- " best_signature = key\n",
- " if best_signature is None:\n",
- " raise ValueError(\n",
- " \"The largest prefill length supported is %d, but we have %d number of\"\n",
- " \" input tokens\" % (max_prefill_len, num_input_tokens)\n",
- " )\n",
- " return self._interpreter.get_signature_runner(best_signature)\n",
- "\n",
- " def _run_prefill(\n",
- " self,\n",
- " prefill_token_ids: Sequence[int],\n",
- " ) -\u003e dict[str, np.ndarray]:\n",
- " \"\"\"Runs prefill and returns the kv cache.\n",
- "\n",
- " Args:\n",
- " prefill_token_ids: The token ids of the prefill input.\n",
- "\n",
- " Returns:\n",
- " The updated kv cache.\n",
- " \"\"\"\n",
- " if not self._prefill_runner:\n",
- " raise ValueError(\"Prefill runner is not initialized.\")\n",
- " prefill_token_length = len(prefill_token_ids)\n",
- " if prefill_token_length == 0:\n",
- " return self._init_kv_cache()\n",
- "\n",
- " # Prepare the input to be [1, max_seq_len].\n",
- " input_token_ids = [0] * self._max_seq_len\n",
- " input_token_ids[:prefill_token_length] = prefill_token_ids\n",
- " input_token_ids = np.asarray(input_token_ids, dtype=np.int32)\n",
- " input_token_ids = np.expand_dims(input_token_ids, axis=0)\n",
- "\n",
- " # Prepare the input position to be [max_seq_len].\n",
- " input_pos = [0] * self._max_seq_len\n",
- " input_pos[:prefill_token_length] = range(prefill_token_length)\n",
- " input_pos = np.asarray(input_pos, dtype=np.int32)\n",
- "\n",
- " # Initialize kv cache.\n",
- " prefill_inputs = self._init_kv_cache()\n",
- " prefill_inputs.update({\n",
- " \"tokens\": input_token_ids,\n",
- " \"input_pos\": input_pos,\n",
- " })\n",
- " prefill_outputs = self._prefill_runner(**prefill_inputs)\n",
- " if \"logits\" in prefill_outputs:\n",
- " # Prefill outputs includes logits and kv cache. We only output kv cache.\n",
- " prefill_outputs.pop(\"logits\")\n",
- "\n",
- " return prefill_outputs\n",
- "\n",
- " def _greedy_sampler(self, logits: np.ndarray) -\u003e int:\n",
- " return int(np.argmax(logits))\n",
- "\n",
- " def _run_decode(\n",
- " self,\n",
- " start_pos: int,\n",
- " start_token_id: int,\n",
- " kv_cache: dict[str, np.ndarray],\n",
- " max_decode_steps: int,\n",
- " ) -\u003e str:\n",
- " \"\"\"Runs decode and outputs the token ids from greedy sampler.\n",
- "\n",
- " Args:\n",
- " start_pos: The position of the first token of the decode input.\n",
- " start_token_id: The token id of the first token of the decode input.\n",
- " kv_cache: The kv cache from the prefill.\n",
- " max_decode_steps: The max decode steps.\n",
- "\n",
- " Returns:\n",
- " The token ids from the greedy sampler.\n",
- " \"\"\"\n",
- " next_pos = start_pos\n",
- " next_token = start_token_id\n",
- " decode_text = []\n",
- " decode_inputs = kv_cache\n",
- "\n",
- " for _ in range(max_decode_steps):\n",
- " decode_inputs.update({\n",
- " \"tokens\": np.array([[next_token]], dtype=np.int32),\n",
- " \"input_pos\": np.array([next_pos], dtype=np.int32),\n",
- " })\n",
- " decode_outputs = self._decode_runner(**decode_inputs)\n",
- " # Output logits has shape (batch=1, 1, vocab_size). We only take the first\n",
- " # element.\n",
- " logits = decode_outputs.pop(\"logits\")[0][0]\n",
- " next_token = self._greedy_sampler(logits)\n",
- " if next_token == self._tokenizer.eos_token_id:\n",
- " break\n",
- " decode_text.append(\n",
- " self._tokenizer.decode(next_token, skip_special_tokens=False)\n",
- " )\n",
- " print(decode_text[-1], end=\"\", flush=True)\n",
- " # Decode outputs includes logits and kv cache. We already popped out\n",
- " # logits, so the rest is kv cache. We pass the updated kv cache as input\n",
- " # to the next decode step.\n",
- " decode_inputs = decode_outputs\n",
- " next_pos += 1\n",
- "\n",
- " print() # print a new line at the end.\n",
- " return \"\".join(decode_text)\n",
- "\n",
- " def generate(self, prompt: str, max_decode_steps: int | None = None) -\u003e str:\n",
- " token_ids = self._tokenizer.encode(\n",
- " f\"<|endoftext|><|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n<|im_start|>user\\n{prompt}<|im_end|>\\n<|im_start|>assistant\\n\"\n",
- " )\n",
- " # Initialize the prefill runner with the suitable input size.\n",
- " self._init_prefill_runner(len(token_ids))\n",
- "\n",
- " # Run prefill.\n",
- " # Prefill up to the second to the last token of the prompt, because the last\n",
- " # token of the prompt will be used to bootstrap decode.\n",
- " prefill_token_length = len(token_ids) - 1\n",
- "\n",
- " print(\"Running prefill\")\n",
- " kv_cache = self._run_prefill(token_ids[:prefill_token_length])\n",
- " # Run decode.\n",
- " print(\"Running decode\")\n",
- " actual_max_decode_steps = (\n",
- " self._max_kv_cache_seq_len - prefill_token_length - 1\n",
- " )\n",
- " if max_decode_steps is not None:\n",
- " actual_max_decode_steps = min(actual_max_decode_steps, max_decode_steps)\n",
- " decode_text = self._run_decode(\n",
- " prefill_token_length,\n",
- " token_ids[prefill_token_length],\n",
- " kv_cache,\n",
- " actual_max_decode_steps,\n",
- " )\n",
- " return decode_text"
  ],
  "metadata": {
- "id": "UBSGrHrM4ANm"
  },
- "execution_count": 15,
  "outputs": []
  },
  {
@@ -1439,19 +89,8 @@
  "cell_type": "code",
  "source": [
  "# Disclaimer: Model performance demonstrated with the Python API in this notebook is not representative of performance on a local device.\n",
- "pipeline = LiteRTLlmPipeline(interpreter, tokenizer)"
- ],
- "metadata": {
- "id": "AZhlDQWg61AL"
- },
- "execution_count": 16,
- "outputs": []
- },
- {
- "cell_type": "code",
- "source": [
  "prompt = \"What is the capital of France?\"\n",
- "output = pipeline.generate(prompt, max_decode_steps=None)"
  ],
  "metadata": {
  "id": "wT9BIiATkjzL"
 
  },
  "language_info": {
  "name": "python"
  }
  },
  "cells": [
  {
  "cell_type": "markdown",
  "source": [
+ "# Install Dependencies"
  ],
  "metadata": {
  "id": "39AMoCOa1ckc"

  "metadata": {
  "id": "VoHxuLPu7s37"
  },
  "cell_type": "code",
  "source": [
+ "! wget -q https://github.com/protocolbuffers/protobuf/releases/download/v3.19.0/protoc-3.19.0-linux-x86_64.zip\n",
+ "! unzip -o protoc-3.19.0-linux-x86_64.zip -d /usr/local/"
  ],
+ "outputs": [],
+ "execution_count": null
  },
  {
  "cell_type": "markdown",
  "source": [
+ "## Install LiteRT Pipeline"
  ],
  "metadata": {
+ "id": "qGAaAKzYK5ei"
  }
  },
  {
  "cell_type": "code",
  "source": [
+ "!pip install git+https://github.com/google-ai-edge/ai-edge-apis.git#subdirectory=litert_tools"
  ],
  "metadata": {
+ "id": "43tAeO0AZ7zp"
  },
+ "execution_count": null,
  "outputs": []
  },
  {
  "cell_type": "markdown",
  "source": [
+ "# Create Pipeline from model file"
  ],
  "metadata": {
+ "id": "K5okZCTgYpUd"
  }
  },
  {
  "cell_type": "code",
  "source": [
+ "from litert_tools.pipeline import pipeline\n",
+ "runner = pipeline.load(\"litert-community/Qwen2.5-1.5B-Instruct\", \"Qwen2.5-1.5B-Instruct_seq128_q8_ekv1280.task\")"
  ],
  "metadata": {
+ "id": "3t47HAG2tvc3"
  },
+ "execution_count": null,
  "outputs": []
  },
  {

  "cell_type": "code",
  "source": [
  "# Disclaimer: Model performance demonstrated with the Python API in this notebook is not representative of performance on a local device.\n",
  "prompt = \"What is the capital of France?\"\n",
+ "output = runner.generate(prompt, max_decode_steps=None)"
  ],
  "metadata": {
  "id": "wT9BIiATkjzL"