schmidt-sebastian committed · Commit 0a7b470 · verified · Parent: 3bf97ca

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -35,3 +35,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv1280.task filter=lfs diff=lfs merge=lfs -text
 Qwen2.5-1.5B-Instruct_multi-prefill-seq_f32_ekv1280.task filter=lfs diff=lfs merge=lfs -text
+Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv4096.task filter=lfs diff=lfs merge=lfs -text
+Qwen2.5-1.5B-Instruct_seq128_q8_ekv4096.task filter=lfs diff=lfs merge=lfs -text
+Qwen2.5-1.5B-Instruct_seq128_q8_ekv1280.task filter=lfs diff=lfs merge=lfs -text
+Qwen2.5-1.5B-Instruct_multi-prefill-seq_f32_ekv4096.task filter=lfs diff=lfs merge=lfs -text
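Each `.task` entry above is tracked by Git LFS, so the repository stores only a small text pointer (the `version` / `oid sha256:…` / `size` stanzas in the file diffs below) while the multi-gigabyte blob lives in LFS storage. A minimal sketch, assuming a pointer string and a locally downloaded blob (the helper names and paths are illustrative, not part of this repo), for checking that a download matches its pointer:

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs v1 pointer: 'version', 'oid sha256:<hex>', 'size' lines."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "oid": fields["oid"].removeprefix("sha256:"),
        "size": int(fields["size"]),
    }


def verify_blob(pointer_text: str, blob_path: Path) -> bool:
    """True iff the local file matches the pointer's size and sha256 digest."""
    want = parse_lfs_pointer(pointer_text)
    if blob_path.stat().st_size != want["size"]:
        return False  # cheap size check first; avoids hashing a wrong-size file
    digest = hashlib.sha256()
    with blob_path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest() == want["oid"]
```

Size is compared before hashing because a truncated download fails immediately without reading gigabytes.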
Qwen2.5-1.5B-Instruct_multi-prefill-seq_f32_ekv1280.task CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0a4a95c9bfd3132201f306e29d3d3b134469349a74c692ae2f5206e191480e1c
-size 6190659644
+oid sha256:edec401a8dd8de5409a5a1618e73ade59385d7a275b383b5c664e99882949c6d
+size 6181747452
Qwen2.5-1.5B-Instruct_multi-prefill-seq_f32_ekv4096.task ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6a6bf44f952218b22a76cc1b94ac46a5db8dc43c7f31b7e635f2ca608e3f35ed
+size 4296491008
Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv1280.task CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:63f1f319ae8c0db7616217537c9b0b603f11aebe3ef61917184e1c74e1be217a
-size 1625493432
+oid sha256:8d867a7c93a6acf2892f08e0174e2f6f351ad256b7e3cfb6d6cd9c89794b42e0
+size 1597913616
Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv4096.task ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2a662d1af6ca2c750bc047738d6f48441d4b6968062c24ef52fcff5d693291f6
+size 1597913616
Qwen2.5-1.5B-Instruct_seq128_q8_ekv1280.task ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8771564e61c908c2199bcaa28b0ff9c5f55afb2ae73fbe263142a067113968df
+size 1567364648
Qwen2.5-1.5B-Instruct_seq128_q8_ekv4096.task ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:98c289e1c43cc592ac535594d5de4bdde449e8dc012ac66909064b6880f8b717
+size 1567364648
README.md CHANGED
@@ -29,55 +29,102 @@ on Colab could be much worse than on a local device.*
 ### Android
 
 * Download and install
-  [the apk](https://github.com/google-ai-edge/mediapipe-samples/releases/latest/download/llm_inference-debug.apk).
+  [the apk](https://github.com/google-ai-edge/gallery/releases/latest/download/ai-edge-gallery.apk).
 * Follow the instructions in the app.
 
-To build the demo app from source, please follow the
-[instructions](https://github.com/google-ai-edge/mediapipe-samples/blob/main/examples/llm_inference/android/README.md)
+To build the demo app from source, please follow the [instructions](https://github.com/google-ai-edge/gallery/blob/main/README.md)
 from the GitHub repository.
 
+### iOS
+
+* Clone the [MediaPipe samples](https://github.com/google-ai-edge/mediapipe-samples)
+  repository and follow the [instructions](https://github.com/google-ai-edge/mediapipe-samples/tree/main/examples/llm_inference/ios/README.md)
+  to build the LLM Inference iOS Sample App using Xcode.
+* Run the app via the iOS simulator or deploy to an iOS device.
+
 ## Performance
 
 ### Android
 
-Note that all benchmark stats are from a Samsung S24 Ultra with
-1280 KV cache size with multiple prefill signatures enabled.
+Note that all benchmark stats are from a Samsung S24 Ultra with multiple prefill signatures enabled.
 
 <table border="1">
   <tr>
-    <th></th>
-    <th>Backend</th>
-    <th>Prefill (tokens/sec)</th>
-    <th>Decode (tokens/sec)</th>
-    <th>Time-to-first-token (sec)</th>
-    <th>Memory (RSS in MB)</th>
-    <th>Model size (MB)</th>
+    <th style="text-align: left">Backend</th>
+    <th style="text-align: left">Quantization scheme</th>
+    <th style="text-align: left">Context length</th>
+    <th style="text-align: left">Prefill (tokens/sec)</th>
+    <th style="text-align: left">Decode (tokens/sec)</th>
+    <th style="text-align: left">Time-to-first-token (sec)</th>
+    <th style="text-align: left">CPU Memory (RSS in MB)</th>
+    <th style="text-align: left">GPU Memory (RSS in MB)</th>
+    <th style="text-align: left">Model size (MB)</th>
+    <th></th>
   </tr>
   <tr>
-    <td>fp32 (baseline)</td>
-    <td>cpu</td>
-    <td><p style="text-align: right">37.21 tk/s</p></td>
-    <td><p style="text-align: right">5.22 tk/s</p></td>
-    <td><p style="text-align: right">16.85 s</p></td>
-    <td><p style="text-align: right">6,662 MB</p></td>
-    <td><p style="text-align: right">5,903 MB</p></td>
+    <td rowspan="3"><p style="text-align: left">CPU</p></td>
+    <td><p style="text-align: left">fp32 (baseline)</p></td>
+    <td><p style="text-align: right">1280</p></td>
+    <td><p style="text-align: right">27 tk/s</p></td>
+    <td><p style="text-align: right">6 tk/s</p></td>
+    <td><p style="text-align: right">9.88 s</p></td>
+    <td><p style="text-align: right">6,144 MB</p></td>
+    <td><p style="text-align: right"></p></td>
+    <td><p style="text-align: right">5,895 MB</p></td>
+    <td><p style="text-align: left"><a style="text-decoration: none" href="https://huggingface.co/litert-community/Qwen2.5-1.5B-Instruct/resolve/main/Qwen2.5-1.5B-Instruct_multi-prefill-seq_f32_ekv1280.task">&#128279;</a></p></td>
   </tr>
   <tr>
-    <td>dynamic_int8</td>
-    <td>cpu</td>
-    <td><p style="text-align: right">113.68 tk/s</p></td>
-    <td><p style="text-align: right">16.41 tk/s</p></td>
-    <td><p style="text-align: right">5.79 s</p></td>
-    <td><p style="text-align: right">3,593 MB</p></td>
-    <td><p style="text-align: right">1,550 MB</p></td>
+    <td rowspan="4"><p style="text-align: left">dynamic_int8</p></td>
+    <td><p style="text-align: right">1280</p></td>
+    <td><p style="text-align: right">106 tk/s</p></td>
+    <td><p style="text-align: right">23 tk/s</p></td>
+    <td><p style="text-align: right">2.74 s</p></td>
+    <td><p style="text-align: right">1,820 MB</p></td>
+    <td><p style="text-align: right"></p></td>
+    <td><p style="text-align: right">1,523 MB</p></td>
+    <td><p style="text-align: left"><a style="text-decoration: none" href="https://huggingface.co/litert-community/Qwen2.5-1.5B-Instruct/resolve/main/Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv1280.task">&#128279;</a></p></td>
+  </tr>
+  <tr>
+    <td><p style="text-align: right">4096</p></td>
+    <td><p style="text-align: right">63 tk/s</p></td>
+    <td><p style="text-align: right">20 tk/s</p></td>
+    <td><p style="text-align: right">4.40 s</p></td>
+    <td><p style="text-align: right">2,042 MB</p></td>
+    <td><p style="text-align: right"></p></td>
+    <td><p style="text-align: right">1,523 MB</p></td>
+    <td><p style="text-align: left"><a style="text-decoration: none" href="https://huggingface.co/litert-community/Qwen2.5-1.5B-Instruct/resolve/main/Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv4096.task">&#128279;</a></p></td>
+  </tr>
+  <tr>
+    <td rowspan="2"><p style="text-align: left">GPU</p></td>
+    <td><p style="text-align: right">1280</p></td>
+    <td><p style="text-align: right">706 tk/s</p></td>
+    <td><p style="text-align: right">24 tk/s</p></td>
+    <td><p style="text-align: right">6.94 s</p></td>
+    <td><p style="text-align: right">3,175 MB</p></td>
+    <td><p style="text-align: right">1,504 MB</p></td>
+    <td><p style="text-align: right">1,523 MB</p></td>
+    <td><p style="text-align: left"><a style="text-decoration: none" href="https://huggingface.co/litert-community/Qwen2.5-1.5B-Instruct/resolve/main/Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv1280.task">&#128279;</a></p></td>
+  </tr>
+  <tr>
+    <td><p style="text-align: right">4096</p></td>
+    <td><p style="text-align: right">417 tk/s</p></td>
+    <td><p style="text-align: right">22 tk/s</p></td>
+    <td><p style="text-align: right">7.93 s</p></td>
+    <td><p style="text-align: right">3,176 MB</p></td>
+    <td><p style="text-align: right">1,875 MB</p></td>
+    <td><p style="text-align: right">1,523 MB</p></td>
+    <td><p style="text-align: left"><a style="text-decoration: none" href="https://huggingface.co/litert-community/Qwen2.5-1.5B-Instruct/resolve/main/Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv4096.task">&#128279;</a></p></td>
   </tr>
 
 </table>
 
+* For the list of supported quantization schemes, see
+  [supported-schemes](https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative/quantize#supported-schemes).
+  For these models, we are using prefill signature lengths of 32, 128, 512 and 1280.
 * Model Size: measured by the size of the .tflite flatbuffer (serialization
   format for LiteRT models)
 * Memory: indicator of peak RAM usage
 * The inference on CPU is accelerated via the LiteRT
   [XNNPACK](https://github.com/google/XNNPACK) delegate with 4 threads
-* Benchmark is done assuming XNNPACK cache is enabled
-* dynamic_int8: quantized model with int8 weights and float activations.
+* Benchmark is run with cache enabled and initialized. During the first run,
+  the time to first token may differ.
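The bundles listed in the README's table follow a single naming scheme (model, variant, quantization, KV-cache size). As a convenience sketch (the helper name is ours, not part of this repo), the filename for a given combination can be assembled and then fetched with `huggingface_hub`:

```python
def task_filename(variant: str, quant: str, ekv: int) -> str:
    """Build a bundle filename following this repo's naming scheme,
    e.g. ('multi-prefill-seq', 'q8', 4096) -> the int8 ekv4096 bundle."""
    return f"Qwen2.5-1.5B-Instruct_{variant}_{quant}_ekv{ekv}.task"


# Downloading requires network access and the huggingface_hub package:
# from huggingface_hub import hf_hub_download
# path = hf_hub_download(
#     repo_id="litert-community/Qwen2.5-1.5B-Instruct",
#     filename=task_filename("multi-prefill-seq", "q8", 4096),
# )
```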
notebook.ipynb CHANGED
@@ -11,1043 +11,13 @@
   },
   "language_info": {
     "name": "python"
-  },
-  "widgets": {
-    "application/vnd.jupyter.widget-state+json": { … }
761
- "max": 7031660,
762
- "min": 0,
763
- "orientation": "horizontal",
764
- "style": "IPY_MODEL_f3605ab95cbf4ebda9a678a0788e9682",
765
- "value": 7031660
766
- }
767
- },
768
- "b977fb3e42a14fe1bec47426ae1efded": {
769
- "model_module": "@jupyter-widgets/controls",
770
- "model_name": "HTMLModel",
771
- "model_module_version": "1.5.0",
772
- "state": {
773
- "_dom_classes": [],
774
- "_model_module": "@jupyter-widgets/controls",
775
- "_model_module_version": "1.5.0",
776
- "_model_name": "HTMLModel",
777
- "_view_count": null,
778
- "_view_module": "@jupyter-widgets/controls",
779
- "_view_module_version": "1.5.0",
780
- "_view_name": "HTMLView",
781
- "description": "",
782
- "description_tooltip": null,
783
- "layout": "IPY_MODEL_7d2023b2a9054a3991983a30fdc6555b",
784
- "placeholder": "​",
785
- "style": "IPY_MODEL_17d028b387724317ae9994819a97a3a4",
786
- "value": " 7.03M/7.03M [00:00\u0026lt;00:00, 28.7MB/s]"
787
- }
788
- },
789
- "a063adb2cc1c44438d5f631fb16297ae": {
790
- "model_module": "@jupyter-widgets/base",
791
- "model_name": "LayoutModel",
792
- "model_module_version": "1.2.0",
793
- "state": {
794
- "_model_module": "@jupyter-widgets/base",
795
- "_model_module_version": "1.2.0",
796
- "_model_name": "LayoutModel",
797
- "_view_count": null,
798
- "_view_module": "@jupyter-widgets/base",
799
- "_view_module_version": "1.2.0",
800
- "_view_name": "LayoutView",
801
- "align_content": null,
802
- "align_items": null,
803
- "align_self": null,
804
- "border": null,
805
- "bottom": null,
806
- "display": null,
807
- "flex": null,
808
- "flex_flow": null,
809
- "grid_area": null,
810
- "grid_auto_columns": null,
811
- "grid_auto_flow": null,
812
- "grid_auto_rows": null,
813
- "grid_column": null,
814
- "grid_gap": null,
815
- "grid_row": null,
816
- "grid_template_areas": null,
817
- "grid_template_columns": null,
818
- "grid_template_rows": null,
819
- "height": null,
820
- "justify_content": null,
821
- "justify_items": null,
822
- "left": null,
823
- "margin": null,
824
- "max_height": null,
825
- "max_width": null,
826
- "min_height": null,
827
- "min_width": null,
828
- "object_fit": null,
829
- "object_position": null,
830
- "order": null,
831
- "overflow": null,
832
- "overflow_x": null,
833
- "overflow_y": null,
834
- "padding": null,
835
- "right": null,
836
- "top": null,
837
- "visibility": null,
838
- "width": null
839
- }
840
- },
841
- "50f86e2ac8444d1986d8d9afe9fcee37": {
842
- "model_module": "@jupyter-widgets/base",
843
- "model_name": "LayoutModel",
844
- "model_module_version": "1.2.0",
845
- "state": {
846
- "_model_module": "@jupyter-widgets/base",
847
- "_model_module_version": "1.2.0",
848
- "_model_name": "LayoutModel",
849
- "_view_count": null,
850
- "_view_module": "@jupyter-widgets/base",
851
- "_view_module_version": "1.2.0",
852
- "_view_name": "LayoutView",
853
- "align_content": null,
854
- "align_items": null,
855
- "align_self": null,
856
- "border": null,
857
- "bottom": null,
858
- "display": null,
859
- "flex": null,
860
- "flex_flow": null,
861
- "grid_area": null,
862
- "grid_auto_columns": null,
863
- "grid_auto_flow": null,
864
- "grid_auto_rows": null,
865
- "grid_column": null,
866
- "grid_gap": null,
867
- "grid_row": null,
868
- "grid_template_areas": null,
869
- "grid_template_columns": null,
870
- "grid_template_rows": null,
871
- "height": null,
872
- "justify_content": null,
873
- "justify_items": null,
874
- "left": null,
875
- "margin": null,
876
- "max_height": null,
877
- "max_width": null,
878
- "min_height": null,
879
- "min_width": null,
880
- "object_fit": null,
881
- "object_position": null,
882
- "order": null,
883
- "overflow": null,
884
- "overflow_x": null,
885
- "overflow_y": null,
886
- "padding": null,
887
- "right": null,
888
- "top": null,
889
- "visibility": null,
890
- "width": null
891
- }
892
- },
893
- "da323d8a744a43d8901f19c48b1e1223": {
894
- "model_module": "@jupyter-widgets/controls",
895
- "model_name": "DescriptionStyleModel",
896
- "model_module_version": "1.5.0",
897
- "state": {
898
- "_model_module": "@jupyter-widgets/controls",
899
- "_model_module_version": "1.5.0",
900
- "_model_name": "DescriptionStyleModel",
901
- "_view_count": null,
902
- "_view_module": "@jupyter-widgets/base",
903
- "_view_module_version": "1.2.0",
904
- "_view_name": "StyleView",
905
- "description_width": ""
906
- }
907
- },
908
- "69afe592335b4d73b51b63e4c56407fc": {
909
- "model_module": "@jupyter-widgets/base",
910
- "model_name": "LayoutModel",
911
- "model_module_version": "1.2.0",
912
- "state": {
913
- "_model_module": "@jupyter-widgets/base",
914
- "_model_module_version": "1.2.0",
915
- "_model_name": "LayoutModel",
916
- "_view_count": null,
917
- "_view_module": "@jupyter-widgets/base",
918
- "_view_module_version": "1.2.0",
919
- "_view_name": "LayoutView",
920
- "align_content": null,
921
- "align_items": null,
922
- "align_self": null,
923
- "border": null,
924
- "bottom": null,
925
- "display": null,
926
- "flex": null,
927
- "flex_flow": null,
928
- "grid_area": null,
929
- "grid_auto_columns": null,
930
- "grid_auto_flow": null,
931
- "grid_auto_rows": null,
932
- "grid_column": null,
933
- "grid_gap": null,
934
- "grid_row": null,
935
- "grid_template_areas": null,
936
- "grid_template_columns": null,
937
- "grid_template_rows": null,
938
- "height": null,
939
- "justify_content": null,
940
- "justify_items": null,
941
- "left": null,
942
- "margin": null,
943
- "max_height": null,
944
- "max_width": null,
945
- "min_height": null,
946
- "min_width": null,
947
- "object_fit": null,
948
- "object_position": null,
949
- "order": null,
950
- "overflow": null,
951
- "overflow_x": null,
952
- "overflow_y": null,
953
- "padding": null,
954
- "right": null,
955
- "top": null,
956
- "visibility": null,
957
- "width": null
958
- }
959
- },
960
- "f3605ab95cbf4ebda9a678a0788e9682": {
961
- "model_module": "@jupyter-widgets/controls",
962
- "model_name": "ProgressStyleModel",
963
- "model_module_version": "1.5.0",
964
- "state": {
965
- "_model_module": "@jupyter-widgets/controls",
966
- "_model_module_version": "1.5.0",
967
- "_model_name": "ProgressStyleModel",
968
- "_view_count": null,
969
- "_view_module": "@jupyter-widgets/base",
970
- "_view_module_version": "1.2.0",
971
- "_view_name": "StyleView",
972
- "bar_color": null,
973
- "description_width": ""
974
- }
975
- },
976
- "7d2023b2a9054a3991983a30fdc6555b": {
977
- "model_module": "@jupyter-widgets/base",
978
- "model_name": "LayoutModel",
979
- "model_module_version": "1.2.0",
980
- "state": {
981
- "_model_module": "@jupyter-widgets/base",
982
- "_model_module_version": "1.2.0",
983
- "_model_name": "LayoutModel",
984
- "_view_count": null,
985
- "_view_module": "@jupyter-widgets/base",
986
- "_view_module_version": "1.2.0",
987
- "_view_name": "LayoutView",
988
- "align_content": null,
989
- "align_items": null,
990
- "align_self": null,
991
- "border": null,
992
- "bottom": null,
993
- "display": null,
994
- "flex": null,
995
- "flex_flow": null,
996
- "grid_area": null,
997
- "grid_auto_columns": null,
998
- "grid_auto_flow": null,
999
- "grid_auto_rows": null,
1000
- "grid_column": null,
1001
- "grid_gap": null,
1002
- "grid_row": null,
1003
- "grid_template_areas": null,
1004
- "grid_template_columns": null,
1005
- "grid_template_rows": null,
1006
- "height": null,
1007
- "justify_content": null,
1008
- "justify_items": null,
1009
- "left": null,
1010
- "margin": null,
1011
- "max_height": null,
1012
- "max_width": null,
1013
- "min_height": null,
1014
- "min_width": null,
1015
- "object_fit": null,
1016
- "object_position": null,
1017
- "order": null,
1018
- "overflow": null,
1019
- "overflow_x": null,
1020
- "overflow_y": null,
1021
- "padding": null,
1022
- "right": null,
1023
- "top": null,
1024
- "visibility": null,
1025
- "width": null
1026
- }
1027
- },
1028
- "17d028b387724317ae9994819a97a3a4": {
1029
- "model_module": "@jupyter-widgets/controls",
1030
- "model_name": "DescriptionStyleModel",
1031
- "model_module_version": "1.5.0",
1032
- "state": {
1033
- "_model_module": "@jupyter-widgets/controls",
1034
- "_model_module_version": "1.5.0",
1035
- "_model_name": "DescriptionStyleModel",
1036
- "_view_count": null,
1037
- "_view_module": "@jupyter-widgets/base",
1038
- "_view_module_version": "1.2.0",
1039
- "_view_name": "StyleView",
1040
- "description_width": ""
1041
- }
1042
- }
1043
- }
1044
  }
1045
  },
1046
  "cells": [
1047
  {
1048
  "cell_type": "markdown",
1049
  "source": [
1050
- "#Install dependencies"
1051
  ],
1052
  "metadata": {
1053
  "id": "39AMoCOa1ckc"
@@ -1057,373 +27,53 @@
  "metadata": {
  "id": "VoHxuLPu7s37"
  },
- "cell_type": "code",
- "source": [],
- "outputs": [],
- "execution_count": null
- },
- {
- "cell_type": "code",
- "source": [
- "!pip install ai-edge-litert"
- ],
- "metadata": {
- "id": "43tAeO0AZ7zp",
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "outputId": "76cd0d1b-7de2-4519-c0ae-1b9e6ee37653"
- },
- "execution_count": 1,
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": []
- }
- ]
- },
- {
- "cell_type": "code",
- "source": [
- "from collections.abc import Sequence\n",
- "import sys\n",
- "from ai_edge_litert import interpreter as interpreter_lib\n",
- "import numpy as np\n",
- "from transformers import AutoTokenizer"
- ],
- "metadata": {
- "id": "i6PMkMVBPr1p"
- },
- "execution_count": 2,
- "outputs": []
- },
- {
- "cell_type": "markdown",
- "source": [
- "# Download model files"
- ],
- "metadata": {
- "id": "K5okZCTgYpUd"
- }
- },
- {
  "cell_type": "code",
  "source": [
- "from huggingface_hub import hf_hub_download\n",
- "\n",
- "model_path = hf_hub_download(\n",
- " repo_id=\"litert-community/Qwen2.5-1.5B-Instruct\",\n",
- " filename=\"Qwen2.5-1.5B-Instruct_seq128_q8_ekv1280.tflite\",\n",
- ")"
  ],
- "metadata": {
- "id": "3t47HAG2tvc3",
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 49,
- "referenced_widgets": [
- "47cd47140dbb4e28a4f31d5632bfe82d",
- "7c0ddb1e0e3145f08ccb0c32b02c562f",
- "85c490db972b4d659caad513359a6700",
- "d61e96ae08d84414a638dd592f13fb18",
- "9e7f4734aa034e4aa5207b8a2498ee02",
- "df08ba8056fb47cb969e132087987e68",
- "470febc3af8348ef8611255e88401229",
- "39cedca11f574c01808acdc1be9aa68d",
- "62bd6d393ca74193bded59a8ebd0a749",
- "475c5c4fc6eb404180d7b69d75f797ea",
- "b815fc17c9ee4913b5cb452653ff1af9"
- ]
- },
- "outputId": "d1d8ed1a-5ec6-4121-9d3c-fada487fc8ed"
- },
- "execution_count": 3,
- "outputs": []
  },
  {
  "cell_type": "markdown",
  "source": [
- "# Create LiteRT interpreter and tokenizer"
  ],
  "metadata": {
- "id": "n5Xa4s6XhWqk"
  }
  },
  {
  "cell_type": "code",
  "source": [
- "interpreter = interpreter_lib.InterpreterWithCustomOps(\n",
- " custom_op_registerers=[\"pywrap_genai_ops.GenAIOpsRegisterer\"],\n",
- " model_path=model_path,\n",
- " num_threads=2,\n",
- " experimental_default_delegate_latest_features=True,\n",
- ")\n",
- "tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-1.5B-Instruct\")"
  ],
  "metadata": {
- "id": "Rvdn3EIZhaQn",
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 81,
- "referenced_widgets": [
- "8cac4d03da1044d6adb8b62752ed6775",
- "a201091e2f9b4f6c8a7d780dde854134",
- "16e2c22fb42e41e8b810c4e659091d37",
- "a1f5e814104646cbac5db19fdbcfccb2",
- "3186fb1553884a7da72a387f1e00eca5",
- "875fbcb976bf486092d3c6f483b9e042",
- "e2a24c0c90b149508715998b1cf301f7",
- "c730ecd68ae547b1822039b86bd22322",
- "0cd73c61a5e04ae1854eb1f1c4d92317",
- "c46a9a3e8c7d4560ae71226920e17acd",
- "2303aed14ff44e178ed20edf1f2e5359",
- "072e1baca7d64766807df5454dc9e3cc",
- "6da37a13974c4c3890c7676d194021bc",
- "2f5b6f1af091405287c35c53ad169354",
- "b977fb3e42a14fe1bec47426ae1efded",
- "a063adb2cc1c44438d5f631fb16297ae",
- "50f86e2ac8444d1986d8d9afe9fcee37",
- "da323d8a744a43d8901f19c48b1e1223",
- "69afe592335b4d73b51b63e4c56407fc",
- "f3605ab95cbf4ebda9a678a0788e9682",
- "7d2023b2a9054a3991983a30fdc6555b",
- "17d028b387724317ae9994819a97a3a4"
- ]
- },
- "outputId": "e05a5944-5312-41c4-e38e-7e26a921e63c"
  },
- "execution_count": 4,
  "outputs": []
  },
  {
  "cell_type": "markdown",
  "source": [
- "# Create pipeline with LiteRT models"
  ],
  "metadata": {
- "id": "AM6rDABTXt2F"
  }
  },
  {
  "cell_type": "code",
  "source": [
- "class LiteRTLlmPipeline:\n",
- "\n",
- " def __init__(self, interpreter, tokenizer):\n",
- " \"\"\"Initializes the pipeline.\"\"\"\n",
- " self._interpreter = interpreter\n",
- " self._tokenizer = tokenizer\n",
- "\n",
- " self._prefill_runner = None\n",
- " self._decode_runner = self._interpreter.get_signature_runner(\"decode\")\n",
- "\n",
- " def _init_prefill_runner(self, num_input_tokens: int):\n",
- " \"\"\"Initializes all the variables related to the prefill runner.\n",
- "\n",
- " This method initializes the following variables:\n",
- " - self._prefill_runner: The prefill runner based on the input size.\n",
- " - self._max_seq_len: The maximum sequence length supported by the model.\n",
- " - self._max_kv_cache_seq_len: The maximum sequence length supported by the\n",
- " KV cache.\n",
- "\n",
- " Args:\n",
- " num_input_tokens: The number of input tokens.\n",
- " \"\"\"\n",
- " if not self._interpreter:\n",
- " raise ValueError(\"Interpreter is not initialized.\")\n",
- "\n",
- " # Prefill runner related variables will be initialized in `predict_text` and\n",
- " # `compute_log_likelihood`.\n",
- " self._prefill_runner = self._get_prefill_runner(num_input_tokens)\n",
- " # input_token_shape has shape (batch, max_seq_len)\n",
- " input_token_shape = self._prefill_runner.get_input_details()[\"tokens\"][\n",
- " \"shape\"\n",
- " ]\n",
- " if len(input_token_shape) == 1:\n",
- " self._max_seq_len = input_token_shape[0]\n",
- " else:\n",
- " self._max_seq_len = input_token_shape[1]\n",
- "\n",
- " # kv cache input has shape [batch=1, seq_len, num_heads, dim].\n",
- " kv_cache_shape = self._prefill_runner.get_input_details()[\"kv_cache_k_0\"][\n",
- " \"shape\"\n",
- " ]\n",
- " self._max_kv_cache_seq_len = kv_cache_shape[1]\n",
- "\n",
- " def _init_kv_cache(self) -\u003e dict[str, np.ndarray]:\n",
- " if self._prefill_runner is None:\n",
- " raise ValueError(\"Prefill runner is not initialized.\")\n",
- " kv_cache = {}\n",
- " for input_key in self._prefill_runner.get_input_details().keys():\n",
- " if \"kv_cache\" in input_key:\n",
- " kv_cache[input_key] = np.zeros(\n",
- " self._prefill_runner.get_input_details()[input_key][\"shape\"],\n",
- " dtype=np.float32,\n",
- " )\n",
- " kv_cache[input_key] = np.zeros(\n",
- " self._prefill_runner.get_input_details()[input_key][\"shape\"],\n",
- " dtype=np.float32,\n",
- " )\n",
- " return kv_cache\n",
- "\n",
- " def _get_prefill_runner(self, num_input_tokens: int):\n",
- " \"\"\"Gets the prefill runner with the best suitable input size.\n",
- "\n",
- " Args:\n",
- " num_input_tokens: The number of input tokens.\n",
- "\n",
- " Returns:\n",
- " The prefill runner with the smallest input size.\n",
- " \"\"\"\n",
- " best_signature = None\n",
- " delta = sys.maxsize\n",
- " max_prefill_len = -1\n",
- " for key in self._interpreter.get_signature_list().keys():\n",
- " if \"prefill\" not in key:\n",
- " continue\n",
- " input_pos = self._interpreter.get_signature_runner(\n",
- " key\n",
- " ).get_input_details()[\"input_pos\"]\n",
- " # input_pos[\"shape\"] has shape (max_seq_len, )\n",
- " seq_size = input_pos[\"shape\"][0]\n",
- " max_prefill_len = max(max_prefill_len, seq_size)\n",
- " if num_input_tokens \u003c= seq_size and seq_size - num_input_tokens \u003c delta:\n",
- " delta = seq_size - num_input_tokens\n",
- " best_signature = key\n",
- " if best_signature is None:\n",
- " raise ValueError(\n",
- " \"The largest prefill length supported is %d, but we have %d number of\"\n",
- " \" input tokens\" % (max_prefill_len, num_input_tokens)\n",
- " )\n",
- " return self._interpreter.get_signature_runner(best_signature)\n",
- "\n",
- " def _run_prefill(\n",
- " self,\n",
- " prefill_token_ids: Sequence[int],\n",
- " ) -\u003e dict[str, np.ndarray]:\n",
- " \"\"\"Runs prefill and returns the kv cache.\n",
- "\n",
- " Args:\n",
- " prefill_token_ids: The token ids of the prefill input.\n",
- "\n",
- " Returns:\n",
- " The updated kv cache.\n",
- " \"\"\"\n",
- " if not self._prefill_runner:\n",
- " raise ValueError(\"Prefill runner is not initialized.\")\n",
- " prefill_token_length = len(prefill_token_ids)\n",
- " if prefill_token_length == 0:\n",
- " return self._init_kv_cache()\n",
- "\n",
- " # Prepare the input to be [1, max_seq_len].\n",
- " input_token_ids = [0] * self._max_seq_len\n",
- " input_token_ids[:prefill_token_length] = prefill_token_ids\n",
- " input_token_ids = np.asarray(input_token_ids, dtype=np.int32)\n",
- " input_token_ids = np.expand_dims(input_token_ids, axis=0)\n",
- "\n",
- " # Prepare the input position to be [max_seq_len].\n",
- " input_pos = [0] * self._max_seq_len\n",
- " input_pos[:prefill_token_length] = range(prefill_token_length)\n",
- " input_pos = np.asarray(input_pos, dtype=np.int32)\n",
- "\n",
- " # Initialize kv cache.\n",
- " prefill_inputs = self._init_kv_cache()\n",
- " prefill_inputs.update({\n",
- " \"tokens\": input_token_ids,\n",
- " \"input_pos\": input_pos,\n",
- " })\n",
- " prefill_outputs = self._prefill_runner(**prefill_inputs)\n",
- " if \"logits\" in prefill_outputs:\n",
- " # Prefill outputs includes logits and kv cache. We only output kv cache.\n",
- " prefill_outputs.pop(\"logits\")\n",
- "\n",
- " return prefill_outputs\n",
- "\n",
- " def _greedy_sampler(self, logits: np.ndarray) -\u003e int:\n",
- " return int(np.argmax(logits))\n",
- "\n",
- " def _run_decode(\n",
- " self,\n",
- " start_pos: int,\n",
- " start_token_id: int,\n",
- " kv_cache: dict[str, np.ndarray],\n",
- " max_decode_steps: int,\n",
- " ) -\u003e str:\n",
- " \"\"\"Runs decode and outputs the token ids from greedy sampler.\n",
- "\n",
- " Args:\n",
- " start_pos: The position of the first token of the decode input.\n",
- " start_token_id: The token id of the first token of the decode input.\n",
- " kv_cache: The kv cache from the prefill.\n",
- " max_decode_steps: The max decode steps.\n",
- "\n",
- " Returns:\n",
- " The token ids from the greedy sampler.\n",
- " \"\"\"\n",
- " next_pos = start_pos\n",
- " next_token = start_token_id\n",
- " decode_text = []\n",
- " decode_inputs = kv_cache\n",
- "\n",
- " for _ in range(max_decode_steps):\n",
- " decode_inputs.update({\n",
- " \"tokens\": np.array([[next_token]], dtype=np.int32),\n",
- " \"input_pos\": np.array([next_pos], dtype=np.int32),\n",
- " })\n",
- " decode_outputs = self._decode_runner(**decode_inputs)\n",
- " # Output logits has shape (batch=1, 1, vocab_size). We only take the first\n",
- " # element.\n",
- " logits = decode_outputs.pop(\"logits\")[0][0]\n",
- " next_token = self._greedy_sampler(logits)\n",
- " if next_token == self._tokenizer.eos_token_id:\n",
- " break\n",
- " decode_text.append(\n",
- " self._tokenizer.decode(next_token, skip_special_tokens=False)\n",
- " )\n",
- " print(decode_text[-1], end=\"\", flush=True)\n",
- " # Decode outputs includes logits and kv cache. We already popped out\n",
- " # logits, so the rest is kv cache. We pass the updated kv cache as input\n",
- " # to the next decode step.\n",
- " decode_inputs = decode_outputs\n",
- " next_pos += 1\n",
- "\n",
- " print() # print a new line at the end.\n",
- " return \"\".join(decode_text)\n",
- "\n",
- " def generate(self, prompt: str, max_decode_steps: int | None = None) -\u003e str:\n",
- " token_ids = self._tokenizer.encode(\n",
- " f\"<|endoftext|><|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n<|im_start|>user\\n{prompt}<|im_end|>\\n<|im_start|>assistant\\n\"\n",
- " )\n",
- " # Initialize the prefill runner with the suitable input size.\n",
- " self._init_prefill_runner(len(token_ids))\n",
- "\n",
- " # Run prefill.\n",
- " # Prefill up to the second to the last token of the prompt, because the last\n",
- " # token of the prompt will be used to bootstrap decode.\n",
- " prefill_token_length = len(token_ids) - 1\n",
- "\n",
- " print(\"Running prefill\")\n",
- " kv_cache = self._run_prefill(token_ids[:prefill_token_length])\n",
- " # Run decode.\n",
- " print(\"Running decode\")\n",
- " actual_max_decode_steps = (\n",
- " self._max_kv_cache_seq_len - prefill_token_length - 1\n",
- " )\n",
- " if max_decode_steps is not None:\n",
- " actual_max_decode_steps = min(actual_max_decode_steps, max_decode_steps)\n",
- " decode_text = self._run_decode(\n",
- " prefill_token_length,\n",
- " token_ids[prefill_token_length],\n",
- " kv_cache,\n",
- " actual_max_decode_steps,\n",
- " )\n",
- " return decode_text"
  ],
  "metadata": {
- "id": "UBSGrHrM4ANm"
  },
- "execution_count": 15,
  "outputs": []
  },
  {
@@ -1439,19 +89,8 @@
  "cell_type": "code",
  "source": [
  "# Disclaimer: Model performance demonstrated with the Python API in this notebook is not representative of performance on a local device.\n",
- "pipeline = LiteRTLlmPipeline(interpreter, tokenizer)"
- ],
- "metadata": {
- "id": "AZhlDQWg61AL"
- },
- "execution_count": 16,
- "outputs": []
- },
- {
- "cell_type": "code",
- "source": [
  "prompt = \"What is the capital of France?\"\n",
- "output = pipeline.generate(prompt, max_decode_steps=None)"
  ],
  "metadata": {
  "id": "wT9BIiATkjzL"
 
  },
  "language_info": {
  "name": "python"
  }
  },
  "cells": [
  {
  "cell_type": "markdown",
  "source": [
+ "# Install Dependencies"
  ],
  "metadata": {
  "id": "39AMoCOa1ckc"

  "metadata": {
  "id": "VoHxuLPu7s37"
  },
  "cell_type": "code",
  "source": [
+ "! wget -q https://github.com/protocolbuffers/protobuf/releases/download/v3.19.0/protoc-3.19.0-linux-x86_64.zip\n",
+ "! unzip -o protoc-3.19.0-linux-x86_64.zip -d /usr/local/"
  ],
+ "outputs": [],
+ "execution_count": null
  },
  {
  "cell_type": "markdown",
  "source": [
+ "## Install LiteRT Pipeline"
  ],
  "metadata": {
+ "id": "qGAaAKzYK5ei"
  }
  },
  {
  "cell_type": "code",
  "source": [
+ "!pip install git+https://github.com/google-ai-edge/ai-edge-apis.git#subdirectory=litert_tools"
  ],
  "metadata": {
+ "id": "43tAeO0AZ7zp"
  },
+ "execution_count": null,
  "outputs": []
  },
  {
  "cell_type": "markdown",
  "source": [
+ "# Create Pipeline from model file"
  ],
  "metadata": {
+ "id": "K5okZCTgYpUd"
  }
  },
  {
  "cell_type": "code",
  "source": [
+ "from litert_tools.pipeline import pipeline\n",
+ "runner = pipeline.load(\"litert-community/Qwen2.5-1.5B-Instruct\", \"Qwen2.5-1.5B-Instruct_seq128_q8_ekv1280.task\")"
  ],
  "metadata": {
+ "id": "3t47HAG2tvc3"
  },
+ "execution_count": null,
  "outputs": []
  },
  {

  "cell_type": "code",
  "source": [
  "# Disclaimer: Model performance demonstrated with the Python API in this notebook is not representative of performance on a local device.\n",
  "prompt = \"What is the capital of France?\"\n",
+ "output = runner.generate(prompt, max_decode_steps=None)"
  ],
  "metadata": {
  "id": "wT9BIiATkjzL"