BAAI
3v324v23 committed
Commit d1ed198 · 1 Parent(s): e3bf9ba
Files changed (1):
  1. README.md +3 -3
README.md CHANGED
@@ -28,7 +28,7 @@ TODO
  **Tip: Our inference code is still being updated. You can pass `--include '*.py'` to huggingface-cli to update only the inference code without downloading the whole model.**

  ---
- ### w/o. efficiency optimization
+ ### 1. Inference w/o. Efficiency Optimization
  ```python
  from transformers import AutoTokenizer, AutoModel, AutoConfig, BitsAndBytesConfig
  import torch
@@ -66,7 +66,7 @@ print(response)
  ```

  ---
- ### w. chunk-based prefill
+ ### 2. Inference w. Chunk-based Pre-filling
  Chunk-based prefill significantly reduces memory demands and response latency by encoding video input in a streaming manner. This advantage becomes particularly noticeable with longer videos.

  To enable this mode, you need to set `enable_chunk_prefill` to `True` and configure the `prefill_config` parameters:
@@ -130,7 +130,7 @@ print(response)
  ```

  ---
- ### w. chunk-based prefill & bi-level kvs decoding
+ ### 3. Inference w. Chunk-based Pre-filling & Bi-level KVs Decoding
  coming soon
  ```python
 
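For reference, the tip above about refreshing only the inference code can also be done from Python via `huggingface_hub`; the repo id and local path below are placeholders, not values taken from this commit.

```python
# Sketch: pull only the remote-code (*.py) files so the inference code stays
# current without re-downloading the model weights.
# CLI equivalent: huggingface-cli download <repo-id> --include '*.py'
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="BAAI/<this-model>",   # placeholder repo id
    allow_patterns="*.py",         # fetch only the Python files
    local_dir="./<this-model>",    # placeholder local checkout directory
)
```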
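Below is a minimal sketch of where the chunk-based pre-filling switches from section 2 might be set, assuming the model is loaded with `trust_remote_code=True` as in the README's first example. `enable_chunk_prefill` and `prefill_config` are named in the README, but the individual `prefill_config` keys shown here are illustrative assumptions only.

```python
# Sketch only: enabling chunk-based pre-filling on the model config.
# `enable_chunk_prefill` and `prefill_config` come from the README text;
# the keys inside prefill_config below are assumptions for illustration.
from transformers import AutoConfig, AutoModel, AutoTokenizer
import torch

model_path = "BAAI/<this-model>"  # placeholder repo id

config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
config.enable_chunk_prefill = True   # switch named in the README
config.prefill_config = {            # illustrative values, not documented defaults
    "chunk_size": 16,                # e.g. frames encoded per chunk (assumption)
}

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    config=config,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).eval().cuda()
```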