amalad committed
Commit 1fbc10a · 1 Parent(s): 67c7cea

Update README and examples

Files changed (2):
  1. README.md +10 -4
  2. examples.py +1 -1
README.md CHANGED
@@ -5,13 +5,13 @@ license_link: >-
   https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
 ---

-# Llama-Nemotron-Nano-VL-8B-V1
+# Llama-3.1-Nemotron-Nano-VL-8B-V1

 ## Model Overview

 ### Description

-Llama-Nemotron-Nano-VL-8B-V1 is a leading document intelligence vision language model (VLMs) that enables the ability to query and summarize images and video from the physical or virtual world. Llama-Nemotron-Nano-VL-8B-V1 is deployable in the data center, cloud and at the edge, including Jetson Orin and laptop by AWQ 4bit quantization through TinyChat framework. We find: (1) image-text pairs are not enough, interleaved image-text is essential; (2) unfreezing LLM during interleaved image-text pre-training enables in-context learning; (3)re-blending text-only instruction data is crucial to boost both VLM and text-only performance.
+Llama Nemotron Nano VL is a leading document intelligence vision language model (VLM) that can query and summarize images and video from the physical or virtual world. Llama Nemotron Nano VL is deployable in the data center, in the cloud, and at the edge, including on Jetson Orin and laptops via AWQ 4-bit quantization through the TinyChat framework. We find that: (1) image-text pairs are not enough; interleaved image-text data is essential; (2) unfreezing the LLM during interleaved image-text pre-training enables in-context learning; (3) re-blending text-only instruction data is crucial to boost both VLM and text-only performance.

 This model was trained on commercial images and videos for all three stages of training and supports single image and video inference.

@@ -94,13 +94,19 @@ Supported Operating System(s): Linux<br>
 ### Model Versions:
 Llama-3.1-Nemotron-Nano-VL-8B-V1

-## Usage
+## Quick Start
+
+### Install Dependencies
+```
+pip install transformers accelerate timm einops open-clip-torch
+```
+
+### Usage
 ```python
 from PIL import Image
 from transformers import AutoImageProcessor, AutoModel, AutoTokenizer

-path = "nvidia/Llama-Nemotron-Nano-VL-8B-V1"
+path = "nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1"
 model = AutoModel.from_pretrained(path, trust_remote_code=True, device_map="cuda").eval()
 tokenizer = AutoTokenizer.from_pretrained(path)
 image_processor = AutoImageProcessor.from_pretrained(path, trust_remote_code=True, device="cuda")
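The usage snippet in this hunk stops after loading the model, tokenizer, and image processor. A minimal sketch of how inference might continue is below; the `chat` helper, its keyword arguments, and the `example.png` input are assumptions (InternVL-style remote code), not something this diff confirms.

```python
from PIL import Image
from transformers import AutoImageProcessor, AutoModel, AutoTokenizer

path = "nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1"
model = AutoModel.from_pretrained(path, trust_remote_code=True, device_map="cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(path)
image_processor = AutoImageProcessor.from_pretrained(path, trust_remote_code=True, device="cuda")

# Preprocess a local image; "example.png" is a hypothetical input file.
image = Image.open("example.png")
image_features = image_processor([image])  # assumed to return model-ready tensors (e.g. pixel_values)

# ASSUMPTION: the repo's trust_remote_code model class exposes an
# InternVL-style `chat` helper; the exact name and signature may differ.
generation_config = dict(max_new_tokens=256, do_sample=False)
response = model.chat(
    tokenizer=tokenizer,
    question="Describe the image.",
    generation_config=generation_config,
    **image_features,
)
print(response)
```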
examples.py CHANGED
@@ -2,7 +2,7 @@ import torch
 from transformers import AutoTokenizer, AutoModel, AutoImageProcessor
 from PIL import Image

-path = "nvidia/Llama-Nemotron-Nano-VL-8B-V1"
+path = "nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1"
 model = AutoModel.from_pretrained(
     path,
     torch_dtype=torch.bfloat16,
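The hunk's context window cuts off inside the `from_pretrained` call. For orientation, a plausible shape of the full load in examples.py is sketched below; every argument past `torch_dtype` is an assumption inferred from the README snippet, not the file's actual contents.

```python
import torch
from transformers import AutoTokenizer, AutoModel, AutoImageProcessor

path = "nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1"

model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,   # visible in the hunk: half-precision weights
    trust_remote_code=True,       # ASSUMPTION: mirrors the README's loading call
    device_map="cuda",            # ASSUMPTION: place the model on a single GPU
).eval()
tokenizer = AutoTokenizer.from_pretrained(path)
image_processor = AutoImageProcessor.from_pretrained(path, trust_remote_code=True, device="cuda")
```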