Field                                                                                                  | Response
:------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------
Intended Application & Domain:                                                                         | Visual Question Answering
Model Type:                                                                                            | Transformer
Intended Users:                                                                                        | Generative AI creators working with conversational AI models and image content.
Output:                                                                                                | Text (Responds to posed question, stateful - remembers previous answers)
Describe how the model works:                                                                          | Chat based on image/video content
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of:   | Not Applicable
Technical Limitations:                                                                                 | **Maximum number of images supported:** 4.<br><br>**Context Length:** Supports up to 16,000 tokens total (input + output). If the limit is exceeded, the input is truncated from the start and generation ends with an EOS token. Longer prompts may degrade performance.<br><br>If the model fails (e.g., generates incorrect, repetitive, or poor responses), issues are diagnosed via benchmarks, human review, and internal debugging tools.<br><br>Do not expose the vLLM host to a network where any untrusted connections may reach the host. Only use NVIDIA-provided models that use safetensors format.
Verified to have met prescribed NVIDIA quality standards:                                              | Yes
Performance Metrics:                                                                                   | MMMU Val with ChatGPT as a judge, AI2D, ChartQA Test, InfoVQA Val, OCRBench, OCRBenchV2 English, OCRBenchV2 Chinese, DocVQA Val, VideoMME (16 frames), SlideQA (F1)
Potential Known Risks:                                                                                 | The model may produce output that is biased, toxic, or incorrect. It may amplify existing biases and return toxic responses, especially when prompted with toxic prompts. The model may also generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable text even if the prompt itself does not include anything explicitly offensive.<br>While we have taken safety and security into account and are continuously improving, outputs may still contain political content, misleading information, or unwanted bias beyond our control.
Licensing:                                                                                             | **Governing Terms:**<br>Your use of the software container and model is governed by the [NVIDIA Software and Model Evaluation License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-and-model-evaluation-license/).<br><br>**Additional Information:**<br>[Llama 3.1 Community Model License](https://www.llama.com/llama3_1/license/); Built with Llama.
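
The image and context limits listed under Technical Limitations can be checked on the client side before a request is sent. The following is a minimal sketch, not the model's own preprocessing: the helper names `fit_to_context` and `check_image_count` are hypothetical, and it assumes that "truncated from the start" means the earliest input tokens are dropped and that the caller already has the prompt as a list of token IDs.

```python
from typing import List

# Limits taken from the Technical Limitations row of the table.
MAX_TOTAL_TOKENS = 16_000  # input + output combined
MAX_IMAGES = 4


def check_image_count(num_images: int, limit: int = MAX_IMAGES) -> None:
    """Raise if a request carries more images than the card says are supported."""
    if num_images > limit:
        raise ValueError(f"{num_images} images given, at most {limit} supported")


def fit_to_context(
    input_ids: List[int],
    max_new_tokens: int,
    limit: int = MAX_TOTAL_TOKENS,
) -> List[int]:
    """Trim input tokens so that input + requested output fits within the limit.

    Assumption: truncation drops the oldest tokens and keeps the most recent
    ones, mirroring the "truncated from the start" wording in the table.
    """
    if max_new_tokens >= limit:
        raise ValueError("max_new_tokens leaves no room for any input tokens")
    budget = limit - max_new_tokens
    if len(input_ids) <= budget:
        return list(input_ids)
    # Keep only the last `budget` tokens.
    return list(input_ids[-budget:])
```

Trimming explicitly on the client makes it visible which part of a long conversation is lost. Note that cutting through a span of image placeholder tokens may leave a prompt the model cannot interpret, so a real client would likely truncate on message boundaries instead.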