metadata
license: apache-2.0
datasets:
- lmms-lab/LLaVA-Video-178K
language:
- en
base_model:
- Qwen/Qwen2-VL-7B
tags:
- qwen2_vl
- multimodal
- conversational
Model Card
This model is obtained by fine-tuning Qwen2-VL-7B-Base on LLaVA-Video-178K. It is used as a comparison baseline in LiveCC project.
Performance
Acknowledgement
Joya Chen built the training code, and Yiqi Lin trained the model. The QA evaluation is done by Joya Chen, and CC evaluation is done by Ziyun Zeng. Infra is supported by the company.