TinyLLaVA


Here we introduce TinyLLaVA-Qwen2-0.5B-SigLIP, trained with the TinyLLaVA Factory codebase. For the LLM and vision tower, we chose Qwen2-0.5B and siglip-so400m-patch14-384, respectively.

Usage

Execute the following test code (this assumes the TinyLLaVA Factory codebase is installed, which provides the `eval_model` entry point):

from tinyllava.eval.run_tiny_llava import eval_model

model_path = 'Zhang199/TinyLLaVA-Qwen2-0.5B-SigLIP'
prompt = "What are the things I should be cautious about when I visit here?"
image_file = "https://llava-vl.github.io/static/images/view.jpg"
conv_mode = "phi"  # or "llama", "gemma", etc.

args = type('Args', (), {
    "model_path": model_path,
    "model": None,
    "query": prompt,
    "conv_mode": conv_mode,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()

eval_model(args)
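The `type('Args', (), {...})()` idiom above builds a throwaway namespace object to hold the arguments. The standard library's `types.SimpleNamespace` does the same thing more readably; the following sketch is equivalent (attribute names and values copied from the snippet above):

```python
from types import SimpleNamespace

# Equivalent to the ad-hoc type('Args', (), {...})() construction:
args = SimpleNamespace(
    model_path='Zhang199/TinyLLaVA-Qwen2-0.5B-SigLIP',
    model=None,
    query="What are the things I should be cautious about when I visit here?",
    conv_mode="phi",
    image_file="https://llava-vl.github.io/static/images/view.jpg",
    sep=",",
    temperature=0,      # 0 => greedy decoding
    top_p=None,
    num_beams=1,
    max_new_tokens=512,
)
```

Since `eval_model` only reads attributes from the object it receives, either construction works interchangeably.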

Result

| model_name | VQAv2 | GQA | SQA | TextVQA | MM-Vet | POPE | MME | MMMU |
|---|---|---|---|---|---|---|---|---|
| LLaVA-1.5-7B | 78.5 | 62.0 | 66.8 | 58.2 | 30.5 | 85.9 | 1510.7 | - |
| bczhou/TinyLLaVA-3.1B (our legacy model) | 79.9 | 62.0 | 69.1 | 59.1 | 32.0 | 86.4 | 1464.9 | - |
| tinyllava/TinyLLaVA-Gemma-SigLIP-2.4B | 78.4 | 61.6 | 64.4 | 53.6 | 26.9 | 86.4 | 1339.0 | 31.7 |
| tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B | 80.1 | 62.1 | 73.0 | 60.3 | 37.5 | 87.2 | 1466.4 | 38.4 |
| Zhang199/TinyLLaVA-Qwen2-0.5B-SigLIP | 72.33 | 55.84 | 60.14 | 45.17 | 19.5 | 86.59 | 1153 | 29.7 |

P.S. TinyLLaVA Factory is an open-source modular codebase for small-scale LMMs, focused on simplicity of implementation, extensibility of new features, and reproducibility of training results. The repository provides standard training and evaluation pipelines, flexible data preprocessing and model configurations, and easily extensible architectures, so users can customize their own LMMs with minimal coding effort and fewer coding mistakes.

TinyLLaVA Factory integrates a suite of cutting-edge models and methods.

  • LLM: currently supports OpenELM, TinyLlama, StableLM, Qwen, Gemma, Phi, and Qwen2.
  • Vision tower: currently supports CLIP, SigLIP, DINO, and a combination of CLIP and DINO.
  • Connector: currently supports MLP, Q-Former, and Resampler.
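The connector's job is to project vision-tower patch features into the LLM's embedding space so they can be consumed as input tokens. As a hypothetical illustration in plain Python (not the Factory's actual implementation), a two-layer MLP connector with GELU activation and toy dimensions can be sketched as:

```python
import math
import random

def gelu(x):
    # Exact GELU via the error function
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def linear(x, w, b):
    # x: [in_dim], w: [out_dim][in_dim], b: [out_dim]
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def mlp_connector(patch_feats, w1, b1, w2, b2):
    """Project each vision patch feature into the LLM hidden space
    with a linear -> GELU -> linear MLP."""
    return [linear([gelu(h) for h in linear(p, w1, b1)], w2, b2)
            for p in patch_feats]

# Toy dimensions for illustration: 4 patches, vision dim 8 -> LLM dim 6
random.seed(0)
vis_dim, hid_dim, llm_dim, n_patches = 8, 8, 6, 4
w1 = [[random.gauss(0, 0.1) for _ in range(vis_dim)] for _ in range(hid_dim)]
b1 = [0.0] * hid_dim
w2 = [[random.gauss(0, 0.1) for _ in range(hid_dim)] for _ in range(llm_dim)]
b2 = [0.0] * llm_dim
patches = [[random.gauss(0, 1) for _ in range(vis_dim)] for _ in range(n_patches)]

tokens = mlp_connector(patches, w1, b1, w2, b2)
# Each of the 4 patches becomes one 6-dimensional pseudo-token for the LLM.
```

In the real model the dimensions are much larger (SigLIP-so400m features projected to Qwen2-0.5B's hidden size), and Q-Former or Resampler connectors additionally reduce the number of visual tokens rather than projecting one-to-one.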
Model size: 1.06B params (Safetensors, FP16)