These are the models we used in our first version.
- google/owlv2-base-patch16-ensemble
  Zero-Shot Object Detection • 2.67M • 103
- Qwen/Qwen1.5-0.5B-Chat
  Text Generation • 430k • 81
- Salesforce/blip-image-captioning-base
  Image-to-Text • 2.83M • 726
- facebook/sam-vit-base
  Feature Extraction • 303k • 138