| Category | Model | Elo | # Matches | Win vs. Reference (w/ # ratings) |
| --- | --- | --- | --- | --- |
| Single Image | human_verified_reference | 1382 | 5880 | --- |
| Single Image | llava-a1-predictions | 1203 | 678 | 35.07% (n=134) |
| Single Image | llava13b_output | 1095 | 5420 | 18.53% (n=475) |
| Single Image | mPLUG-Owl prediction | 1087 | 5440 | 15.83% (n=480) |
| Single Image | LlamaAdapter-v2 prediction | 1066 | 5469 | 14.14% (n=488) |
| Single Image | Lynx(8B) predictions | 1037 | 787 | 11.43% (n=140) |
| Single Image | idefics9b_prediction | 1020 | 794 | 9.72% (n=144) |
| Single Image | instruct_blip_output | 1000 | 5469 | 14.12% (n=503) |
| Single Image | otter | 962 | 5443 | 7.01% (n=499) |
| Single Image | visual_gpt_davinci003_output | 941 | 5437 | 1.57% (n=510) |
| Single Image | MiniGPT-4 prediction | 926 | 5448 | 3.36% (n=506) |
| Single Image | Octopus V2 prediction | 925 | 790 | 8.90% (n=146) |
| Single Image | openflamingo | 851 | 5479 | 2.95% (n=509) |
| Single Image | panda_gpt_13b_output | 775 | 5465 | 2.70% (n=519) |
| Single Image | mmgpt_output | 731 | 5471 | 0.19% (n=527) |