B-score: Detecting biases in large language models using response history Paper • 2505.18545 • Published May 24 • 31
Understanding Generative AI Capabilities in Everyday Image Editing Tasks Paper • 2505.16181 • Published May 22 • 24
VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance Paper • 2505.15952 • Published May 21 • 20
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs Paper • 2503.02003 • Published Mar 3 • 48
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published Feb 13 • 44
CLIPSeg - Extract Mask: Identify and mask objects in images using text prompts Space • Running on Zero • 6