metadata

license: apache-2.0
language:
  - en
base_model:
  - meta-llama/Llama-3.2-11B-Vision-Instruct
datasets:
  - Xkev/LLaVA-CoT-100k
pipeline_tag: image-text-to-text
library_name: transformers

Sherlock: Self-Correcting Reasoning in Vision-Language Models

Introduction

Sherlock is a training framework focus on improving Vision-Language Models reasoning and self-correction capabilities.

GitHub repo: https://github.com/DripNowhy/Sherlock

Project Page: https://dripnowhy.github.io/Sherlock/

arXiv: https://arxiv.org/abs/2505.22651