ZhangYuanhan's Collections
Vision Language General

updated Mar 13

  • MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

    Paper • 2410.10563 • Published Oct 14, 2024 • 39

  • Latent Action Pretraining from Videos

    Paper • 2410.11758 • Published Oct 15, 2024 • 2

  • TVBench: Redesigning Video-Language Evaluation

    Paper • 2410.07752 • Published Oct 10, 2024 • 6

  • Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

    Paper • 2501.03225 • Published Jan 6, 2025 • 7

  • Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains

    Paper • 2501.05707 • Published Jan 10, 2025 • 20

  • See What You Are Told: Visual Attention Sink in Large Multimodal Models

    Paper • 2503.03321 • Published Mar 5, 2025 • 1

  • Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

    Paper • 2503.06749 • Published Mar 9, 2025 • 30