Create Ops-MM-embedding-v1-7B.json

#58
by frozenc - opened
No description provided.

Hello, I have two questions regarding the benchmark:

  1. In utils_v2.py, I noticed that ActivityNetQA is categorized under V-MRET. Shouldn't it be categorized under V-QA instead?
  2. In the official Vidore-v2 benchmark, the MIT Biomedical, Economics Macro, and ESG Restaurant Synthetic datasets use the Multilingual versions rather than the English-only ones. Should we align the Vidore-v2 subtasks accordingly?

Thanks for your help!

TIGER-Lab org

Hi, thank you for pointing out the issue! It was a typo, and we have fixed it.
We agree to use the Multilingual version to align with the Vidore-v2 benchmark.
Thank you for your support!

MINGYISU changed pull request status to merged
