Yolo Y. Tang

yunlong10

https://yunlong10.github.io/

AI & ML interests

Multimodal Learning, Video Understanding & Generation

Organizations

None yet

yunlong10's activity

liked a dataset 3 days ago

Xinran0906/CineTechBench

Viewer • Updated 5 days ago • 128 • 166 • 3

upvoted a paper 10 days ago

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

Paper • 2505.20426 • Published 12 days ago • 6

authored a paper about 2 months ago

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Paper • 2504.05541 • Published Apr 7 • 16

upvoted a paper about 2 months ago

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Paper • 2504.05541 • Published Apr 7 • 16

liked a dataset about 2 months ago

JunJiaGuo/VidComposition_Benchmark

Viewer • Updated Apr 28 • 2.94k • 300 • 1

authored a paper about 2 months ago

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)

Paper • 2504.03151 • Published Apr 4 • 14

upvoted a paper about 2 months ago

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)

Paper • 2504.03151 • Published Apr 4 • 14

upvoted 2 papers 2 months ago

FreSca: Unveiling the Scaling Space in Diffusion Models

Paper • 2504.02154 • Published Apr 2 • 19

AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction

Paper • 2504.01014 • Published Apr 1 • 70

liked a dataset 3 months ago

jing-bi/verify-teaser

Viewer • Updated Mar 17 • 50 • 171 • 4

authored 2 papers 3 months ago

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity

Paper • 2503.11557 • Published Mar 14 • 21

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach

Paper • 2412.18108 • Published Dec 24, 2024

upvoted a paper 3 months ago

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity

Paper • 2503.11557 • Published Mar 14 • 21

liked a Space 3 months ago

VidComposition

🥇

Duplicate this leaderboard to initialize your own!

upvoted a paper 5 months ago

Generative AI for Cel-Animation: A Survey

Paper • 2501.06250 • Published Jan 8 • 13

authored 3 papers 5 months ago

commented on a paper 5 months ago

Generative AI for Cel-Animation: A Survey

Paper • 2501.06250 • Published Jan 8 • 13

authored a paper 8 months ago

Caption Anything: Interactive Image Description with Diverse Multimodal Controls

Paper • 2305.02677 • Published May 4, 2023