ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use Paper • 2504.07981 • Published Apr 4, 2025
ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges Paper • 2411.18932 • Published Nov 28, 2024
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation Paper • 2411.13281 • Published Nov 20, 2024
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published Jun 11, 2024
CodeHalu: Code Hallucinations in LLMs Driven by Execution-based Verification Paper • 2405.00253 • Published Apr 30, 2024
CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models Paper • 2405.00390 • Published May 1, 2024
Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models Paper • 2401.13298 • Published Jan 24, 2024
MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems Paper • 2404.09486 • Published Apr 15, 2024
Positional Artefacts Propagate Through Masked Language Model Embeddings Paper • 2011.04393 • Published Nov 9, 2020
Augmented Large Language Models with Parametric Knowledge Guiding Paper • 2305.04757 • Published May 8, 2023
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024