Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published 18 days ago • 129
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Apr 28 • 484
Arbitrary-steps Image Super-resolution via Diffusion Inversion Paper • 2412.09013 • Published Dec 12, 2024 • 13
Article ArabicWeb24: Creating a High Quality Arabic Web-only Pre-training Dataset By MayFarhat • Aug 8, 2024 • 11
ColPali: Efficient Document Retrieval with Vision Language Models Paper • 2407.01449 • Published Jun 27, 2024 • 48
World Model on Million-Length Video And Language With RingAttention Paper • 2402.08268 • Published Feb 13, 2024 • 40