Collections
Discover the best community collections!
Collections including paper arxiv:2401.15947
- Improved Baselines with Visual Instruction Tuning
  Paper • 2310.03744 • Published • 38
- DeepSeek-VL: Towards Real-World Vision-Language Understanding
  Paper • 2403.05525 • Published • 47
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
  Paper • 2308.12966 • Published • 9
- LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
  Paper • 2404.01331 • Published • 28

- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
- Visual Instruction Tuning
  Paper • 2304.08485 • Published • 17
- Improved Baselines with Visual Instruction Tuning
  Paper • 2310.03744 • Published • 38
- PALO: A Polyglot Large Multimodal Model for 5B People
  Paper • 2402.14818 • Published • 25
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 54
- The (R)Evolution of Multimodal Large Language Models: A Survey
  Paper • 2402.12451 • Published
- deepseek-ai/deepseek-vl-7b-base
  7B • Updated • 23.7k • 63
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
  Paper • 2405.11273 • Published • 19

- RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
  Paper • 2412.14922 • Published • 90
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 373
- Progressive Multimodal Reasoning via Active Retrieval
  Paper • 2412.14835 • Published • 74
- Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
  Paper • 2501.09732 • Published • 72

- ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
  Paper • 2406.04325 • Published • 76
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 54
- Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
  Paper • 2311.10122 • Published • 27
- Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
  Paper • 2311.16103 • Published • 1

- Woodpecker: Hallucination Correction for Multimodal Large Language Models
  Paper • 2310.16045 • Published • 17
- SILC: Improving Vision Language Pretraining with Self-Distillation
  Paper • 2310.13355 • Published • 9
- To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
  Paper • 2311.07574 • Published • 16
- MyVLM: Personalizing VLMs for User-Specific Queries
  Paper • 2403.14599 • Published • 17

- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 44
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 29
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 54
- Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
  Paper • 2403.03432 • Published • 1

- Scaling Vision with Sparse Mixture of Experts
  Paper • 2106.05974 • Published • 4
- Routers in Vision Mixture of Experts: An Empirical Study
  Paper • 2401.15969 • Published • 2
- Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
  Paper • 2206.02770 • Published • 4
- Experts Weights Averaging: A New General Training Scheme for Vision Transformers
  Paper • 2308.06093 • Published • 2