Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Abstract
Janus-Pro, an enhanced version of Janus, improves multimodal understanding and text-to-image capabilities with an optimized training strategy, expanded data, and increased model size.
In this work, we introduce Janus-Pro, an advanced version of the earlier Janus model. Specifically, Janus-Pro incorporates (1) an optimized training strategy, (2) expanded training data, and (3) scaling to a larger model size. With these improvements, Janus-Pro achieves significant advances in both multimodal understanding and text-to-image instruction following, while also making text-to-image generation more stable. We hope this work will inspire further exploration in the field. Code and models are publicly available.