MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers Paper • 2002.10957 • Published Feb 25, 2020 • 1
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers Paper • 2012.15828 • Published Dec 31, 2020 • 1
s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning Paper • 2110.13640 • Published Oct 26, 2021
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts Paper • 2111.02358 • Published Nov 3, 2021 • 1
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers Paper • 2208.06366 • Published Aug 12, 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks Paper • 2208.10442 • Published Aug 22, 2022
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024 • 46
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 69
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials Paper • 2412.09605 • Published Dec 12, 2024 • 30
Self-Boosting Large Language Models with Synthetic Preference Data Paper • 2410.06961 • Published Oct 9, 2024 • 17
microsoft/beit-base-patch16-224-pt22k-ft22k Image Classification • Updated Feb 27, 2023 • 389k • 77