On the other hand, BEiT is a vision model pretrained with a masked image modeling objective: some of the image patches are masked, and the model must predict the masked patches (analogous to the masked language modeling objective).
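To make the masking step concrete, here is a minimal sketch in plain Python. It only illustrates hiding a subset of patches and collecting the prediction targets. The helper names (`mask_patches`, `apply_mask`), the 40% mask ratio, the uniform random sampling, and the string stand-ins for patches are all assumptions made for illustration, not BEiT's actual implementation.

```python
import random

def mask_patches(num_patches, mask_ratio=0.4, seed=0):
    """Return a list of booleans; True marks a patch that is hidden.

    Hypothetical helper: uniform random sampling is an assumption here.
    """
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked_idx = set(rng.sample(range(num_patches), num_masked))
    return [i in masked_idx for i in range(num_patches)]

def apply_mask(patches, mask, mask_token):
    """Replace every masked patch with a placeholder token."""
    return [mask_token if m else p for p, m in zip(patches, mask)]

# A 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patches.
# Strings stand in for the real patch embeddings.
patches = [f"patch_{i}" for i in range(196)]

mask = mask_patches(len(patches))
corrupted = apply_mask(patches, mask, "[MASK]")

# The model sees `corrupted` and is trained to recover `targets`.
targets = [p for p, m in zip(patches, mask) if m]
```

The model would receive `corrupted` as input, and the loss would be computed only at the masked positions, much as masked language modeling scores only the masked tokens.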