---
base_model:
- openai/clip-vit-large-patch14
datasets:
- ILSVRC/imagenet-1k
- mlfoundations/datacomp_small
license: mit
library_name: transformers
pipeline_tag: feature-extraction
---

[[Paper]](https://www.arxiv.org/abs/2506.03355)   [[Code]](https://github.com/LIONS-EPFL/LEAF)

# Model

Initialized from `openai/clip-vit-large-patch14`. The image encoder is fine-tuned with FARE at $\epsilon=2/255$. The text encoder is fine-tuned with LEAF at $k=1$ with $\rho=50$ and semantic constraints.

To load this model, use:

```python
from transformers import CLIPProcessor, CLIPModel

model_name = "LEAF-CLIP/CLIP-ViT-L-rho50-k1-FARE2"
processor_name = "openai/clip-vit-large-patch14"

model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(processor_name)
```
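
Below is a minimal usage sketch (not part of the original card) showing how the loaded model and processor can be used for feature extraction, in line with the `feature-extraction` pipeline tag. The image URL and prompt strings are illustrative placeholders.

```python
import torch
import requests
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model_name = "LEAF-CLIP/CLIP-ViT-L-rho50-k1-FARE2"
processor_name = "openai/clip-vit-large-patch14"

model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(processor_name)

# Placeholder inputs: any PIL image and list of prompts will do.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    # Extract image and text embeddings from the two encoders.
    image_features = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_features = model.get_text_features(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
    )

# L2-normalize before computing cosine similarities.
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)

similarity = image_features @ text_features.T
print(similarity)
```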