---
language:
- en
base_model:
- openai/clip-vit-large-patch14
tags:
- emotion_prediction
- VEA
- computer_vision
- perceptual_tasks
- CLIP
- EmoSet
---

PerceptCLIP-Emotions is a model that predicts the emotions an image evokes in viewers. It is the official model from the paper ["Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks"](https://arxiv.org/abs/2503.13260). We apply LoRA adaptation to the CLIP visual encoder and add an MLP head on top. Our model *achieves state-of-the-art results* on visual emotion analysis.
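
For reference, below is a minimal sketch of how such an architecture could be assembled with `transformers` and `peft`. The class name `PerceptCLIPEmotions`, the LoRA hyperparameters (rank, alpha, target modules), and the MLP head width are illustrative assumptions, not the released configuration; the actual checkpoint is loaded as a complete module in the Usage section below.

```python
import torch
import torch.nn as nn
from transformers import CLIPVisionModel
from peft import LoraConfig, get_peft_model


class PerceptCLIPEmotions(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, num_classes=8, lora_r=8, lora_alpha=16):
        super().__init__()
        # CLIP ViT-L/14 vision encoder (base weights stay frozen; only LoRA + head train)
        backbone = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
        hidden = backbone.config.hidden_size  # 1024 for ViT-L/14
        # Wrap the backbone with LoRA adapters (target modules are an assumption)
        lora_cfg = LoraConfig(r=lora_r, lora_alpha=lora_alpha,
                              target_modules=["q_proj", "v_proj"], lora_dropout=0.1)
        self.backbone = get_peft_model(backbone, lora_cfg)
        # Small MLP classification head over the pooled image embedding
        self.head = nn.Sequential(nn.Linear(hidden, 512), nn.ReLU(),
                                  nn.Linear(512, num_classes))

    def forward(self, pixel_values):
        pooled = self.backbone(pixel_values=pixel_values).pooler_output
        return self.head(pooled)
```
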
## Training Details

- *Dataset*: [EmoSet](https://vcc.tech/EmoSet)
- *Architecture*: CLIP Vision Encoder (ViT-L/14) with *LoRA adaptation*
- *Loss Function*: Cross-Entropy Loss
- *Optimizer*: AdamW
- *Learning Rate*: 0.0001
- *Batch Size*: 32
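
A minimal training-loop sketch using these settings follows; `train_dataset` (assumed to yield preprocessed image tensors and emotion label indices from EmoSet) and the `PerceptCLIPEmotions` class from the sketch above are hypothetical placeholders, not the official training code.

```python
import torch
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = PerceptCLIPEmotions(num_classes=8).to(device)   # sketch class from above
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loader = DataLoader(train_dataset, batch_size=32, shuffle=True)  # train_dataset: assumed EmoSet split

model.train()
for images, labels in loader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```
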
## Requirements

- python=3.9.15
- cudatoolkit=11.7
- torchvision=0.14.0
- transformers=4.45.2
- peft=0.14.0
## Usage

To use the model for inference:

```python
from torchvision import transforms
import torch
from PIL import Image
from huggingface_hub import hf_hub_download

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Download and load the model (map_location keeps this working on CPU-only machines)
model_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_Emotions", filename="perceptCLIP_Emotions.pth")
model = torch.load(model_path, map_location=device).to(device).eval()

# Emotion label mapping
idx2label = {
    0: "amusement",
    1: "awe",
    2: "contentment",
    3: "excitement",
    4: "anger",
    5: "disgust",
    6: "fear",
    7: "sadness"
}

# Preprocessing (resize + center crop to 224, CLIP normalization statistics)
def emo_preprocess():
    transform = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(size=(224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                             std=(0.26862954, 0.26130258, 0.27577711)),
    ])
    return transform

# Load an image
image = Image.open("image_path.jpg").convert("RGB")
image = emo_preprocess()(image).unsqueeze(0).to(device)

# Run inference
with torch.no_grad():
    outputs = model(image)
    _, predicted = outputs.max(1)  # index of the highest-scoring emotion class

# Get the emotion label
predicted_emotion = idx2label[predicted.item()]
print(f"Predicted Emotion: {predicted_emotion}")
```
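
If per-class probabilities are useful (e.g., to inspect prediction confidence), the logits from the snippet above can be passed through a softmax:

```python
import torch.nn.functional as F

# Continues the inference snippet above: per-emotion probabilities from the logits
with torch.no_grad():
    probs = F.softmax(model(image), dim=1).squeeze(0)

for idx, p in sorted(enumerate(probs.tolist()), key=lambda t: -t[1]):
    print(f"{idx2label[idx]}: {p:.3f}")
```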