jayson1408 committed
Commit 8570dfe · verified · 1 Parent(s): c2e4d4c

Upload 16 files

vit-base-nsfw-detector/.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
vit-base-nsfw-detector/README.md ADDED
@@ -0,0 +1,166 @@
+ ---
+ metrics:
+ - accuracy
+ pipeline_tag: image-classification
+ base_model: google/vit-base-patch16-384
+ model-index:
+ - name: AdamCodd/vit-base-nsfw-detector
+   results:
+   - task:
+       type: image-classification
+       name: Image Classification
+     metrics:
+     - type: accuracy
+       value: 0.9654
+       name: Accuracy
+     - type: AUC
+       value: 0.9948
+     - type: loss
+       value: 0.0937
+       name: Loss
+ license: apache-2.0
+ tags:
+ - transformers.js
+ - transformers
+ - nlp
+ ---
+
+ # vit-base-nsfw-detector
+
+ This model is a fine-tuned version of [vit-base-patch16-384](https://huggingface.co/google/vit-base-patch16-384) on around 25,000 images (drawings, photos, etc.).
+ It achieves the following results on the evaluation set:
+ - Loss: 0.0937
+ - Accuracy: 0.9654
+
+ **<u>New [07/30]</u>**: I created a new ViT model specifically to detect NSFW/SFW images for Stable Diffusion usage (read the disclaimer below for the reason): [**AdamCodd/vit-nsfw-stable-diffusion**](https://huggingface.co/AdamCodd/vit-nsfw-stable-diffusion).
+
+ **Disclaimer**: This model wasn't made with generative images in mind! There are no generated images in the dataset used here, and the model performs significantly worse on them; handling generative images properly requires a separate ViT model trained specifically on that domain. To give you an idea, here are the model's actual scores on generative images:
+ - Loss: 0.3682 (↑ 292.95%)
+ - Accuracy: 0.8600 (↓ 10.91%)
+ - F1: 0.8654
+ - AUC: 0.9376 (↓ 5.75%)
+ - Precision: 0.8350
+ - Recall: 0.8980
+
+ ## Model description
+
+ The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. Next, the model was fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, at a higher resolution of 384x384.
+
+ ## Intended uses & limitations
+
+ There are two classes: SFW and NSFW. The model was trained to be restrictive and therefore classifies "sexy" images as NSFW: if an image shows cleavage or too much skin, it will be classified as NSFW. This is intended behavior.
+
+ Usage for a local image:
+ ```python
+ from transformers import pipeline
+ from PIL import Image
+
+ # Open the image and run it through the classification pipeline
+ img = Image.open("<path_to_image_file>")
+ predict = pipeline("image-classification", model="AdamCodd/vit-base-nsfw-detector")
+ predict(img)
+ ```
+
+ Usage for a distant image:
+ ```python
+ from transformers import ViTImageProcessor, AutoModelForImageClassification
+ from PIL import Image
+ import requests
+
+ url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
+ image = Image.open(requests.get(url, stream=True).raw)
+ processor = ViTImageProcessor.from_pretrained('AdamCodd/vit-base-nsfw-detector')
+ model = AutoModelForImageClassification.from_pretrained('AdamCodd/vit-base-nsfw-detector')
+ inputs = processor(images=image, return_tensors="pt")
+ outputs = model(**inputs)
+ logits = outputs.logits
+
+ predicted_class_idx = logits.argmax(-1).item()
+ print("Predicted class:", model.config.id2label[predicted_class_idx])
+ # Predicted class: sfw
+ ```
+
+ Usage with Transformers.js (Vanilla JS):
+ ```js
+ /* Instructions:
+  * - Place this script in an HTML file using the <script type="module"> tag.
+  * - Ensure the HTML file is served over a local or remote server (e.g., using Python's http.server, a Node.js server, or similar).
+  * - Replace 'https://example.com/path/to/image.jpg' in the classifyImage function call with the URL of the image you want to classify.
+  *
+  * Example of how to include this script in HTML:
+  * <script type="module" src="path/to/this_script.js"></script>
+  *
+  * This setup ensures that the script can use imports and perform network requests without CORS issues.
+  */
+ import { pipeline, env } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.17.1';
+
+ // Since we download the model from the Hugging Face Hub, we can skip the local model check
+ env.allowLocalModels = false;
+
+ // Load the image classification model
+ const classifier = await pipeline('image-classification', 'AdamCodd/vit-base-nsfw-detector');
+
+ // Function to fetch and classify an image from a URL
+ async function classifyImage(url) {
+   try {
+     const response = await fetch(url);
+     if (!response.ok) throw new Error('Failed to load image');
+
+     const blob = await response.blob();
+     const image = new Image();
+     const imagePromise = new Promise((resolve, reject) => {
+       image.onload = () => resolve(image);
+       image.onerror = reject;
+       image.src = URL.createObjectURL(blob);
+     });
+
+     const img = await imagePromise; // Ensure the image is loaded
+     const classificationResults = await classifier([img.src]); // Classify the image
+     console.log('Predicted class: ', classificationResults[0].label);
+   } catch (error) {
+     console.error('Error classifying image:', error);
+   }
+ }
+
+ // Example usage
+ classifyImage('https://example.com/path/to/image.jpg');
+ // Predicted class: sfw
+ ```
+
+
+ The model has been trained on a variety of images (realistic, 3D, drawings), yet it is not perfect and some images may be wrongly classified as NSFW when they are not. Additionally, note that using the quantized ONNX model within the transformers.js pipeline slightly reduces the model's accuracy.
+ You can find a toy implementation of this model with Transformers.js [here](https://github.com/AdamCodd/media-random-generator).
+
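+ The `onnx/` folder in this repository also ships fp16, int8, q4, and uint8 exports of the model. As a minimal sketch (not from the model card), assuming `optimum[onnxruntime]` is installed, a specific ONNX variant could be loaded from Python like this, with `file_name` selecting the export:
+ ```python
+ from optimum.onnxruntime import ORTModelForImageClassification
+ from transformers import AutoImageProcessor, pipeline
+
+ # Load the quantized export; swap file_name (e.g. "model_fp16.onnx")
+ # to trade accuracy against file size and speed.
+ model = ORTModelForImageClassification.from_pretrained(
+     "AdamCodd/vit-base-nsfw-detector",
+     subfolder="onnx",
+     file_name="model_quantized.onnx",
+ )
+ processor = AutoImageProcessor.from_pretrained("AdamCodd/vit-base-nsfw-detector")
+ onnx_classify = pipeline("image-classification", model=model, image_processor=processor)
+ print(onnx_classify("<path_to_image_file>"))
+ ```
+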
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (a rough sketch of these settings as `TrainingArguments` follows the list):
+ - learning_rate: 3e-05
+ - train_batch_size: 32
+ - eval_batch_size: 32
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - num_epochs: 1
+
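+ The actual training script isn't published; as an illustrative sketch only, the hyperparameters above map onto `transformers` `TrainingArguments` roughly like this:
+ ```python
+ from transformers import TrainingArguments
+
+ # Hypothetical mapping of the listed hyperparameters; the real
+ # training code is not included in this repository.
+ training_args = TrainingArguments(
+     output_dir="vit-base-nsfw-detector",
+     learning_rate=3e-5,
+     per_device_train_batch_size=32,
+     per_device_eval_batch_size=32,
+     seed=42,
+     num_train_epochs=1,
+     adam_beta1=0.9,
+     adam_beta2=0.999,
+     adam_epsilon=1e-8,
+ )
+ ```
+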
+ ### Training results
+
+ - Validation Loss: 0.0937
+ - Accuracy: 0.9654
+ - AUC: 0.9948
+
+ [Confusion matrix](https://huggingface.co/AdamCodd/vit-base-nsfw-detector/resolve/main/confusion_matrix.png) (eval):
+
+ [1076   37]
+ [  60 1627]
+
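+ The reported accuracy can be sanity-checked directly from these counts. A quick sketch, assuming rows are the true classes [sfw, nsfw] and columns the predicted classes (see the linked image for the exact layout; the accuracy figure is invariant to it):
+ ```python
+ # Counts from the confusion matrix above.
+ tn, fp = 1076, 37    # true sfw:  predicted sfw / predicted nsfw
+ fn, tp = 60, 1627    # true nsfw: predicted sfw / predicted nsfw
+
+ accuracy = (tn + tp) / (tn + fp + fn + tp)  # 2703 / 2800
+ print(f"{accuracy:.4f}")                    # 0.9654, matching the reported value
+ ```
+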
+ ### Framework versions
+
+ - Transformers 4.36.2
+ - Evaluate 0.4.1
+
+ If you want to support me, you can do so [here](https://ko-fi.com/adamcodd).
vit-base-nsfw-detector/config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "AdamCodd/vit-nsfw-detection",
+   "architectures": [
+     "ViTForImageClassification"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "encoder_stride": 16,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.0,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "sfw",
+     "1": "nsfw"
+   },
+   "image_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "sfw": "0",
+     "nsfw": "1"
+   },
+   "layer_norm_eps": 1e-12,
+   "model_type": "vit",
+   "num_attention_heads": 12,
+   "num_channels": 3,
+   "num_hidden_layers": 12,
+   "patch_size": 16,
+   "problem_type": "single_label_classification",
+   "qkv_bias": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.36.2"
+ }
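The `id2label` map above is what turns a predicted index into the `sfw`/`nsfw` string. As a small illustrative sketch (mirroring what the pipeline does internally, not part of the uploaded files), class probabilities can be derived from the model's logits like this:

```python
import torch
from transformers import AutoConfig

config = AutoConfig.from_pretrained("AdamCodd/vit-base-nsfw-detector")
logits = torch.tensor([[2.3, -1.7]])   # example logits from the model
probs = logits.softmax(dim=-1)         # convert logits to probabilities
idx = probs.argmax(dim=-1).item()
print(config.id2label[idx], round(probs[0, idx].item(), 4))  # e.g. sfw 0.982
```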
vit-base-nsfw-detector/confusion_matrix.png ADDED
vit-base-nsfw-detector/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:266efb8bf67c1e865a577222fbbd6ddb149b9e00ba0d2b50466a034837f026a4
+ size 344391328
vit-base-nsfw-detector/onnx/config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "_name_or_path": "AdamCodd/vit-base-nsfw-detector",
+   "architectures": [
+     "ViTForImageClassification"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "encoder_stride": 16,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.0,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "sfw",
+     "1": "nsfw"
+   },
+   "image_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "nsfw": "1",
+     "sfw": "0"
+   },
+   "layer_norm_eps": 1e-12,
+   "model_type": "vit",
+   "num_attention_heads": 12,
+   "num_channels": 3,
+   "num_hidden_layers": 12,
+   "patch_size": 16,
+   "problem_type": "single_label_classification",
+   "qkv_bias": true,
+   "transformers_version": "4.34.0"
+ }
vit-base-nsfw-detector/onnx/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dce8f5af8509fee39c453b78a66076ead5c97321ddcee0ddfa16f67dc8286384
+ size 344569044
vit-base-nsfw-detector/onnx/model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5a5e12ed3400c597d96a09d4a28db30e77a2f2f991214d3b3fbce82403a2b204
+ size 52617366
vit-base-nsfw-detector/onnx/model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e9693884a06af5d9bef115f730e128deeb39c1850a3c43c21f3f49103d32a77f
+ size 172385122
vit-base-nsfw-detector/onnx/model_int8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d25aa73fe1eec78459e35ff911e2af98f652ee919b48d9c54316c86d5ff435fa
+ size 88500985
vit-base-nsfw-detector/onnx/model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a76b59a27ffa5426682841f916e27f801690627f9193bab1d35bd54b6e32cd61
+ size 57925254
vit-base-nsfw-detector/onnx/model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8402aaf2e181980706e1e264727ca9c3b65cc96bd715d6969e75da4010f8b734
+ size 50302325
vit-base-nsfw-detector/onnx/model_quantized.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:432763a6899ebc418c55f784b98f90565e5fc694c778d2ffcb0294b12f6a7404
+ size 88500985
vit-base-nsfw-detector/onnx/model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:432763a6899ebc418c55f784b98f90565e5fc694c778d2ffcb0294b12f6a7404
+ size 88500985
vit-base-nsfw-detector/onnx/preprocessor_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.5,
+     0.5,
+     0.5
+   ],
+   "image_processor_type": "ViTFeatureExtractor",
+   "image_std": [
+     0.5,
+     0.5,
+     0.5
+   ],
+   "resample": 2,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "height": 384,
+     "width": 384
+   }
+ }
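In plain terms, this config resizes images to 384×384 with bilinear resampling (`"resample": 2` is PIL's `Image.BILINEAR`), rescales pixel values by 1/255 (the `rescale_factor`), then normalizes each channel with mean 0.5 and std 0.5. A minimal sketch of the equivalent manual transform, assuming PIL and NumPy (for illustration only; the processor classes do this for you):

```python
import numpy as np
from PIL import Image

img = Image.open("<path_to_image_file>").convert("RGB")
img = img.resize((384, 384), resample=Image.BILINEAR)    # do_resize, resample=2

x = np.asarray(img).astype(np.float32) * (1.0 / 255.0)   # do_rescale, rescale_factor
x = (x - 0.5) / 0.5                                      # do_normalize, mean/std = 0.5
x = x.transpose(2, 0, 1)[None]                           # HWC -> NCHW: (1, 3, 384, 384)
```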
vit-base-nsfw-detector/preprocessor_config.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "do_normalize": true,
+   "do_resize": true,
+   "image_mean": [0.5, 0.5, 0.5],
+   "image_processor_type": "ViTImageProcessor",
+   "image_std": [0.5, 0.5, 0.5],
+   "size": {
+     "height": 384,
+     "width": 384
+   }
+ }