---
license: cc-by-nc-4.0
datasets:
- ai4privacy/pii-masking-400k
language:
- en
- de
- fr
- it
- es
- nl
base_model:
- iiiorg/piiranha-v1-detect-personal-information
tags:
- NeuralWave
- Hackathon
---

## Overview

This model improves the precision and accuracy of personal information detection by using a reduced label set compared to its base model. Consolidating the labels yields more reliable labeling of personal information across multiple languages.

---

## Features

- **Improved Precision**: Reducing the label set size relative to the base model improves the precision of the labeling procedure, giving more reliable identification of sensitive information (see the sketch after this list).
- **Model Versions**:
  - **Maximum Accuracy Focus**: Tuned to achieve the highest possible detection accuracy, suitable for applications where minimizing errors overall is crucial.
  - **Maximum Precision Focus**: Tuned to maximize detection precision, ideal for scenarios where false positives are particularly undesirable.
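
The label consolidation works conceptually as sketched below. This is a minimal illustration only: the label names and groupings are hypothetical, not the exact scheme used by this model (see the shipped `label_mapper.json` for the real mapping).

```python
# Hypothetical mapping from fine-grained base-model labels to a reduced label set.
# The names below are illustrative only.
LABEL_CONSOLIDATION = {
    "I-GIVENNAME": "I-NAME",
    "I-SURNAME": "I-NAME",
    "I-CITY": "I-LOCATION",
    "I-STREET": "I-LOCATION",
    "I-EMAIL": "I-EMAIL",  # some labels are kept as-is
    "O": "O",              # non-PII tokens are unchanged
}

def reduce_labels(fine_grained_labels):
    """Map each fine-grained label to its consolidated counterpart."""
    return [LABEL_CONSOLIDATION.get(label, "O") for label in fine_grained_labels]

print(reduce_labels(["O", "I-GIVENNAME", "I-SURNAME", "O", "I-CITY"]))
# -> ['O', 'I-NAME', 'I-NAME', 'O', 'I-LOCATION']
```
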

---

## Installation

To run this model, you will need to install the dependencies:

```bash
pip install torch transformers safetensors
```

---

## Usage

Load and run the model using PyTorch and transformers:

```python
import json

from transformers import AutoModelForTokenClassification, AutoConfig, BertTokenizerFast
from safetensors.torch import load_file


# Minimal container for the label <-> id mappings stored in label_mapper.json
class LabelMapper:
    def __init__(self):
        self.label_to_id = {}
        self.id_to_label = {}
        self.num_labels = 0


# Load the config (replace "folder_to_model" with the path to the model folder)
config = AutoConfig.from_pretrained("folder_to_model")

# Initialize the model with the config
model = AutoModelForTokenClassification.from_config(config)

# Load the safetensors weights (path to the .safetensors weights file)
state_dict = load_file("folder_to_tensors")

# Load the state dict into the model
model.load_state_dict(state_dict)

# Load the tokenizer
tokenizer = BertTokenizerFast.from_pretrained("google-bert/bert-base-multilingual-cased")

# Load the label mapper if needed
with open("pii_model/label_mapper.json", 'r') as f:
    label_mapper_data = json.load(f)

label_mapper = LabelMapper()
label_mapper.label_to_id = label_mapper_data['label_to_id']
label_mapper.id_to_label = {int(k): v for k, v in label_mapper_data['id_to_label'].items()}
label_mapper.num_labels = label_mapper_data['num_labels']

# Process outputs for analysis...
```
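
The snippet below shows one way to run inference and map predicted ids back to label names. It is a minimal sketch that assumes the `model`, `tokenizer`, and `label_mapper` objects created above; the example sentence is arbitrary.

```python
import torch

text = "My name is John Doe and my email is john.doe@example.com"

# Tokenize the input and run a forward pass without gradients
inputs = tokenizer(text, return_tensors="pt", truncation=True)
model.eval()
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the most likely label id for each token and map it back to a label name
predicted_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, predicted_ids):
    print(token, label_mapper.id_to_label[label_id])
```
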

---

## Evaluation

- **Accuracy Model**: Evaluated with a focus on minimizing overall errors, aiming for the highest accuracy metrics.
- **Precision Model**: Evaluated with a focus on minimizing false positives, optimizing for precision-driven applications (the distinction is illustrated in the sketch below).
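
The sketch below illustrates the accuracy/precision distinction on a toy token sequence. The labels and values are made up for illustration; they are not evaluation results for this model.

```python
# Hypothetical gold and predicted labels for a short token sequence ("O" = not PII).
gold = ["O", "B-NAME", "I-NAME", "O", "B-EMAIL", "O"]
pred = ["O", "B-NAME", "I-NAME", "O", "O",       "O"]

# Accuracy: fraction of all tokens labeled correctly (the accuracy-focused variant optimizes this).
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)

# Precision: of the tokens predicted as PII (non-"O"), the fraction labeled correctly
# (the precision-focused variant optimizes this, limiting false positives).
predicted_pii = [(g, p) for g, p in zip(gold, pred) if p != "O"]
precision = sum(g == p for g, p in predicted_pii) / len(predicted_pii)

print(f"accuracy:  {accuracy:.2f}")   # 0.83 -- the missed entity lowers accuracy
print(f"precision: {precision:.2f}")  # 1.00 -- but no false positives, so precision stays high
```
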

---

## Disclaimer

The publisher of this repository is not affiliated with Ai4Privacy or Ai Suisse SA.

## Honorary Mention

This repository was created during the hackathon organized by [NeuralWave](https://neuralwave.ch/#/).