---
license: mit
datasets:
- ai4privacy/open-pii-masking-500k-ai4privacy
language:
- fr
- en
- de
- te
- hi
- it
- es
- nl
base_model:
- answerdotai/ModernBERT-base
library_name: transformers
tags:
- PII
---
## Evaluation Metrics
The table below summarizes the detailed evaluation results per PII label:
| **Label** | **TP** | **FP** | **FN** | **Accuracy** | **Precision** | **Recall** | **F1 Score** |
|--------------------|:------:|:------:|:------:|:------------:|:-------------:|:----------:|:------------:|
| SURNAME | 3722 | 0 | 28 | 99.25% | 100.0% | 99.25% | 99.63% |
| O (Non-PII) | 0 | 400 | 0 | 99.30% | n/a | n/a | n/a |
| TIME | 1936 | 0 | 0 | 100.0% | 100.0% | 100.0% | 100.0% |
| DRIVERLICENSENUM | 505 | 0 | 2 | 99.61% | 100.0% | 99.61% | 99.80% |
| PASSPORTNUM | 564 | 0 | 2 | 99.65% | 100.0% | 99.65% | 99.82% |
| GIVENNAME | 7548 | 0 | 172 | 97.77% | 100.0% | 97.77% | 98.87% |
| TELEPHONENUM | 3641 | 0 | 0 | 100.0% | 100.0% | 100.0% | 100.0% |
| BUILDINGNUM | 407 | 0 | 19 | 95.54% | 100.0% | 95.54% | 97.72% |
| AGE | 168 | 0 | 1 | 99.41% | 100.0% | 99.41% | 99.70% |
| DATE | 2335 | 0 | 0 | 100.0% | 100.0% | 100.0% | 100.0% |
| CITY | 1672 | 0 | 130 | 92.79% | 100.0% | 92.79% | 96.26% |
| TITLE | 349 | 0 | 35 | 90.89% | 100.0% | 90.89% | 95.23% |
| IDCARDNUM | 1998 | 0 | 22 | 98.91% | 100.0% | 98.91% | 99.45% |
| GENDER | 121 | 0 | 0 | 100.0% | 100.0% | 100.0% | 100.0% |
| CREDITCARDNUMBER | 557 | 0 | 1 | 99.82% | 100.0% | 99.82% | 99.91% |
| SEX | 78 | 0 | 1 | 98.73% | 100.0% | 98.73% | 99.36% |
| STREET | 1368 | 0 | 19 | 98.63% | 100.0% | 98.63% | 99.31% |
| TAXNUM | 345 | 0 | 12 | 96.64% | 100.0% | 96.64% | 98.29% |
| EMAIL | 2606 | 0 | 2 | 99.92% | 100.0% | 99.92% | 99.96% |
| SOCIALNUM | 411 | 0 | 11 | 97.39% | 100.0% | 97.39% | 98.68% |
| ZIPCODE | 406 | 0 | 20 | 95.31% | 100.0% | 95.31% | 97.60% |
### Overall Evaluation
- **Accuracy:** 99.01%
- **Precision:** 98.72%
- **Recall:** 98.47%
- **F1 Score:** 98.59%
- **Total True Positives (TP):** 30,737
- **Total False Positives (FP):** 400
- **Total False Negatives (FN):** 477
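
As a quick sanity check, the overall precision, recall, and F1 score can be recomputed from the three totals listed above. This is a minimal sketch using the standard micro-averaged definitions; the function name `micro_metrics` is hypothetical and not part of any released code. Accuracy is left out because it also depends on true negatives, which are not reported here.

```python
def micro_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Micro-averaged precision, recall and F1 from pooled counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Totals copied from the "Overall Evaluation" list.
totals = micro_metrics(tp=30_737, fp=400, fn=477)

for name, value in totals.items():
    print(f"{name}: {value:.2%}")
```

With these totals the sketch reproduces the reported 98.72% precision, 98.47% recall, and 98.59% F1 score.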
### Macro-Averaged Metrics
- **Accuracy:** 98.35%
- **Precision:** 95.24%
- **Recall:** 93.35%
- **F1 Score:** 94.29%
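
The macro-averaged precision and recall can be approximated from the per-label table by averaging the per-label values with equal weight. The sketch below assumes that the `O (Non-PII)` row, reported as n/a, contributes zero to both averages; that assumption is an inference from the numbers, not something stated in this card. The names `COUNTS` and `safe_div` are hypothetical.

```python
# (TP, FP, FN) per label, copied from the evaluation table.
COUNTS = {
    "SURNAME": (3722, 0, 28), "O (Non-PII)": (0, 400, 0),
    "TIME": (1936, 0, 0), "DRIVERLICENSENUM": (505, 0, 2),
    "PASSPORTNUM": (564, 0, 2), "GIVENNAME": (7548, 0, 172),
    "TELEPHONENUM": (3641, 0, 0), "BUILDINGNUM": (407, 0, 19),
    "AGE": (168, 0, 1), "DATE": (2335, 0, 0),
    "CITY": (1672, 0, 130), "TITLE": (349, 0, 35),
    "IDCARDNUM": (1998, 0, 22), "GENDER": (121, 0, 0),
    "CREDITCARDNUMBER": (557, 0, 1), "SEX": (78, 0, 1),
    "STREET": (1368, 0, 19), "TAXNUM": (345, 0, 12),
    "EMAIL": (2606, 0, 2), "SOCIALNUM": (411, 0, 11),
    "ZIPCODE": (406, 0, 20),
}


def safe_div(numerator: int, denominator: int) -> float:
    """Return 0.0 for an undefined ratio (assumed handling of n/a rows)."""
    return numerator / denominator if denominator else 0.0


precisions = [safe_div(tp, tp + fp) for tp, fp, _ in COUNTS.values()]
recalls = [safe_div(tp, tp + fn) for tp, _, fn in COUNTS.values()]

macro_precision = sum(precisions) / len(precisions)
macro_recall = sum(recalls) / len(recalls)

print(f"macro precision: {macro_precision:.2%}")
print(f"macro recall:    {macro_recall:.2%}")
```

Under that assumption the averages come out at 95.24% and 93.35%, matching the figures listed above. The gap between the macro and overall values is therefore driven largely by the single `O (Non-PII)` row rather than by the PII labels themselves.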
---
## Model Behavior & Limitations
- **Evaluation Focus:**
The metrics shown above reflect performance on the test split of the [open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy) dataset. Performance on real-world data may differ and should be validated separately before relying on the model. Contact support@ai4privacy.com for assistance.
---
## Disclaimer
This model card details the evaluation metrics and fine-tuning parameters for the multilingual anonymiser. **Please note:**
- The model is provided **as-is** under the MIT License.
- It is intended solely for redaction purposes and does not perform full PII classification.
- Users should carefully test and evaluate its performance on their own data before deploying in production environments.
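
To illustrate the redaction use case, the sketch below masks detected spans in a string. It assumes span dictionaries with `entity_group`, `start`, and `end` keys, which is the shape returned by a `transformers` token-classification pipeline with `aggregation_strategy="simple"`; the spans here are written by hand rather than produced by the model, and the `redact` helper is hypothetical.

```python
def redact(text: str, spans: list[dict]) -> str:
    """Replace each detected span with a [LABEL] placeholder."""
    redacted = text
    # Work from right to left so earlier character offsets stay valid.
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        placeholder = f"[{span['entity_group']}]"
        redacted = redacted[: span["start"]] + placeholder + redacted[span["end"]:]
    return redacted


# Hand-written example spans (assumed format, not real model output).
example_text = "Contact Maria Rossi at maria@example.com."
example_spans = [
    {"entity_group": "GIVENNAME", "start": 8, "end": 13},
    {"entity_group": "SURNAME", "start": 14, "end": 19},
    {"entity_group": "EMAIL", "start": 23, "end": 40},
]

print(redact(example_text, example_spans))
```

Running the same helper over a labeled sample of your own data and counting missed spans is one simple way to carry out the testing recommended above.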
---
*Ai4Privacy – Committed to protecting personal data in the age of AI.*
---