---
license: mit
datasets:
- ai4privacy/open-pii-masking-500k-ai4privacy
language:
- fr
- en
- de
- te
- hi
- it
- es
- nl
base_model:
- answerdotai/ModernBERT-base
library_name: transformers
tags:
- PII
---

## Evaluation Metrics

The table below summarizes the detailed evaluation results per PII label:

| **Label**          | **TP** | **FP** | **FN** | **Accuracy** | **Precision** | **Recall** | **F1 Score** |
|--------------------|:------:|:------:|:------:|:------------:|:-------------:|:----------:|:------------:|
| SURNAME            | 3722   | 0      | 28     | 99.25%       | 100.0%        | 99.25%     | 99.63%       |
| O (Non-PII)        | 0      | 400    | 0      | 99.30%       | n/a           | n/a        | n/a          |
| TIME               | 1936   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
| DRIVERLICENSENUM   | 505    | 0      | 2      | 99.61%       | 100.0%        | 99.61%     | 99.80%       |
| PASSPORTNUM        | 564    | 0      | 2      | 99.65%       | 100.0%        | 99.65%     | 99.82%       |
| GIVENNAME          | 7548   | 0      | 172    | 97.77%       | 100.0%        | 97.77%     | 98.87%       |
| TELEPHONENUM       | 3641   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
| BUILDINGNUM        | 407    | 0      | 19     | 95.54%       | 100.0%        | 95.54%     | 97.72%       |
| AGE                | 168    | 0      | 1      | 99.41%       | 100.0%        | 99.41%     | 99.70%       |
| DATE               | 2335   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
| CITY               | 1672   | 0      | 130    | 92.79%       | 100.0%        | 92.79%     | 96.26%       |
| TITLE              | 349    | 0      | 35     | 90.89%       | 100.0%        | 90.89%     | 95.23%       |
| IDCARDNUM          | 1998   | 0      | 22     | 98.91%       | 100.0%        | 98.91%     | 99.45%       |
| GENDER             | 121    | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
| CREDITCARDNUMBER   | 557    | 0      | 1      | 99.82%       | 100.0%        | 99.82%     | 99.91%       |
| SEX                | 78     | 0      | 1      | 98.73%       | 100.0%        | 98.73%     | 99.36%       |
| STREET             | 1368   | 0      | 19     | 98.63%       | 100.0%        | 98.63%     | 99.31%       |
| TAXNUM             | 345    | 0      | 12     | 96.64%       | 100.0%        | 96.64%     | 98.29%       |
| EMAIL              | 2606   | 0      | 2      | 99.92%       | 100.0%        | 99.92%     | 99.96%       |
| SOCIALNUM          | 411    | 0      | 11     | 97.39%       | 100.0%        | 97.39%     | 98.68%       |
| ZIPCODE            | 406    | 0      | 20     | 95.31%       | 100.0%        | 95.31%     | 97.60%       |
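
The derived columns can be recomputed from the raw counts. The following is a minimal sketch that assumes the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 as their harmonic mean). The helper name `prf` is illustrative and not part of any released code. For the PII labels, the Accuracy column coincides numerically with Recall. That is an observation from the table, not a documented definition.

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, f1) as fractions; 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Counts copied from the SURNAME row of the table above.
p, r, f1 = prf(tp=3722, fp=0, fn=28)
print(f"{p:.2%} {r:.2%} {f1:.2%}")  # 100.00% 99.25% 99.63%
```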

### Overall Evaluation
- **Accuracy:** 99.01%  
- **Precision:** 98.72%  
- **Recall:** 98.47%  
- **F1 Score:** 98.59%

- **Total True Positives (TP):** 30,737  
- **Total False Positives (FP):** 400  
- **Total False Negatives (FN):** 477  
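
The overall precision, recall, and F1 are consistent with micro-averaging, that is, computing each metric from the pooled totals rather than averaging per-label scores. A quick check under the standard definitions is sketched below. The overall accuracy is not reproduced because it would need true-negative counts, which this card does not list.

```python
# Pooled totals from the list above.
tp, fp, fn = 30_737, 400, 477

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"{precision:.2%} {recall:.2%} {f1:.2%}")  # 98.72% 98.47% 98.59%
```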

### Macro-Averaged Metrics
- **Accuracy:** 98.35%  
- **Precision:** 95.24%  
- **Recall:** 93.35%  
- **F1 Score:** 94.29%
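
Macro-averaging takes the unweighted mean of a per-label metric, so rare labels weigh as much as frequent ones. The sketch below recomputes macro precision and recall from the per-label counts. It assumes that an undefined ratio (zero denominator) contributes 0.0, which is an inference from the reported numbers rather than something the card states. Under that assumption the `O (Non-PII)` row pulls both averages down, which would explain why they sit below the per-label values.

```python
# (TP, FP, FN) per label, copied from the per-label table.
counts = {
    "SURNAME": (3722, 0, 28), "O": (0, 400, 0), "TIME": (1936, 0, 0),
    "DRIVERLICENSENUM": (505, 0, 2), "PASSPORTNUM": (564, 0, 2),
    "GIVENNAME": (7548, 0, 172), "TELEPHONENUM": (3641, 0, 0),
    "BUILDINGNUM": (407, 0, 19), "AGE": (168, 0, 1), "DATE": (2335, 0, 0),
    "CITY": (1672, 0, 130), "TITLE": (349, 0, 35),
    "IDCARDNUM": (1998, 0, 22), "GENDER": (121, 0, 0),
    "CREDITCARDNUMBER": (557, 0, 1), "SEX": (78, 0, 1),
    "STREET": (1368, 0, 19), "TAXNUM": (345, 0, 12),
    "EMAIL": (2606, 0, 2), "SOCIALNUM": (411, 0, 11),
    "ZIPCODE": (406, 0, 20),
}


def safe_div(num: int, den: int) -> float:
    # Assumption: an undefined ratio counts as 0.0 in the average.
    return num / den if den else 0.0


precisions = [safe_div(tp, tp + fp) for tp, fp, fn in counts.values()]
recalls = [safe_div(tp, tp + fn) for tp, fp, fn in counts.values()]

macro_precision = sum(precisions) / len(precisions)
macro_recall = sum(recalls) / len(recalls)

print(f"{macro_precision:.2%} {macro_recall:.2%}")  # 95.24% 93.35%
```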

---

## Model Behavior & Limitations

- **Evaluation Focus:**  
  The metrics above reflect performance on the test split of the [open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy) dataset. Real-world performance may differ, and production deployments may need additional safeguards. For assistance, contact support@ai4privacy.com.

---

## Disclaimer

This model card details the evaluation metrics for the multilingual anonymiser. **Please note:**  
- The model is provided **as-is** under the MIT License.  
- It is intended solely for redaction purposes and does not perform full PII classification.  
- Users should carefully test and evaluate its performance on their own data before deploying in production environments.
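
One way to run such a check on your own labeled data is to score predictions at the redaction level, where any label other than `O` counts as PII and the exact PII class is ignored. The sketch below assumes token-aligned lists of gold and predicted labels. The helper name `redaction_counts` and the sample labels are illustrative, and this convention is not necessarily the one used to produce the metrics in this card.

```python
def redaction_counts(gold: list[str], pred: list[str]) -> tuple[int, int, int]:
    """Count (TP, FP, FN) treating every non-'O' label as PII."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be token-aligned")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gold_is_pii, pred_is_pii = g != "O", p != "O"
        if gold_is_pii and pred_is_pii:
            tp += 1  # PII token was redacted, whatever class was predicted
        elif pred_is_pii:
            fp += 1  # non-PII token was redacted unnecessarily
        elif gold_is_pii:
            fn += 1  # PII token was missed and would leak
    return tp, fp, fn


gold = ["O", "GIVENNAME", "SURNAME", "O", "CITY"]
pred = ["O", "GIVENNAME", "GIVENNAME", "TITLE", "O"]
print(redaction_counts(gold, pred))  # (2, 1, 1)
```

For redaction, false negatives are usually the costlier error, since each one is a piece of PII left in the output.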

---

*Ai4Privacy – Committed to protecting personal data in the age of AI.*

---