# Tether Multilabel Abuse Detection (v4)
This model is part of the Tether project — an AI-driven tool designed to identify emotional abuse patterns in text communication, including gaslighting, control, insults, projection, and more. It is built for use in survivor-facing tools, clinician review workflows, and law enforcement risk triage pilots.
## 🧠 Model Overview
- Architecture: RoBERTa-base + multi-label classification head
- Trained on: ~2,000 labeled abuse/non-abuse message examples
- Labels (12 total):
  - `blame shifting`
  - `contradictory statements`
  - `control`
  - `dismissiveness`
  - `gaslighting`
  - `guilt tripping`
  - `insults`
  - `obscure language`
  - `projection`
  - `recovery phase`
  - `nonabusive`
  - `is_from_me` (optional metadata, not always used in deployment)
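At inference time the authoritative index-to-label mapping ships in the model's config (`model.config.id2label`); the hard-coded list below is only an illustrative assumption that the indices follow the order above.

```python
# Assumed label order -- verify against model.config.id2label before relying
# on these indices in production.
LABELS = [
    "blame shifting", "contradictory statements", "control",
    "dismissiveness", "gaslighting", "guilt tripping", "insults",
    "obscure language", "projection", "recovery phase",
    "nonabusive", "is_from_me",
]
id2label = {i: name for i, name in enumerate(LABELS)}
label2id = {name: i for i, name in id2label.items()}
```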
## 🧪 Performance (Eval Set)

| Label | F1 Score |
|---|---|
| blame shifting | 0.84 |
| contradictory statements | 0.46 |
| control | 0.75 |
| dismissiveness | 0.68 |
| gaslighting | 0.56 |
| guilt tripping | 0.62 |
| insults | 0.71 |
| obscure language | 0.66 |
| projection | 0.81 |
| recovery phase | 0.55 |
| nonabusive | 0.54 |
| is_from_me | 0.84 |

**Macro F1:** 0.67
**Samples-average F1:** 0.66
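Macro F1 is the unweighted mean of the twelve per-label scores, so the headline number can be sanity-checked directly from the table:

```python
# Per-label F1 scores copied from the table above
f1 = {
    "blame shifting": 0.84, "contradictory statements": 0.46,
    "control": 0.75, "dismissiveness": 0.68, "gaslighting": 0.56,
    "guilt tripping": 0.62, "insults": 0.71, "obscure language": 0.66,
    "projection": 0.81, "recovery phase": 0.55, "nonabusive": 0.54,
    "is_from_me": 0.84,
}
macro_f1 = sum(f1.values()) / len(f1)
print(round(macro_f1, 2))  # → 0.67
```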
## 🛡️ Intended Use
This model supports:
- Real-time abuse detection in chat/text-based systems
- Therapist/case worker reflection tools
- Risk triage for domestic violence investigations
- Educational applications for identifying coercive or emotionally abusive behavior
This model is not a substitute for legal judgment or clinical diagnosis.
## ⚠️ Known Limitations

- May underperform on:
  - Highly poetic or metaphorical language
  - Sarcasm and irony
  - Non-English text
- The `gaslighting` and `recovery phase` labels show moderate performance and may require human review.
- The `is_from_me` label is metadata used for internal modeling and may be excluded from production use.
## 🧩 Technical Details

- Fine-tuned with `BCEWithLogitsLoss` plus a per-label `pos_weight`
- Per-label decision thresholds selected via a macro-F1 sweep over 0.1–0.9
- Temperature scaling applied for probability calibration
- Inference latency: ~XX ms/message (on GPU)
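The per-label threshold sweep described above can be sketched as follows. This is a minimal illustration, not the project's actual tuning script; the grid step of 0.05 and the tie-breaking rule (keep the first best threshold) are assumptions.

```python
import numpy as np

def sweep_thresholds(probs, y_true, grid=np.arange(0.1, 0.91, 0.05)):
    """For each label column, pick the threshold in `grid` that maximizes
    that label's F1 on a held-out set (sketch of the 0.1-0.9 sweep)."""
    thresholds = []
    for j in range(probs.shape[1]):
        best_t, best_f1 = 0.5, -1.0
        for t in grid:
            pred = probs[:, j] >= t
            tp = np.sum(pred & (y_true[:, j] == 1))
            fp = np.sum(pred & (y_true[:, j] == 0))
            fn = np.sum(~pred & (y_true[:, j] == 1))
            denom = 2 * tp + fp + fn
            f1 = 2 * tp / denom if denom else 0.0
            if f1 > best_f1:
                best_t, best_f1 = t, f1
        thresholds.append(best_t)
    return np.array(thresholds)
```

In a multi-label setting a single global cutoff of 0.5 is rarely optimal, since label frequencies and calibration differ; tuning one threshold per label is what the macro-F1 sweep buys.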
## 💬 Example Inference

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("SamanthaStorm/tether-multilabel-v4")
tokenizer = AutoTokenizer.from_pretrained("SamanthaStorm/tether-multilabel-v4")

text = "You're making things up again — I never said that."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label head: apply a per-label sigmoid, not a softmax over labels
probs = torch.sigmoid(logits)
print(probs)
```
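To turn those sigmoid probabilities into label names, compare each one against its per-label threshold. The label order and the uniform 0.5 thresholds below are placeholders, not the tuned values from the sweep; check `model.config.id2label` for the real ordering.

```python
# Assumed label order and placeholder thresholds -- for illustration only.
LABELS = [
    "blame shifting", "contradictory statements", "control",
    "dismissiveness", "gaslighting", "guilt tripping", "insults",
    "obscure language", "projection", "recovery phase",
    "nonabusive", "is_from_me",
]
THRESHOLDS = [0.5] * len(LABELS)

def decode(probs, thresholds=THRESHOLDS, labels=LABELS):
    """Return the label names whose probability clears its threshold."""
    return [lab for lab, p, t in zip(labels, probs, thresholds) if p >= t]

# Usage with the snippet above: decode(probs[0].tolist())
```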