# GuardBench Leaderboard
A HuggingFace leaderboard for the GuardBench project that allows users to submit evaluation results and view the performance of different safety guardrail models.
## Features
- Display model performance across multiple safety categories
- Accept JSONL submissions with evaluation results
- Store submissions in a HuggingFace dataset
- Secure submission process with token authentication
- Automatic data refresh from HuggingFace
## Setup

- Clone this repository
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Create a `.env` file based on `.env.template`:

  ```bash
  cp .env.template .env
  ```

- Edit the `.env` file with your HuggingFace credentials and settings
- Run the application:

  ```bash
  python app.py
  ```
## Submission Format
Submissions should be in JSONL format, with each line containing a JSON object with the following structure:
```json
{
  "model_name": "model-name",
  "per_category_metrics": {
    "Category Name": {
      "default_prompts": {
        "f1_binary": 0.95,
        "recall_binary": 0.93,
        "precision_binary": 1.0,
        "error_ratio": 0.0,
        "avg_runtime_ms": 3000
      },
      "jailbreaked_prompts": { ... },
      "default_answers": { ... },
      "jailbreaked_answers": { ... }
    },
    ...
  },
  "avg_metrics": {
    "default_prompts": {
      "f1_binary": 0.97,
      "recall_binary": 0.95,
      "precision_binary": 1.0,
      "error_ratio": 0.0,
      "avg_runtime_ms": 3000
    },
    "jailbreaked_prompts": { ... },
    "default_answers": { ... },
    "jailbreaked_answers": { ... }
  }
}
```
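For reference, here is a minimal Python sketch that builds one such entry and appends it as a JSONL line. The model name and metric values are placeholders, and the same metrics dict is reused for every split purely for brevity; real submissions would report separate numbers per split:

```python
import json

# Placeholder metrics block matching the schema above.
metrics = {
    "f1_binary": 0.95,
    "recall_binary": 0.93,
    "precision_binary": 1.0,
    "error_ratio": 0.0,
    "avg_runtime_ms": 3000,
}

entry = {
    "model_name": "my-guard-model",  # placeholder name
    "per_category_metrics": {
        "Category Name": {
            "default_prompts": metrics,
            "jailbreaked_prompts": metrics,
            "default_answers": metrics,
            "jailbreaked_answers": metrics,
        },
    },
    "avg_metrics": {
        "default_prompts": metrics,
        "jailbreaked_prompts": metrics,
        "default_answers": metrics,
        "jailbreaked_answers": metrics,
    },
}

# JSONL: one compact JSON object per line, no pretty-printing.
with open("submission.jsonl", "a") as f:
    f.write(json.dumps(entry) + "\n")
```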
## Environment Variables

- `HF_TOKEN`: Your HuggingFace write token
- `OWNER`: Your HuggingFace username or organization
- `RESULTS_DATASET_ID`: The ID of the dataset to store results (e.g., `username/guardbench-results`)
- `SUBMITTER_TOKEN`: A secret token required for submissions
- `ADMIN_USERNAME`: Username for admin access to the leaderboard
- `ADMIN_PASSWORD`: Password for admin access to the leaderboard
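For example, a filled-in `.env` might look like the following (every value is a placeholder to replace with your own):

```
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
OWNER=your-username
RESULTS_DATASET_ID=your-username/guardbench-results
SUBMITTER_TOKEN=choose-a-secret-token
ADMIN_USERNAME=admin
ADMIN_PASSWORD=choose-a-strong-password
```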
## Deployment
This application can be deployed as a HuggingFace Space for public access. Follow the HuggingFace Spaces documentation for deployment instructions.
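As one possible approach, here is a sketch using the `huggingface_hub` client. The Space id below is hypothetical, and `space_sdk="gradio"` is an assumption about how `app.py` is built:

```python
from huggingface_hub import HfApi

api = HfApi()  # picks up your HF token from the environment or cached login

# Hypothetical Space id; space_sdk="gradio" is an assumption about this app.
repo_id = "your-username/guardbench-leaderboard"
api.create_repo(repo_id=repo_id, repo_type="space", space_sdk="gradio", exist_ok=True)

# Upload the repository contents to the Space, keeping local secrets out.
api.upload_folder(
    folder_path=".",
    repo_id=repo_id,
    repo_type="space",
    ignore_patterns=[".env", ".git/*"],
)
```

Secrets such as `HF_TOKEN` and `SUBMITTER_TOKEN` should be configured in the Space's settings rather than uploaded in a `.env` file.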
## License
MIT