# GuardBench Leaderboard

A HuggingFace leaderboard for the GuardBench project that allows users to submit evaluation results and view the performance of different models on safety guardrails.

## Features

- Display model performance across multiple safety categories
- Accept JSONL submissions with evaluation results
- Store submissions in a HuggingFace dataset
- Secure submission process with token authentication
- Automatic data refresh from HuggingFace

## Setup

1. Clone this repository
2. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

3. Create a `.env` file based on the `.env.template`:

   ```
   cp .env.template .env
   ```

4. Edit the `.env` file with your HuggingFace credentials and settings
5. Run the application:

   ```
   python app.py
   ```

## Submission Format

Submissions should be in JSONL format, with each line containing a JSON object with the following structure (pretty-printed here for readability; in the submitted file, each object must occupy a single line):

```json
{
  "model_name": "model-name",
  "per_category_metrics": {
    "Category Name": {
      "default_prompts": {
        "f1_binary": 0.95,
        "recall_binary": 0.93,
        "precision_binary": 1.0,
        "error_ratio": 0.0,
        "avg_runtime_ms": 3000
      },
      "jailbreaked_prompts": { ... },
      "default_answers": { ... },
      "jailbreaked_answers": { ... }
    },
    ...
  },
  "avg_metrics": {
    "default_prompts": {
      "f1_binary": 0.97,
      "recall_binary": 0.95,
      "precision_binary": 1.0,
      "error_ratio": 0.0,
      "avg_runtime_ms": 3000
    },
    "jailbreaked_prompts": { ... },
    "default_answers": { ... },
    "jailbreaked_answers": { ... }
  }
}
```
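
Before submitting, it can help to check a file against this structure locally. The following is a minimal sketch that uses only the Python standard library. The helper names (`validate_record`, `validate_jsonl`) are hypothetical and not part of the GuardBench codebase, and the rules it enforces are assumptions drawn from the example above: the three top-level keys are required, only the four test types shown are accepted, and every metric value must be a number. The leaderboard's own validation may differ.

```python
import json

# Test types taken from the example record; treating any other key as an
# error is an assumption of this sketch, not a documented rule.
TEST_TYPES = ("default_prompts", "jailbreaked_prompts",
              "default_answers", "jailbreaked_answers")


def _check_block(label, by_test_type):
    """Check one metrics block (a category or the averages)."""
    if not isinstance(by_test_type, dict):
        return [f"{label}: expected an object keyed by test type"]
    problems = []
    for test_type, metrics in by_test_type.items():
        if test_type not in TEST_TYPES:
            problems.append(f"{label}: unknown test type {test_type!r}")
            continue
        if not isinstance(metrics, dict):
            problems.append(f"{label}/{test_type}: expected an object")
            continue
        for metric, value in metrics.items():
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{label}/{test_type}/{metric}: not a number")
    return problems


def validate_record(record):
    """Return a list of problems found in one parsed submission object."""
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    problems = []
    if not isinstance(record.get("model_name"), str):
        problems.append("model_name must be a string")
    per_category = record.get("per_category_metrics")
    if not isinstance(per_category, dict):
        problems.append("per_category_metrics must be an object")
        per_category = {}
    avg = record.get("avg_metrics")
    if not isinstance(avg, dict):
        problems.append("avg_metrics must be an object")
        avg = {}
    # Every category block and the averages block share the same shape.
    blocks = list(per_category.items()) + [("avg_metrics", avg)]
    for label, by_test_type in blocks:
        problems.extend(_check_block(label, by_test_type))
    return problems


def validate_jsonl(text):
    """Map 1-based line numbers to problems; an empty dict means no issues."""
    report = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            report[line_no] = [f"invalid JSON: {exc.msg}"]
            continue
        problems = validate_record(record)
        if problems:
            report[line_no] = problems
    return report


# Usage: serialize one record per line, then validate the resulting text.
metrics = {"f1_binary": 0.95, "recall_binary": 0.93, "precision_binary": 1.0,
           "error_ratio": 0.0, "avg_runtime_ms": 3000}
example = {
    "model_name": "model-name",
    "per_category_metrics": {"Category Name": {"default_prompts": metrics}},
    "avg_metrics": {"default_prompts": metrics},
}
print(validate_jsonl(json.dumps(example)))
```

`json.dumps` writes each object on a single line by default, which is what JSONL requires.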

## Environment Variables

- `HF_TOKEN`: Your HuggingFace write token
- `OWNER`: Your HuggingFace username or organization
- `RESULTS_DATASET_ID`: The ID of the dataset to store results (e.g., "username/guardbench-results")
- `SUBMITTER_TOKEN`: A secret token required for submissions
- `ADMIN_USERNAME`: Username for admin access to the leaderboard
- `ADMIN_PASSWORD`: Password for admin access to the leaderboard
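
As an illustration of how these variables might be consumed, here is a minimal sketch. It is not the application's actual code. It assumes that all six variables are required and are already present in the process environment (for example, exported in the shell or loaded from `.env` by the application). The helper names `load_settings` and `token_matches` are hypothetical.

```python
import hmac
import os

# Variable names come from the list above; treating all six as mandatory
# is an assumption of this sketch.
REQUIRED_VARS = ("HF_TOKEN", "OWNER", "RESULTS_DATASET_ID",
                 "SUBMITTER_TOKEN", "ADMIN_USERNAME", "ADMIN_PASSWORD")


def load_settings(environ=None):
    """Return the required settings as a dict, or raise if any are unset."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


def token_matches(submitted, expected):
    """Compare two tokens in constant time to limit timing side channels."""
    return hmac.compare_digest(submitted.encode("utf-8"),
                               expected.encode("utf-8"))
```

Failing at startup with the full list of missing names tends to be easier to debug than an error on first use, and `hmac.compare_digest` avoids the early-exit behavior of `==` when checking a submitted token against `SUBMITTER_TOKEN`.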

## Deployment

This application can be deployed as a HuggingFace Space for public access. Follow the HuggingFace Spaces documentation for deployment instructions.

## License

MIT