GuardBench Leaderboard

A HuggingFace leaderboard for the GuardBench project. Users can submit evaluation results and compare how different models perform as safety guardrails.

Features

Display model performance across multiple safety categories
Accept JSONL submissions with evaluation results
Store submissions in a HuggingFace dataset
Secure submission process with token authentication
Automatic data refresh from HuggingFace

Setup

Clone this repository
Install dependencies:
```
pip install -r requirements.txt
```
Create a .env file based on the .env.template:
```
cp .env.template .env
```
Edit the .env file with your HuggingFace credentials and settings
Run the application:
```
python app.py
```

Submission Format

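Submissions must be in JSONL format: each line of the file is one complete JSON object. The structure is shown here pretty-printed for readability; a "..." stands for elided content with the same shape as the entry before it.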

```
{
  "model_name": "model-name",
  "per_category_metrics": {
    "Category Name": {
      "default_prompts": {
        "f1_binary": 0.95,
        "recall_binary": 0.93,
        "precision_binary": 1.0,
        "error_ratio": 0.0,
        "avg_runtime_ms": 3000
      },
      "jailbreaked_prompts": { ... },
      "default_answers": { ... },
      "jailbreaked_answers": { ... }
    },
    ...
  },
  "avg_metrics": {
    "default_prompts": {
      "f1_binary": 0.97,
      "recall_binary": 0.95,
      "precision_binary": 1.0,
      "error_ratio": 0.0,
      "avg_runtime_ms": 3000
    },
    "jailbreaked_prompts": { ... },
    "default_answers": { ... },
    "jailbreaked_answers": { ... }
  }
}
```
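
Before submitting, it can help to check each line of the file against this structure. The following is a minimal sketch, not the leaderboard's own validation: `validate_line` and `check_block` are hypothetical helpers, and the code assumes that all four prompt types and all five metrics shown above are required in every block, which this README does not state explicitly.

```python
import json

# Key names copied from the structure above.
PROMPT_TYPES = ("default_prompts", "jailbreaked_prompts",
                "default_answers", "jailbreaked_answers")
METRICS = ("f1_binary", "recall_binary", "precision_binary",
           "error_ratio", "avg_runtime_ms")


def check_block(block, where):
    """Return a list of problems in one {prompt_type: {metric: value}} block."""
    if not isinstance(block, dict):
        return [f"{where}: expected an object"]
    problems = []
    for prompt_type in PROMPT_TYPES:
        metrics = block.get(prompt_type)
        if not isinstance(metrics, dict):
            problems.append(f"{where}.{prompt_type}: missing or not an object")
            continue
        for metric in METRICS:
            value = metrics.get(metric)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(
                    f"{where}.{prompt_type}.{metric}: missing or not a number")
    return problems


def validate_line(line):
    """Hypothetical helper: validate one JSONL line, returning a list of problems."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(record, dict):
        return ["top level: expected an object"]
    problems = []
    if not isinstance(record.get("model_name"), str):
        problems.append("model_name: missing or not a string")
    categories = record.get("per_category_metrics")
    if not isinstance(categories, dict):
        problems.append("per_category_metrics: missing or not an object")
    else:
        for name, block in categories.items():
            problems.extend(check_block(block, f"per_category_metrics[{name!r}]"))
    problems.extend(check_block(record.get("avg_metrics"), "avg_metrics"))
    return problems
```

An empty list means the line matches the assumed structure. To check a whole file, call `validate_line` on every non-empty line and report the line numbers that return problems.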

Environment Variables

HF_TOKEN: Your HuggingFace write token
OWNER: Your HuggingFace username or organization
RESULTS_DATASET_ID: The ID of the dataset to store results (e.g., "username/guardbench-results")
SUBMITTER_TOKEN: A secret token required for submissions
ADMIN_USERNAME: Username for admin access to the leaderboard
ADMIN_PASSWORD: Password for admin access to the leaderboard
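
A quick way to catch a misconfigured .env file is to check the variables at startup. The sketch below is illustrative only and does not describe how app.py reads its configuration: `load_settings` is a hypothetical helper, and it assumes that all six variables are required and that the .env file has already been loaded into the process environment.

```python
import os

# Variable names copied from the list above.
REQUIRED_VARS = (
    "HF_TOKEN",
    "OWNER",
    "RESULTS_DATASET_ID",
    "SUBMITTER_TOKEN",
    "ADMIN_USERNAME",
    "ADMIN_PASSWORD",
)


def load_settings(environ=os.environ):
    """Hypothetical helper: return the settings as a dict, or raise if any are unset."""
    # Treat empty strings the same as missing variables.
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Passing a plain dict as `environ` makes the check easy to exercise in tests without touching the real environment.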

Deployment

This application can be deployed as a HuggingFace Space for public access. Follow the HuggingFace Spaces documentation for deployment instructions.

License

MIT