GuardBench Leaderboard

A HuggingFace leaderboard for the GuardBench project. It lets users submit evaluation results and compare the performance of different models on safety guardrails.

Features

  • Display model performance across multiple safety categories
  • Accept JSONL submissions with evaluation results
  • Store submissions in a HuggingFace dataset
  • Secure submission process with token authentication
  • Automatic data refresh from HuggingFace

Setup

  1. Clone this repository
  2. Install dependencies:
    pip install -r requirements.txt
    
  3. Create a .env file based on the .env.template:
    cp .env.template .env
    
  4. Edit the .env file with your HuggingFace credentials and settings
  5. Run the application:
    python app.py
    

Submission Format

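Submissions should be in JSONL format: one JSON object per line, with the structure shown below. The example is pretty-printed for readability, and some entries are elided with "..."; in an actual file, each object must occupy a single line.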

{
  "model_name": "model-name",
  "per_category_metrics": {
    "Category Name": {
      "default_prompts": {
        "f1_binary": 0.95,
        "recall_binary": 0.93,
        "precision_binary": 1.0,
        "error_ratio": 0.0,
        "avg_runtime_ms": 3000
      },
      "jailbreaked_prompts": { ... },
      "default_answers": { ... },
      "jailbreaked_answers": { ... }
    },
    ...
  },
  "avg_metrics": {
    "default_prompts": {
      "f1_binary": 0.97,
      "recall_binary": 0.95,
      "precision_binary": 1.0,
      "error_ratio": 0.0,
      "avg_runtime_ms": 3000
    },
    "jailbreaked_prompts": { ... },
    "default_answers": { ... },
    "jailbreaked_answers": { ... }
  }
}
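
Before uploading, it can help to check a submission file locally. The following is a minimal sketch, not the leaderboard's own validation: the helper names check_metric_block and check_submission_line are hypothetical, and it assumes that every test type and metric shown in the example above is required, which the application may or may not enforce.

```python
import json

# Key names are copied from the example above. Treating all of them as
# required is an assumption made for this sketch.
TEST_TYPES = (
    "default_prompts",
    "jailbreaked_prompts",
    "default_answers",
    "jailbreaked_answers",
)
METRICS = (
    "f1_binary",
    "recall_binary",
    "precision_binary",
    "error_ratio",
    "avg_runtime_ms",
)


def check_metric_block(block, where):
    """Return problems found in one {test_type: {metric: value}} mapping."""
    problems = []
    for test_type in TEST_TYPES:
        metrics = block.get(test_type)
        if not isinstance(metrics, dict):
            problems.append(f"{where}: missing '{test_type}'")
            continue
        for name in METRICS:
            value = metrics.get(name)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{where}.{test_type}: '{name}' is not a number")
    return problems


def check_submission_line(line):
    """Hypothetical helper: parse one JSONL line and list structural problems."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(record, dict):
        return ["top-level value is not an object"]

    problems = []
    if not isinstance(record.get("model_name"), str):
        problems.append("missing 'model_name'")

    per_category = record.get("per_category_metrics")
    if not isinstance(per_category, dict):
        problems.append("missing 'per_category_metrics'")
    else:
        for category, block in per_category.items():
            if isinstance(block, dict):
                problems.extend(check_metric_block(block, category))
            else:
                problems.append(f"{category}: not an object")

    avg_metrics = record.get("avg_metrics")
    if not isinstance(avg_metrics, dict):
        problems.append("missing 'avg_metrics'")
    else:
        problems.extend(check_metric_block(avg_metrics, "avg_metrics"))
    return problems
```

To check a whole file, call check_submission_line on each non-empty line and report any line that returns a non-empty list. When writing a submission, json.dumps(record) with no indent argument produces the single-line form that JSONL expects.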

Environment Variables

  • HF_TOKEN: Your HuggingFace write token
  • OWNER: Your HuggingFace username or organization
  • RESULTS_DATASET_ID: The ID of the dataset to store results (e.g., "username/guardbench-results")
  • SUBMITTER_TOKEN: A secret token required for submissions
  • ADMIN_USERNAME: Username for admin access to the leaderboard
  • ADMIN_PASSWORD: Password for admin access to the leaderboard
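
A quick way to catch a misconfigured .env before starting the app is to check that each variable listed above is set. This is a sketch under assumptions: the function name missing_settings is hypothetical, treating all six variables as required is a guess, and the code only reads the process environment (it does not itself load the .env file).

```python
import os

# Variable names come from the list above. Whether the application treats
# every one of them as mandatory is an assumption of this sketch.
REQUIRED_VARS = (
    "HF_TOKEN",
    "OWNER",
    "RESULTS_DATASET_ID",
    "SUBMITTER_TOKEN",
    "ADMIN_USERNAME",
    "ADMIN_PASSWORD",
)


def missing_settings(environ=None):
    """Return the names of variables that are unset or empty.

    `environ` defaults to the process environment; passing a plain dict
    makes the function easy to exercise without touching real settings.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_settings() with no arguments inspects the current environment; an empty list means every variable has a non-empty value. The check deliberately returns only names, never values, so tokens and passwords do not end up in logs.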

Deployment

This application can be deployed as a HuggingFace Space for public access. Follow the HuggingFace Spaces documentation for deployment instructions.

License

MIT