# GuardBench Leaderboard
A HuggingFace leaderboard for the GuardBench project. It lets users submit evaluation results and compare how different models perform as safety guardrails.
## Features
- Display model performance across multiple safety categories
- Accept JSONL submissions with evaluation results
- Store submissions in a HuggingFace dataset
- Secure submission process with token authentication
- Automatic data refresh from HuggingFace
## Setup
1. Clone this repository.
2. Install the dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Create a `.env` file from `.env.template`:
   ```bash
   cp .env.template .env
   ```
4. Edit `.env` to set your HuggingFace credentials and the other variables listed under Environment Variables.
5. Run the application:
   ```bash
   python app.py
   ```
## Submission Format
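Submissions must be in JSONL format: one JSON object per line, with the structure shown below. The example is pretty-printed for readability; in the actual file, each object must occupy a single line. Each `{ ... }` placeholder holds the same five metric fields as `default_prompts`, and the trailing `...` stands for further categories.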
```json
{
  "model_name": "model-name",
  "per_category_metrics": {
    "Category Name": {
      "default_prompts": {
        "f1_binary": 0.95,
        "recall_binary": 0.93,
        "precision_binary": 1.0,
        "error_ratio": 0.0,
        "avg_runtime_ms": 3000
      },
      "jailbreaked_prompts": { ... },
      "default_answers": { ... },
      "jailbreaked_answers": { ... }
    },
    ...
  },
  "avg_metrics": {
    "default_prompts": {
      "f1_binary": 0.97,
      "recall_binary": 0.95,
      "precision_binary": 1.0,
      "error_ratio": 0.0,
      "avg_runtime_ms": 3000
    },
    "jailbreaked_prompts": { ... },
    "default_answers": { ... },
    "jailbreaked_answers": { ... }
  }
}
```
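Before submitting, a file can be checked locally against the structure above. The sketch below is not the leaderboard's own validator: it assumes that all four evaluation settings and all five metric fields are required in every category and in `avg_metrics`, and it only checks types, not value ranges. `validate_line` and `validate_file` are hypothetical helper names.

```python
import json
from typing import Any

# Field names taken from the submission format above.
METRIC_FIELDS = (
    "f1_binary",
    "recall_binary",
    "precision_binary",
    "error_ratio",
    "avg_runtime_ms",
)
EVAL_SETTINGS = (
    "default_prompts",
    "jailbreaked_prompts",
    "default_answers",
    "jailbreaked_answers",
)


def _check_settings(block: dict[str, Any], where: str) -> list[str]:
    """Check that a block has all four settings, each with all five numeric metrics."""
    errors = []
    for setting in EVAL_SETTINGS:
        metrics = block.get(setting)
        if not isinstance(metrics, dict):
            errors.append(f"{where}: missing '{setting}'")
            continue
        for field in METRIC_FIELDS:
            value = metrics.get(field)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{where}.{setting}: '{field}' must be a number")
    return errors


def validate_line(line: str) -> list[str]:
    """Return the problems found in one JSONL line (empty list if it looks valid)."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(entry, dict):
        return ["each line must be a JSON object"]

    errors = []
    if not isinstance(entry.get("model_name"), str) or not entry["model_name"]:
        errors.append("'model_name' must be a non-empty string")

    per_category = entry.get("per_category_metrics")
    if not isinstance(per_category, dict) or not per_category:
        errors.append("'per_category_metrics' must be a non-empty object")
    else:
        for category, block in per_category.items():
            if isinstance(block, dict):
                errors += _check_settings(block, category)
            else:
                errors.append(f"{category}: must be an object")

    avg = entry.get("avg_metrics")
    if isinstance(avg, dict):
        errors += _check_settings(avg, "avg_metrics")
    else:
        errors.append("'avg_metrics' must be an object")
    return errors


def validate_file(path: str) -> dict[int, list[str]]:
    """Validate every non-blank line of a JSONL file; map line number -> problems."""
    problems = {}
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if line.strip():
                errs = validate_line(line)
                if errs:
                    problems[lineno] = errs
    return problems
```

For example, `validate_file("submission.jsonl")` returns an empty dict when every line passes, and otherwise maps each failing line number to the problems found there.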
## Environment Variables
- `HF_TOKEN`: A HuggingFace access token with write permission
- `OWNER`: Your HuggingFace username or organization
- `RESULTS_DATASET_ID`: The ID of the HuggingFace dataset where results are stored (e.g., `username/guardbench-results`)
- `SUBMITTER_TOKEN`: A secret token required for submissions
- `ADMIN_USERNAME`: Username for admin access to the leaderboard
- `ADMIN_PASSWORD`: Password for admin access to the leaderboard
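As an illustration of how these variables might be consumed, the sketch below reads them from the process environment, fails fast if any are missing, and checks a submitted token against `SUBMITTER_TOKEN` with a constant-time comparison. It is not taken from `app.py`: treating every variable as required, and the names `Settings`, `load_settings` and `is_authorized`, are assumptions. Loading `.env` into the environment (for example with python-dotenv's `load_dotenv()`) is left out.

```python
import hmac
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Variable names as listed above; all are treated as required here (an assumption).
REQUIRED_VARS = (
    "HF_TOKEN",
    "OWNER",
    "RESULTS_DATASET_ID",
    "SUBMITTER_TOKEN",
    "ADMIN_USERNAME",
    "ADMIN_PASSWORD",
)


@dataclass(frozen=True)
class Settings:
    # Field order matches REQUIRED_VARS so the values can be passed positionally.
    hf_token: str
    owner: str
    results_dataset_id: str
    submitter_token: str
    admin_username: str
    admin_password: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the settings from the environment, failing fast if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(*(env[name] for name in REQUIRED_VARS))


def is_authorized(provided: str, settings: Settings) -> bool:
    """Compare a submitted token with SUBMITTER_TOKEN in constant time."""
    return hmac.compare_digest(provided.encode(), settings.submitter_token.encode())
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not reveal how many leading characters of a guessed token are correct.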
## Deployment
This application can be deployed as a HuggingFace Space for public access. Follow the HuggingFace Spaces documentation for deployment instructions.
## License
MIT