title: Frontier AI Cybersecurity Observatory
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
tags:
- leaderboard
short_description: Cybersecurity Capability Evaluation Results Collection
sdk_version: 4.44.1
Tracking AI capabilities in cybersecurity is essential for understanding emerging impacts and risks. Our Frontier AI Cybersecurity Observatory provides a centralized platform that aggregates relevant benchmarks, enabling the community to more easily monitor and assess the evolving cybersecurity capabilities of AI systems.
Submit your benchmark
Please follow the steps below to add your benchmark.
- First, add your results to results.json. Under the top-level "results" key, insert an entry like this:
"Your Benchmark Name": {
"Metric Name 1": {
"Model / Agent Name": [value]
},
"Metric Name 2": {
"Model / Agent Name": [value]
}
}
You can include as many metrics as you like for each benchmark.
- Then, add descriptive metadata in meta_data.py:
LEADERBOARD_MD["Your Benchmark Name"] = """
Brief description of what the benchmark measures.
Paper: <paper URL>
Code: <repository URL>
"""
- Lastly, commit your changes and open a pull request against this repository. We will review and merge submissions. If you have any questions, please contact Yujin Potter at yujinyujin9393@gmail.com.
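Before opening the pull request, it may help to sanity-check the shape of your new entry. The sketch below is not part of this repository: `check_entry` is a hypothetical helper, and it assumes that each score is either a number or a list of numbers, which the Space itself may not require. It only checks the nesting described in the steps above (benchmark, then metric, then model or agent).

```python
import json
from numbers import Number


def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, Number) and not isinstance(value, bool)


def check_entry(results_doc, benchmark):
    """Return a list of problems found in one benchmark entry (empty if none).

    Hypothetical helper; assumes scores are numeric (an assumption, not a
    documented requirement of the leaderboard).
    """
    entry = results_doc.get("results", {}).get(benchmark)
    if not isinstance(entry, dict) or not entry:
        return [f"{benchmark!r} is missing or empty under 'results'"]

    problems = []
    for metric, scores in entry.items():
        if not isinstance(scores, dict) or not scores:
            problems.append(f"metric {metric!r} has no model scores")
            continue
        for model, value in scores.items():
            # Accept both a bare number and a list of numbers.
            values = value if isinstance(value, list) else [value]
            if not values or not all(_is_number(v) for v in values):
                problems.append(
                    f"{metric!r} / {model!r}: non-numeric value {value!r}"
                )
    return problems


# Inline example mirroring the template above; the second metric is
# deliberately malformed to show what a reported problem looks like.
example = json.loads("""
{
  "results": {
    "Your Benchmark Name": {
      "Metric Name 1": {"Model / Agent Name": [0.42]},
      "Metric Name 2": {"Model / Agent Name": ["n/a"]}
    }
  }
}
""")

problems = check_entry(example, "Your Benchmark Name")
for problem in problems:
    print(problem)
```

To check your real file, load it with `json.load(open("results.json"))` and pass the result to `check_entry` along with your benchmark name. An empty list means the entry matches the assumed shape.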
Paper & Blog
Paper: https://arxiv.org/abs/2504.05408
Blog: https://rdi.berkeley.edu/frontier-ai-impact-on-cybersecurity/
Survey
We're also launching an expert survey on this topic. We invite all AI and security researchers and practitioners to take the survey here: https://berkeley.qualtrics.com/jfe/form/SV_3Ozd2BPCEvRea1w
Citation
Please consider citing the report if this resource is useful to your research:
@article{guo2025sok,
title={{Frontier AI's Impact on the Cybersecurity Landscape}},
author={Guo, Wenbo and Potter, Yujin and Shi, Tianneng and Wang, Zhun and Zhang, Andy and Song, Dawn},
journal={arXiv preprint arXiv:2504.05408},
year={2025}
}