zhiminy committed
Commit f0e3716 · 1 Parent(s): 3cb0aff

Files changed (2):
  1. README.md +6 -6
  2. app.py +4 -4
README.md CHANGED
@@ -11,9 +11,9 @@ pinned: false
 short_description: The chatbot arena for software engineering
 ---
 
-# SE Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering
+# SWE Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering
 
-Welcome to **SE Arena**, an open-source platform designed for evaluating software engineering-focused foundation models (FMs), particularly large language models (LLMs). SE Arena benchmarks models in iterative, context-rich workflows that are characteristic of software engineering (SE) tasks.
+Welcome to **SWE Arena**, an open-source platform designed for evaluating software engineering-focused foundation models (FMs), particularly large language models (LLMs). SWE Arena benchmarks models in iterative, context-rich workflows that are characteristic of software engineering (SE) tasks.
 
 ## Key Features
 
@@ -26,9 +26,9 @@ Welcome to **SE Arena**, an open-source platform designed for evaluating softwar
 - **Consistency score**: Quantify model determinism and reliability through self-play matches
 - **Transparent, Open-Source Leaderboard**: View real-time model rankings across diverse SE workflows with full transparency.
 
-## Why SE Arena?
+## Why SWE Arena?
 
-Existing evaluation frameworks (like Chatbot Arena, WebDev Arena, and Copilot Arena) often don't address the complex, iterative nature of SE tasks. SE Arena fills critical gaps by:
+Existing evaluation frameworks (like Chatbot Arena, WebDev Arena, and Copilot Arena) often don't address the complex, iterative nature of SE tasks. SWE Arena fills critical gaps by:
 
 - Supporting context-rich, multi-turn evaluations to capture iterative workflows
 - Integrating repository-level context through RepoChat to simulate real-world development scenarios
@@ -51,7 +51,7 @@ Existing evaluation frameworks (like Chatbot Arena, WebDev Arena, and Copilot Ar
 
 ### Usage
 
-1. Navigate to the [SE Arena platform](https://huggingface.co/spaces/SE-Arena/Software-Engineering-Arena)
+1. Navigate to the [SWE Arena platform](https://huggingface.co/spaces/SE-Arena/Software-Engineering-Arena)
 2. Sign in with your Hugging Face account
 3. Enter your SE task prompt (optionally include a repository URL for RepoChat)
 4. Engage in multi-round interactions and vote on model performance
@@ -66,7 +66,7 @@ We welcome contributions from the community! Here's how you can help:
 
 ## Privacy Policy
 
-Your interactions are anonymized and used solely for improving SE Arena and FM benchmarking. By using SE Arena, you agree to our Terms of Service.
+Your interactions are anonymized and used solely for improving SWE Arena and FM benchmarking. By using SWE Arena, you agree to our Terms of Service.
 
 ## Future Plans
 
app.py CHANGED
@@ -561,7 +561,7 @@ with gr.Blocks() as app:
     leaderboard_intro = gr.Markdown(
         """
         # 🏆 FM4SE Leaderboard: Community-Driven Evaluation of Top Foundation Models (FMs) in Software Engineering (SE) Tasks
-        The SE Arena is an open-source platform designed to evaluate foundation models through human preference, fostering transparency and collaboration. This platform aims to empower the SE community to assess and compare the performance of leading FMs in related tasks. For technical details, check out our [paper](https://arxiv.org/abs/2502.01860).
+        The SWE Arena is an open-source platform designed to evaluate foundation models through human preference, fostering transparency and collaboration. This platform aims to empower the SE community to assess and compare the performance of leading FMs in related tasks. For technical details, check out our [paper](https://arxiv.org/abs/2502.01860).
         """,
         elem_classes="leaderboard-intro",
     )
@@ -590,10 +590,10 @@ with gr.Blocks() as app:
     # Add a citation block in Markdown
     citation_component = gr.Markdown(
         """
-        Made with ❤️ for SE Arena. If this work is useful to you, please consider citing:
+        Made with ❤️ for SWE Arena. If this work is useful to you, please consider citing:
         ```
         @inproceedings{zhao2025se,
-        title={SE Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering},
+        title={SWE Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering},
         author={Zhao, Zhimin},
         booktitle={ACM international conference on AI Foundation Models and Software Engineering},
         year={2025}}
@@ -604,7 +604,7 @@ with gr.Blocks() as app:
     # Add title and description as a Markdown component
     arena_intro = gr.Markdown(
         f"""
-        # ⚔️ SE Arena: Explore and Test Top FMs with SE Tasks by Community Voting
+        # ⚔️ SWE Arena: Explore and Test Top FMs with SE Tasks by Community Voting
 
         ## 📜How It Works
         - **Blind Comparison**: Submit a SE-related query to two anonymous FMs randomly selected from up to {len(available_models)} top models from OpenAI, Gemini, Grok, Claude, Deepseek, Qwen, Llama, Mistral, and others.
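Note that the `arena_intro` component above uses an f-string, so `{len(available_models)}` is interpolated into the Markdown once, when the string literal is built. A minimal sketch of that pattern — `available_models` here is a made-up placeholder list, not the app's real model roster:

```python
# Hypothetical stand-in for the app's available_models list (illustration only)
available_models = ["model-a", "model-b", "model-c"]

# f-string Markdown, as in the arena_intro component: the brace expression
# is evaluated when the literal is constructed, not at render time
arena_intro_text = f"""# ⚔️ SWE Arena: Explore and Test Top FMs with SE Tasks by Community Voting

## 📜How It Works
- **Blind Comparison**: Submit a SE-related query to two anonymous FMs randomly selected from up to {len(available_models)} top models.
"""

print(arena_intro_text)
```

One consequence of this design: if the model list changes after the `gr.Markdown` component is created, the displayed count does not update on its own — the string would have to be rebuilt.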