tohid.abedini committed
Commit 6813459 · 1 Parent(s): 2c3fe6c

[Add] about

Files changed (1): utils.py (+4 -4)
utils.py CHANGED
@@ -109,11 +109,11 @@ body, .gradio-container, .gr-button, .gr-input, .gr-slider, .gr-dropdown, .gr-ma
 """
 
 LLM_BENCHMARKS_ABOUT_TEXT = f"""
-## Persian LLM Evaluation Leaderboard (v1)
+# Persian LLM Evaluation Leaderboard (v1)
 
 The Persian LLM Evaluation Leaderboard, developed by **Part DP AI** in collaboration with **AUT (Amirkabir University of Technology) NLP Lab**, provides a comprehensive benchmarking system specifically designed for Persian language models. This leaderboard, based on the open-source [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness), offers a unique platform for evaluating the performance of large language models (LLMs) on tasks that demand linguistic proficiency and technical skill in Persian.
 
-## Key Features
+## 1. Key Features
 
 1. **Open Evaluation Access**
 The leaderboard allows open participation, meaning that developers and researchers working with open-source models can submit evaluation requests for their models. This accessibility encourages the development and testing of Persian LLMs within the broader AI ecosystem.
@@ -138,13 +138,13 @@ The Persian LLM Evaluation Leaderboard, developed by **Part DP AI** in collabora
 5. **Comprehensive Evaluation Pipeline**
 By integrating a standardized evaluation pipeline, models are assessed across a variety of data types, including text, mathematical formulas, and numerical data. This multi-faceted approach enhances the evaluation’s reliability and allows for precise, nuanced assessment of model performance across multiple dimensions.
 
-## Background and Goals
+## 2. Background and Goals
 
 Recent months have seen a notable increase in the development of Persian language models by research centers and AI companies in Iran. However, the lack of reliable, standardized benchmarks for Persian models has made it challenging to evaluate model quality comprehensively. Global benchmarks typically do not support Persian, resulting in skewed or unreliable results for Persian-based AI.
 
 This leaderboard addresses this gap by providing a locally focused, transparent system that enables consistent, fair comparisons of Persian models. It is expected to be a valuable tool for Persian-speaking businesses and developers, allowing them to select models best suited to their needs. Researchers and model developers also benefit from the competitive environment, with opportunities to showcase and improve their models based on benchmark rankings.
 
-## Data Privacy and Integrity
+## 3. Data Privacy and Integrity
 
 To maintain evaluation integrity and prevent overfitting or data leakage, only part of the benchmark dataset is openly available. This limited-access approach upholds model evaluation reliability, ensuring that results are genuinely representative of each model’s capabilities across unseen data.
 
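
For context on the harness referenced in the about text, the sketch below shows roughly how a single model can be scored with the LM Evaluation Harness from Python. It is a minimal illustration assuming the v0.4.x `simple_evaluate` API; the model ID and the Persian task name are hypothetical placeholders, not the leaderboard's actual configuration.

```python
# Minimal sketch: scoring one model with the LM Evaluation Harness
# (https://github.com/EleutherAI/lm-evaluation-harness, v0.4.x Python API).
# NOTE: the model ID and task name below are illustrative placeholders,
# not the leaderboard's actual configuration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # Hugging Face transformers backend
    model_args="pretrained=username/persian-llm",  # hypothetical model ID
    tasks=["persian_mmlu"],                        # hypothetical Persian task name
    num_fewshot=0,
    batch_size=8,
)

# Per-task metrics (accuracy, etc.) are returned under results["results"].
for task, metrics in results["results"].items():
    print(task, metrics)
```

The leaderboard's submission flow presumably wraps a call along these lines, so contributors only submit a model identifier rather than running the harness themselves.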