evalitahf committed
Commit 44e4725 · verified · 1 Parent(s): d4cf66e

Update files with instructions for preparing data for the leaderboard

Files changed (1)
  1. run_instructions.txt +46 -42
run_instructions.txt CHANGED
@@ -1,42 +1,46 @@
- Model Evaluation and Leaderboard
-
- 1) Model Evaluation
- Before integrating a model into the leaderboard, it must first be evaluated using the lm-eval-harness library in both zero-shot and 5-shot configurations.
-
- This can be done with the following command:
-
- lm_eval --model hf --model_args pretrained=google/gemma-3-12b-it \
- --tasks evalita-mp --device cuda:0 --batch_size 1 --trust_remote_code \
- --output_path model_output --num_fewshot 5 --
-
- The output generated by the library will include the model's accuracy scores on the benchmark tasks.
- This output is written to the standard output and should be saved in a txt file (e.g., slurm-8368.out), which needs to be placed in the
- evalita_llm_models_output directory for further processing.
-
- 2) Extracting Model Metadata
- To display model details on the leaderboard (e.g., organization/group, model name, and parameter count), metadata must be retrieved from Hugging Face.
-
- This can be done by running:
-
- python get_model_info.py
-
- This script processes the evaluation files from Step 1 and saves each model's metadata in a JSON file within the evalita_llm_requests directory.
-
- 3) Generating Leaderboard Submission File
- The leaderboard requires a structured file containing each model’s metadata along with its benchmark accuracy scores.
-
- To generate this file, run:
-
- python preprocess_model_output.
-
- This script combines the accuracy results from Step 1 with the metadata from Step 2 and outputs a JSON file in the evalita_llm_results directory.
-
- 4) Updating the Hugging Face Repository
- The evalita_llm_results repository on HuggingFace must be updated with the newly generated files from Step 3.
-
- 5) Running the Leaderboard Application
- Finally, execute the leaderboard application by running:
-
- python app.py
-
-
+ Model Evaluation and Leaderboard
+
+ 1) Model Evaluation
+ Before integrating a model into the leaderboard, it must first be evaluated using the lm-eval-harness library in both zero-shot and 5-shot configurations.
+
+ This can be done with the following command:
+
+ lm_eval --model hf --model_args pretrained=google/gemma-3-12b-it \
+ --tasks evalita-mp --device cuda:0 --batch_size 1 --trust_remote_code \
+ --output_path model_output --num_fewshot 5 --
+
+ The output generated by the library will include the model's accuracy scores on the benchmark tasks.
+ This output is written to standard output and should be saved to a .txt file (e.g., slurm-8368.out), which then needs to be placed in the
+ evalita_llm_models_output LOCAL directory for further processing. Examples of such files can be found at: https://huggingface.co/datasets/evalitahf/evalita_llm_models_output/
+
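+ The zero-shot configuration uses the same command with --num_fewshot 0. As a minimal sketch (the output file name below is only illustrative), stdout and stderr can be captured into a file placed directly in the expected directory, mirroring what a SLURM .out file contains:
+
+ lm_eval --model hf --model_args pretrained=google/gemma-3-12b-it \
+ --tasks evalita-mp --device cuda:0 --batch_size 1 --trust_remote_code \
+ --output_path model_output --num_fewshot 0 \
+ > evalita_llm_models_output/gemma-3-12b-it_0shot.out 2>&1
+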
+ 2) Extracting Model Metadata
+ To display model details on the leaderboard (e.g., organization/group, model name, and parameter count), metadata must be retrieved from Hugging Face.
+
+ This can be done by running:
+
+ python get_model_info.py
+
+ This script processes the evaluation files from Step 1 and saves each model's metadata in a JSON file within the evalita_llm_requests LOCAL directory.
+
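+ For reference, this metadata is exposed by the public Hub API; the exact fields that get_model_info.py extracts may differ, but what is available for a given model can be inspected with, for example:
+
+ curl -s https://huggingface.co/api/models/google/gemma-3-12b-it
+
+ The returned JSON typically includes the repository id (from which the organization/group and model name can be derived) and, for safetensors checkpoints, parameter counts.
+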
+ 3) Generating Leaderboard Submission File
+ The leaderboard requires a structured file containing each model’s metadata along with its benchmark accuracy scores.
+
+ To generate this file, run:
+
+ python preprocess_model_output.py
+
+ This script combines the accuracy results from Step 1 with the metadata from Step 2 and outputs a JSON file for each kind of model in the evalita_llm_results LOCAL directory.
+ Examples of these files are available at https://huggingface.co/datasets/evalitahf/evalita_llm_results
+
+ 4) Updating the Hugging Face Repository
+ To update the evalita_llm_results repository with the newly generated files from Step 3, the following three directories must be committed and pushed from the local disk to Hugging Face (see the sketch below):
+ evalita_llm_models_output, evalita_llm_requests, and evalita_llm_results
+
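+ Assuming the three directories are local clones of the corresponding Hugging Face dataset repositories, the update for each of them looks like the following sketch (the commit message is only illustrative):
+
+ cd evalita_llm_results
+ git add .
+ git commit -m "Add results for google/gemma-3-12b-it"
+ git push
+
+ The same sequence is then repeated for evalita_llm_models_output and evalita_llm_requests.
+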
+ 5) Running the Leaderboard Application
+ To test the leaderboard locally, run the following command in your terminal and open your browser at the indicated address:
+
+ python app.py
+
+ On Hugging Face, the leaderboard can be started or stopped directly from the graphical interface, so running this command is only necessary when working locally.
+
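+ If app.py is a Gradio application, as is common for Hugging Face leaderboard Spaces, the address printed in the terminal is typically http://127.0.0.1:7860.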
+