Update files with instructions for preparing data for the leaderboard
run_instructions.txt (+46, -42)
Model Evaluation and Leaderboard

1) Model Evaluation
Before integrating a model into the leaderboard, it must first be evaluated using the lm-eval-harness library in both zero-shot and 5-shot configurations.

This can be done with the following command:

lm_eval --model hf --model_args pretrained=google/gemma-3-12b-it \
    --tasks evalita-mp --device cuda:0 --batch_size 1 --trust_remote_code \
    --output_path model_output --num_fewshot 5 --

(For the zero-shot configuration, the same command can be run with --num_fewshot 0.)

The output generated by the library includes the model's accuracy scores on the benchmark tasks.
This output is written to standard output and should be saved to a .txt file (e.g., slurm-8368.out), which must then be placed in the LOCAL evalita_llm_models_output directory for further processing. Examples of such files can be found at: https://huggingface.co/datasets/evalitahf/evalita_llm_models_output/
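
As an illustration of this step, the minimal sketch below (not a script from this repository) runs the 5-shot evaluation from Python and saves the harness output as a .txt file in the LOCAL evalita_llm_models_output directory; the output file name is an assumption.

# Illustrative sketch only: run the 5-shot evaluation via subprocess and save
# its standard output (which contains the accuracy tables) as a .txt file in
# the LOCAL evalita_llm_models_output directory. File naming is an assumption.
import subprocess
from pathlib import Path

model_id = "google/gemma-3-12b-it"
out_dir = Path("evalita_llm_models_output")
out_dir.mkdir(exist_ok=True)

cmd = [
    "lm_eval", "--model", "hf",
    "--model_args", f"pretrained={model_id}",
    "--tasks", "evalita-mp",
    "--device", "cuda:0",
    "--batch_size", "1",
    "--trust_remote_code",
    "--output_path", "model_output",
    "--num_fewshot", "5",
]

# capture_output collects stdout so it can be written to the log file.
result = subprocess.run(cmd, capture_output=True, text=True, check=True)

log_file = out_dir / f"{model_id.replace('/', '__')}_5shot.txt"
log_file.write_text(result.stdout)
print(f"Saved evaluation log to {log_file}")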

2) Extracting Model Metadata
To display model details on the leaderboard (e.g., organization/group, model name, and parameter count), metadata must be retrieved from Hugging Face.

This can be done by running:

python get_model_info.py

This script processes the evaluation files from Step 1 and saves each model's metadata as a JSON file in the LOCAL evalita_llm_requests directory.
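
For reference, a minimal sketch of this kind of metadata lookup is shown below. It is not the actual get_model_info.py; the JSON field names and the output file naming are assumptions.

# Minimal sketch (not the actual get_model_info.py): fetch a model's metadata
# from the Hugging Face Hub and store it as JSON in the LOCAL
# evalita_llm_requests directory. Field names below are assumptions.
import json
from pathlib import Path

from huggingface_hub import HfApi

requests_dir = Path("evalita_llm_requests")
requests_dir.mkdir(exist_ok=True)

def save_model_metadata(model_id: str) -> Path:
    info = HfApi().model_info(model_id)
    org, _, name = model_id.partition("/")
    # The parameter count is only available when safetensors metadata is exposed.
    num_params = info.safetensors.total if info.safetensors else None
    metadata = {
        "model": model_id,
        "organization": org,
        "model_name": name or model_id,
        "num_parameters": num_params,
    }
    out_path = requests_dir / f"{model_id.replace('/', '__')}.json"
    out_path.write_text(json.dumps(metadata, indent=2))
    return out_path

print(save_model_metadata("google/gemma-3-12b-it"))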

3) Generating the Leaderboard Submission File
The leaderboard requires a structured file containing each model's metadata along with its benchmark accuracy scores.

To generate this file, run:

python preprocess_model_output.py

This script combines the accuracy results from Step 1 with the metadata from Step 2 and writes a JSON file for each kind of model to the LOCAL evalita_llm_results directory.
Examples of these files are available at https://huggingface.co/datasets/evalitahf/evalita_llm_results
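
As a rough illustration of the merge performed in this step (the real preprocess_model_output.py and its schema may differ), the sketch below combines the metadata JSON from Step 2 with accuracy scores taken from the Step 1 log; all field names, file names, and the placeholder score values are assumptions.

# Rough illustration only (not the actual preprocess_model_output.py): merge
# the Step 2 metadata with accuracies parsed from the Step 1 log and write the
# combined record to the LOCAL evalita_llm_results directory.
import json
from pathlib import Path

results_dir = Path("evalita_llm_results")
results_dir.mkdir(exist_ok=True)

# Metadata file produced in Step 2 (file name follows the sketch above).
metadata = json.loads(
    Path("evalita_llm_requests/google__gemma-3-12b-it.json").read_text()
)

# Placeholders standing in for values parsed from the saved lm_eval output.
accuracies = {"evalita-mp_0shot": None, "evalita-mp_5shot": None}

combined = {**metadata, "results": accuracies}
out_path = results_dir / "google__gemma-3-12b-it.json"
out_path.write_text(json.dumps(combined, indent=2))
print(f"Wrote {out_path}")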

4) Updating the Hugging Face Repository
To update the evalita_llm_results repository with the newly generated files from Step 3, commit and push the following three directories from the local disk to Hugging Face:
evalita_llm_models_output, evalita_llm_requests, and evalita_llm_results
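
One possible way to perform this commit and push from Python is sketched below, using the huggingface_hub client. The target repository ids and the "dataset" repo type are assumptions inferred from the dataset URLs above; a plain git commit and push of the cloned repositories works as well.

# Sketch of the commit/push step using the huggingface_hub client. Repo ids and
# repo_type are assumptions based on the dataset URLs mentioned earlier.
from huggingface_hub import HfApi

api = HfApi()  # assumes a prior `huggingface-cli login` with write access

for folder, repo_id in [
    ("evalita_llm_models_output", "evalitahf/evalita_llm_models_output"),
    ("evalita_llm_requests", "evalitahf/evalita_llm_requests"),
    ("evalita_llm_results", "evalitahf/evalita_llm_results"),
]:
    api.upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="dataset",
        commit_message="Add newly generated evaluation files",
    )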

5) Running the Leaderboard Application
To test the leaderboard locally, run the following command in your terminal and open your browser at the indicated address:

python app.py

On Hugging Face, the leaderboard can be started or stopped directly from the graphical interface, so running this command is only necessary when working locally.