Update files with instructions for preparing data for the leaderboard
run_instructions.txt (+46, -42)
Model Evaluation and Leaderboard

1) Model Evaluation
Before integrating a model into the leaderboard, it must first be evaluated using the lm-eval-harness library in both zero-shot and 5-shot configurations.

This can be done with the following command:

lm_eval --model hf --model_args pretrained=google/gemma-3-12b-it \
    --tasks evalita-mp --device cuda:0 --batch_size 1 --trust_remote_code \
    --output_path model_output --num_fewshot 5 --

(For the zero-shot configuration, the same command can be run with --num_fewshot 0.)

The output generated by the library includes the model's accuracy scores on the benchmark tasks.
This output is written to standard output and should be saved to a .txt file (e.g., slurm-8368.out), which must then be placed in the LOCAL evalita_llm_models_output directory for further processing. Examples of such files can be found at: https://huggingface.co/datasets/evalitahf/evalita_llm_models_output/
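
As an illustration of this step, the minimal sketch below (not a script from this repository) runs the 5-shot evaluation from Python and saves the harness output as a .txt file in the LOCAL evalita_llm_models_output directory; the output file name is an assumption.

# Illustrative sketch only: run the 5-shot evaluation via subprocess and save
# its standard output (which contains the accuracy tables) as a .txt file in
# the LOCAL evalita_llm_models_output directory. File naming is an assumption.
import subprocess
from pathlib import Path

model_id = "google/gemma-3-12b-it"
out_dir = Path("evalita_llm_models_output")
out_dir.mkdir(exist_ok=True)

cmd = [
    "lm_eval", "--model", "hf",
    "--model_args", f"pretrained={model_id}",
    "--tasks", "evalita-mp",
    "--device", "cuda:0",
    "--batch_size", "1",
    "--trust_remote_code",
    "--output_path", "model_output",
    "--num_fewshot", "5",
]

# capture_output collects stdout so it can be written to the log file.
result = subprocess.run(cmd, capture_output=True, text=True, check=True)

log_file = out_dir / f"{model_id.replace('/', '__')}_5shot.txt"
log_file.write_text(result.stdout)
print(f"Saved evaluation log to {log_file}")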

2) Extracting Model Metadata
To display model details on the leaderboard (e.g., organization/group, model name, and parameter count), metadata must be retrieved from Hugging Face.

This can be done by running:

python get_model_info.py

This script processes the evaluation files from Step 1 and saves each model's metadata as a JSON file in the LOCAL evalita_llm_requests directory.
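
For reference, a minimal sketch of this kind of metadata lookup is shown below. It is not the actual get_model_info.py; the JSON field names and the output file naming are assumptions.

# Minimal sketch (not the actual get_model_info.py): fetch a model's metadata
# from the Hugging Face Hub and store it as JSON in the LOCAL
# evalita_llm_requests directory. Field names below are assumptions.
import json
from pathlib import Path

from huggingface_hub import HfApi

requests_dir = Path("evalita_llm_requests")
requests_dir.mkdir(exist_ok=True)

def save_model_metadata(model_id: str) -> Path:
    info = HfApi().model_info(model_id)
    org, _, name = model_id.partition("/")
    # The parameter count is only available when safetensors metadata is exposed.
    num_params = info.safetensors.total if info.safetensors else None
    metadata = {
        "model": model_id,
        "organization": org,
        "model_name": name or model_id,
        "num_parameters": num_params,
    }
    out_path = requests_dir / f"{model_id.replace('/', '__')}.json"
    out_path.write_text(json.dumps(metadata, indent=2))
    return out_path

print(save_model_metadata("google/gemma-3-12b-it"))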

3) Generating the Leaderboard Submission File
The leaderboard requires a structured file containing each model's metadata along with its benchmark accuracy scores.

To generate this file, run:

python preprocess_model_output.py

This script combines the accuracy results from Step 1 with the metadata from Step 2 and writes a JSON file for each kind of model to the LOCAL evalita_llm_results directory.
Examples of these files are available at https://huggingface.co/datasets/evalitahf/evalita_llm_results
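
As a rough illustration of the merge performed in this step (the real preprocess_model_output.py and its schema may differ), the sketch below combines the metadata JSON from Step 2 with accuracy scores taken from the Step 1 log; all field names, file names, and the placeholder score values are assumptions.

# Rough illustration only (not the actual preprocess_model_output.py): merge
# the Step 2 metadata with accuracies parsed from the Step 1 log and write the
# combined record to the LOCAL evalita_llm_results directory.
import json
from pathlib import Path

results_dir = Path("evalita_llm_results")
results_dir.mkdir(exist_ok=True)

# Metadata file produced in Step 2 (file name follows the sketch above).
metadata = json.loads(
    Path("evalita_llm_requests/google__gemma-3-12b-it.json").read_text()
)

# Placeholders standing in for values parsed from the saved lm_eval output.
accuracies = {"evalita-mp_0shot": None, "evalita-mp_5shot": None}

combined = {**metadata, "results": accuracies}
out_path = results_dir / "google__gemma-3-12b-it.json"
out_path.write_text(json.dumps(combined, indent=2))
print(f"Wrote {out_path}")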

4) Updating the Hugging Face Repository
To update the evalita_llm_results repository with the newly generated files from Step 3, commit and push the following three directories from the local disk to Hugging Face:
evalita_llm_models_output, evalita_llm_requests, and evalita_llm_results
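
One possible way to perform this commit and push from Python is sketched below, using the huggingface_hub client. The target repository ids and the "dataset" repo type are assumptions inferred from the dataset URLs above; a plain git commit and push of the cloned repositories works as well.

# Sketch of the commit/push step using the huggingface_hub client. Repo ids and
# repo_type are assumptions based on the dataset URLs mentioned earlier.
from huggingface_hub import HfApi

api = HfApi()  # assumes a prior `huggingface-cli login` with write access

for folder, repo_id in [
    ("evalita_llm_models_output", "evalitahf/evalita_llm_models_output"),
    ("evalita_llm_requests", "evalitahf/evalita_llm_requests"),
    ("evalita_llm_results", "evalitahf/evalita_llm_results"),
]:
    api.upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="dataset",
        commit_message="Add newly generated evaluation files",
    )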

5) Running the Leaderboard Application
To test the leaderboard locally, run the following command in your terminal and open your browser at the indicated address:

python app.py

On Hugging Face, the leaderboard can be started or stopped directly from the graphical interface, so running this command is only necessary when working locally.