Update src/md.py
src/md.py
CHANGED
@@ -4,7 +4,7 @@ import pytz
|
|
4 |
ABOUT_TEXT = """
|
5 |
## Overview
|
6 |
HREF is an evaluation benchmark that evaluates language models' capacity to follow human instructions. It consists of 4,258 instructions covering 11 distinct categories, including Brainstorm, Open QA, Closed QA, Extract, Generation, Rewrite, Summarize, Coding, Classify, Fact Checking or Attributed QA, Multi-Document Synthesis, and Reasoning Over Numerical Data.
|
7 |
-

|
|
30 |
current_time = datetime.now(pacific_tz).strftime("%H:%M %Z, %d %b %Y")
|
31 |
|
32 |
TOP_TEXT = f"""# HREF: Human Reference Guided Evaluation for Instruction Following
|
33 |
-
[Code]() | [Validation Set]() | [Human Agreement Set]() | [Results]() | [Paper]() | Total models: {{}} | Last restart (PST): {current_time}
|
34 |
"""
|
35 |
|
|
|
4 |
ABOUT_TEXT = """
|
5 |
## Overview
|
6 |
HREF is an evaluation benchmark that evaluates language models' capacity to follow human instructions. It consists of 4,258 instructions covering 11 distinct categories, including Brainstorm, Open QA, Closed QA, Extract, Generation, Rewrite, Summarize, Coding, Classify, Fact Checking or Attributed QA, Multi-Document Synthesis, and Reasoning Over Numerical Data.
|
7 |
+

|
8 |
|
9 |
## Generation Configuration
|
10 |
For reproducibility, we use greedy decoding for all model generations by default. We apply chat templates to the instructions if they are implemented in the model's tokenizer or explicitly recommended by the model's creators. Please contact us if you would like to change this default configuration.
|
|
|
30 |
current_time = datetime.now(pacific_tz).strftime("%H:%M %Z, %d %b %Y")
|
31 |
|
32 |
TOP_TEXT = f"""# HREF: Human Reference Guided Evaluation for Instruction Following
|
33 |
+
[Code](https://github.com/allenai/href) | [Validation Set](https://huggingface.co/datasets/allenai/href) | [Human Agreement Set](https://huggingface.co/datasets/allenai/href_preference) | [Results](https://huggingface.co/datasets/allenai/href_results) | [Paper](https://arxiv.org/abs/2412.15524) | Total models: {{}} | Last restart (PST): {current_time}
|
34 |
"""
|
35 |
|
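Note on the `{{}}` in `TOP_TEXT`: inside an f-string, a doubled brace pair is an escape, so `{current_time}` is interpolated immediately while `Total models: {{}}` is left as a literal `{}` placeholder. The sketch below shows how that placeholder could be filled later with `str.format`. It is an illustration only: the fixed timestamp, the shortened header text, the count `42`, and the `header` variable are assumptions made for the example, and the diff does not show how the app actually fills in the model count.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed timestamp with a hand-built "PST" zone keeps the example
# deterministic; the real file uses datetime.now(pacific_tz) with pytz.
fixed_pst = timezone(timedelta(hours=-8), "PST")
current_time = datetime(2024, 12, 20, 9, 30, tzinfo=fixed_pst).strftime(
    "%H:%M %Z, %d %b %Y"
)

# Shortened stand-in for the real TOP_TEXT. `{current_time}` is filled in now;
# `{{}}` is an escaped brace pair and survives as a literal `{}`.
TOP_TEXT = f"""# HREF
[Paper](https://arxiv.org/abs/2412.15524) | Total models: {{}} | Last restart (PST): {current_time}
"""

# Hypothetical later step: fill the remaining placeholder with the model count.
header = TOP_TEXT.format(42)
print(header)
```

One consequence of this two-stage pattern is that `current_time` is frozen when the module is imported, which matches the "Last restart" label, while the model count can be supplied each time the header is rendered.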