alrope committed · Commit 08706da · verified · 1 Parent(s): 472397d

Update src/md.py

Files changed (1): src/md.py +2 -2
src/md.py CHANGED
@@ -4,7 +4,7 @@ import pytz
 ABOUT_TEXT = """
 ## Overview
 HREF is evaluation benchmark that evaluates language models' capacity of following human instructions. It is consisted of 4,258 instructions covering 11 distinct categories, including Brainstorm ,Open QA ,Closed QA ,Extract ,Generation ,Rewrite ,Summarize ,Coding ,Classify ,Fact Checking or Attributed QA ,Multi-Document Synthesis , and Reasoning Over Numerical Data.
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/64dff1ddb5cc372803af964d/dSv3U11h936t_q-aiqbkV.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/64dff1ddb5cc372803af964d/0TK6xku0gdJPDs_nfwzns.png)
 
 ## Generation Configuration
 For reproductability, we use greedy decoding for all model generation as default. We apply chat templates to the instructions if they are implemented in model's tokenizer or explicity recommanded by the model's creators. Please contact us if you would like to change this default configuration.
@@ -30,6 +30,6 @@ pacific_tz = pytz.timezone('America/Los_Angeles')
 current_time = datetime.now(pacific_tz).strftime("%H:%M %Z, %d %b %Y")
 
 TOP_TEXT = f"""# HREF: Human Reference Guided Evaluation for Instructiong Following
-[Code]() | [Validation Set]() | [Human Agreement Set]() | [Results]() | [Paper]() | Total models: {{}} | Last restart (PST): {current_time}
+[Code](https://github.com/allenai/href) | [Validation Set](https://huggingface.co/datasets/allenai/href) | [Human Agreement Set](https://huggingface.co/datasets/allenai/href_preference) | [Results](https://huggingface.co/datasets/allenai/href_results) | [Paper](https://arxiv.org/abs/2412.15524) | Total models: {{}} | Last restart (PST): {current_time}
 """
 
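Note on the `TOP_TEXT` line touched by this commit: it mixes `{current_time}` with a doubled `{{}}` inside an f-string. The sketch below illustrates how that pattern behaves in Python. The helper name `build_top_text`, the shortened template, the sample timestamp, and the later `.format(42)` call are all assumptions made for illustration; the diff does not show where or how the model count is actually filled in.

```python
def build_top_text(current_time: str) -> str:
    """Hypothetical, shortened stand-in for the TOP_TEXT f-string in src/md.py."""
    # {current_time} is interpolated immediately by the f-string.
    # The doubled braces {{}} are emitted as a literal "{}", which leaves
    # an empty slot that a later str.format() call can fill.
    return f"Total models: {{}} | Last restart (PST): {current_time}"


# Sample timestamp in the same "%H:%M %Z, %d %b %Y" shape (made-up value).
template = build_top_text("09:30 PST, 20 Dec 2024")

# Assumed later step: fill the remaining slot with a model count.
header = template.format(42)

print(template)
print(header)
```

Because the timestamp is interpolated when the module is first evaluated, it reflects the time the f-string ran rather than the time of each later `.format()` call, which is consistent with the "Last restart" label.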