Commit 3c4f726 · verified · committed by ekurtic
1 Parent(s): 98342fe

Update README.md

Files changed (1)
  1. README.md +1 -15
README.md CHANGED
@@ -159,21 +159,7 @@ All evaluations are obtained through [lm-evaluation-harness](https://github.com/
  | GPQA<br>0-shot | 102.6 | 31.88 | 32.72 |
  | MuSR<br>0-shot | 101.2 | 42.20 | 42.72 |
  | MMLU-Pro<br>5-shot | 99.12 | 55.70 | 55.21 |
- | **OpenLLM v2<br>Average Score** | **100.48** | **56.60** | **56.87** |
- | RULER<br>seqlen = 131072<br>niah_multikey_1 | ? | 88.20 | ? |
- | RULER<br>seqlen = 131072<br>niah_multikey_2 | ? | 83.60 | ? |
- | RULER<br>seqlen = 131072<br>niah_multikey_3 | ? | 78.80 | ? |
- | RULER<br>seqlen = 131072<br>niah_multiquery | ? | 95.40 | ? |
- | RULER<br>seqlen = 131072<br>niah_multivalue | ? | 73.75 | ? |
- | RULER<br>seqlen = 131072<br>niah_single_1 | ? | 100.00 | ? |
- | RULER<br>seqlen = 131072<br>niah_single_2 | ? | 99.80 | ? |
- | RULER<br>seqlen = 131072<br>niah_single_3 | ? | 99.80 | ? |
- | RULER<br>seqlen = 131072<br>ruler_cwe | ? | 39.42 | ? |
- | RULER<br>seqlen = 131072<br>ruler_fwe | ? | 92.93 | ? |
- | RULER<br>seqlen = 131072<br>ruler_qa_hotpot | ? | 48.20 | ? |
- | RULER<br>seqlen = 131072<br>ruler_qa_squad | ? | 53.57 | ? |
- | RULER<br>seqlen = 131072<br>ruler_qa_vt | ? | 92.28 | ? |
- | **RULER<br>seqlen = 131072<br>Average Score** | **?** | **80.44** | **?** |
+ | **OpenLLM v2<br>Average Score** | **100.48** | **56.60** | **56.87** | |
  | MMMU<br>0-shot | 101.6 | 53.44 | 54.33 |
  | ChartQA<br>0-shot<br>exact_match | 100.8 | 65.88 | 66.44 |
  | ChartQA<br>0-shot<br>relaxed_accuracy | 99.82 | 88.92 | 88.76 |