Update README.md
README.md
CHANGED
@@ -159,21 +159,7 @@ All evaluations are obtained through [lm-evaluation-harness](https://github.com/
 | GPQA<br>0-shot | 102.6 | 31.88 | 32.72 |
 | MuSR<br>0-shot | 101.2 | 42.20 | 42.72 |
 | MMLU-Pro<br>5-shot | 99.12 | 55.70 | 55.21 |
-| **OpenLLM v2<br>Average Score** | **100.48** | **56.60** | **56.87** |
-| RULER<br>seqlen = 131072<br>niah_multikey_1 | ? | 88.20 | ? |
-| RULER<br>seqlen = 131072<br>niah_multikey_2 | ? | 83.60 | ? |
-| RULER<br>seqlen = 131072<br>niah_multikey_3 | ? | 78.80 | ? |
-| RULER<br>seqlen = 131072<br>niah_multiquery | ? | 95.40 | ? |
-| RULER<br>seqlen = 131072<br>niah_multivalue | ? | 73.75 | ? |
-| RULER<br>seqlen = 131072<br>niah_single_1 | ? | 100.00 | ? |
-| RULER<br>seqlen = 131072<br>niah_single_2 | ? | 99.80 | ? |
-| RULER<br>seqlen = 131072<br>niah_single_3 | ? | 99.80 | ? |
-| RULER<br>seqlen = 131072<br>ruler_cwe | ? | 39.42 | ? |
-| RULER<br>seqlen = 131072<br>ruler_fwe | ? | 92.93 | ? |
-| RULER<br>seqlen = 131072<br>ruler_qa_hotpot | ? | 48.20 | ? |
-| RULER<br>seqlen = 131072<br>ruler_qa_squad | ? | 53.57 | ? |
-| RULER<br>seqlen = 131072<br>ruler_qa_vt | ? | 92.28 | ? |
-| **RULER<br>seqlen = 131072<br>Average Score** | **?** | **80.44** | **?** |
+| **OpenLLM v2<br>Average Score** | **100.48** | **56.60** | **56.87** | |
 | MMMU<br>0-shot | 101.6 | 53.44 | 54.33 |
 | ChartQA<br>0-shot<br>exact_match | 100.8 | 65.88 | 66.44 |
 | ChartQA<br>0-shot<br>relaxed_accuracy | 99.82 | 88.92 | 88.76 |
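The header row of this table falls outside the hunk, so the column meanings have to be inferred. Dividing the third score column by the second reproduces the first column to within rounding, which suggests the first column is a recovery percentage; likewise, the RULER "Average Score" row removed by this commit is the plain mean of its 13 subtask scores. The sketch below checks both under that assumption. The helper names `recovery` and `mean` are hypothetical and not part of the README or of lm-evaluation-harness.

```python
"""Re-derive two aggregate values in the evaluation table from its raw scores.

Assumption: in each row, the second score column is the reference model and the
third is the compared model. The header row is outside this diff hunk, so these
meanings are inferred from the numbers rather than stated in the README.
"""

# (task, reported first column, second column, third column), copied from the diff.
ROWS = [
    ("GPQA 0-shot", 102.6, 31.88, 32.72),
    ("MuSR 0-shot", 101.2, 42.20, 42.72),
    ("MMLU-Pro 5-shot", 99.12, 55.70, 55.21),
    ("OpenLLM v2 Average Score", 100.48, 56.60, 56.87),
    ("MMMU 0-shot", 101.6, 53.44, 54.33),
    ("ChartQA 0-shot exact_match", 100.8, 65.88, 66.44),
    ("ChartQA 0-shot relaxed_accuracy", 99.82, 88.92, 88.76),
]

# Second-column RULER subtask scores (seqlen = 131072) from the rows this commit removes.
RULER_SUBTASKS = {
    "niah_multikey_1": 88.20,
    "niah_multikey_2": 83.60,
    "niah_multikey_3": 78.80,
    "niah_multiquery": 95.40,
    "niah_multivalue": 73.75,
    "niah_single_1": 100.00,
    "niah_single_2": 99.80,
    "niah_single_3": 99.80,
    "ruler_cwe": 39.42,
    "ruler_fwe": 92.93,
    "ruler_qa_hotpot": 48.20,
    "ruler_qa_squad": 53.57,
    "ruler_qa_vt": 92.28,
}


def recovery(reference: float, candidate: float) -> float:
    """Hypothetical helper: candidate score as a percentage of the reference score."""
    return candidate / reference * 100.0


def mean(values) -> float:
    """Unweighted arithmetic mean of an iterable of scores."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    # Compare each reported first-column value with the recomputed ratio.
    for task, reported, reference, candidate in ROWS:
        recomputed = recovery(reference, candidate)
        print(f"{task:<32} reported {reported:>7.2f}  recomputed {recomputed:7.2f}")
    # The removed RULER average row should equal the mean of its subtasks.
    print(f"RULER average: {mean(RULER_SUBTASKS.values()):.2f}")
```

Every recomputed ratio lands within 0.1 of the published value (the small gaps, such as MMMU, presumably come from rounding of the underlying scores), and the RULER mean comes out to 80.44, matching the removed average row.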