OpsEval / data_v2 /inspur_zh_mc_gen.csv
update leaderboard 2024-09-06
name,zero_naive,zero_self_con,zero_cot,zero_cot_self_con,few_naive,few_self_con,few_cot,few_cot_self_con
Gpt-4,,,87.07482993197279,87.07482993197279,87.07482993197279,87.07482993197279,91.15646258503402,91.15646258503402
GPT-4o,87.07482993197279,87.07482993197279,89.1156462585034,89.1156462585034,89.1156462585034,89.1156462585034,91.15646258503402,91.15646258503402
Baichuan2-7B-Chat,62.585034013605444,62.585034013605444,,,42.857142857142854,42.857142857142854,,
Claude-3-Opus,83.6734693877551,83.6734693877551,85.03401360544217,85.03401360544217,87.75510204081633,87.75510204081633,91.83673469387756,91.83673469387756
Qwen2-0.5B-Instruct,56.4625850340136,56.4625850340136,,,,,57.14285714285714,57.14285714285714
Qwen2-1.5B-Instruct,,,68.02721088435374,68.02721088435374,,,,
Qwen2-7B-Instruct,76.19047619047619,76.19047619047619,80.95238095238095,80.95238095238095,76.87074829931973,76.87074829931973,,