OpsEval / data_v2 /pufa_zh_mc_gen.csv
update leaderboard 2024-09-06
fe35dbb
936 Bytes
name,zero_self_con,zero_cot_self_con,few_self_con,few_cot_self_con
Baichuan2-13B-Chat,66.67,66.67,61.33,62.67
Chatglm3-6B,60.0,61.33333333,56.0,58.66666667
Devops-Model-14B-Chat,29.33,61.33,81.33,70.67
Ernie-Bot-4.0,86.67,86.67,82.67,86.67
Gpt-3.5-Turbo,77.33,81.33,78.67,82.67
GPT-4,88.0,86.67,84.0,90.67
Internlm2-Chat-20B,76.0,80.0,80.0,
Internlm2-Chat-7B,78.66666667,72.0,72.0,53.33333333
Llama-2-13B,44.0,68.0,61.33,53.33
Llama-2-70B-Chat,6.67,65.33,49.33,66.67
Llama-2-7B,25.33,40.0,48.0,52.0
Mistral-7B,4.0,58.67,22.67,54.67
Qwen-14B-Chat,73.33,72.0,73.33,80.0
Qwen-72B-Chat,90.67,85.33,88.0,82.67
Yi-34B-Chat,84.0,88.0,92.0,89.33
Claude-3-Opus,93.24324324324324,,,
gemma_2b,36.0,41.33333,36.0,30.66667
gemma_7b,34.66667,56.0,46.66667,56.0
Meta-Llama-3-8B-Instruct,85.8108108108108,31.756756756756754,83.1081081081081,27.7027027027027
Qwen1.5-14B-Base,78.66667,72.0,92.0,42.66667
Qwen1.5-14B-Chat,89.33333,85.33333,80.0,85.33333