Benchmarks - a hppdqdq Collection

Benchmarks

updated Jan 13

Running on CPU Upgrade

205

MMLU-Pro Leaderboard

🥇

More advanced and challenging multi-task evaluation

Running

44

Stick To Your Role! Leaderboard

🎭

Benchmarking LLMs on the stability of simulated populations

Running

51

ZeroEval Leaderboard

📊

Embed and use ZeroEval for evaluation tasks

Running

25

Decentralized Arena Leaderboard

🥇

Display model leaderboard evaluations

Running on CPU Upgrade

389

Open Medical-LLM Leaderboard

🥇

Browse and submit LLM evaluations

Running

219

GPU Poor LLM Arena

🏆

Compact LLM Battle Arena: Frugal AI Face-Off!

Running

109

Open VLM Video Leaderboard

🌎

VLMEvalKit evaluation results on video understanding benchmarks

Running on CPU Upgrade

13.1k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots

Running on Zero

377

TTS Spaces Arena

🤗

Blind vote on HF TTS models!