Benchmarks

A collection by hppdqdq, updated Jan 13.

  • MMLU-Pro Leaderboard 🥇 (205 likes)
    More advanced and challenging multi-task evaluation

  • Stick To Your Role! Leaderboard 🎭 (44 likes)
    Benchmarking LLMs on the stability of simulated populations

  • ZeroEval Leaderboard 📊 (51 likes)
    Embed and use ZeroEval for evaluation tasks

  • Decentralized Arena Leaderboard 🥇 (25 likes)
    Display model leaderboard evaluations

  • Open Medical-LLM Leaderboard 🥇 (389 likes)
    Browse and submit LLM evaluations

  • GPU Poor LLM Arena 🏆 (219 likes)
    Compact LLM Battle Arena: Frugal AI Face-Off!

  • Open VLM Video Leaderboard 🌎 (109 likes)
    VLMEvalKit evaluation results on video understanding benchmarks

  • Open LLM Leaderboard 🏆 (13.1k likes)
    Track, rank and evaluate open LLMs and chatbots

  • TTS Spaces Arena 🤗 (377 likes)
    Blind vote on HF TTS models!