---
title: MTEB Human Evaluation Demo
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 3.42.0
app_file: app.py
pinned: false
---
# MTEB Human Evaluation Demo

This is a demo of the human evaluation interface for the MTEB (Massive Text Embedding Benchmark) project. It allows annotators to evaluate the relevance of documents for reranking tasks.
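As an illustration of what such an annotation interface might look like, here is a minimal Gradio 3.x sketch (the Space pins `sdk_version: 3.42.0`). The layout, component names, and sample data are assumptions for demonstration; they are not taken from the actual `app.py`.

```python
import gradio as gr

# Example data; the real demo loads samples from AskUbuntuDupQuestions.
query = "How do I fix held broken packages in apt?"
docs = ["Candidate document A ...", "Candidate document B ...", "Candidate document C ..."]

with gr.Blocks() as demo:
    gr.Markdown(f"**Query:** {query}")
    rank_inputs = []
    for i, doc in enumerate(docs):
        with gr.Row():
            gr.Markdown(doc)
            # Rank 1 = most relevant, matching the instructions below.
            rank_inputs.append(
                gr.Dropdown(choices=[str(r) for r in range(1, len(docs) + 1)],
                            label=f"Rank for document {i + 1}")
            )
    submit = gr.Button("Submit rankings")
    status = gr.Markdown()

    def record_ranking(*ranks):
        # The real app persists annotations; this sketch only echoes them.
        return "Recorded ranking: " + ", ".join(str(r) for r in ranks)

    submit.click(record_ranking, inputs=rank_inputs, outputs=status)

demo.launch()
```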
## How to use | |
1. Navigate to the "Demo" tab to try the interface with an example dataset (AskUbuntuDupQuestions)
2. Read the query at the top
3. For each document, assign a rank using the dropdown (1 = most relevant)
4. Submit your rankings
5. Navigate between samples using the Previous/Next buttons
6. Your annotations are saved automatically (a sketch of one possible saving scheme follows this list)
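One simple way to implement the automatic saving in step 6 is to append one JSON line per submitted ranking. The file name and record fields below are hypothetical, not the Space's actual storage format.

```python
import json
import time

def save_annotation(sample_id, ranking, path="annotations.jsonl"):
    """Append one annotation record per submitted ranking.

    `sample_id`, `ranking`, and the JSONL layout are illustrative
    assumptions, not the format used by the real app.py.
    """
    record = {"sample_id": sample_id, "ranking": ranking, "timestamp": time.time()}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example: for sample 0, document 2 was ranked most relevant.
save_annotation(0, [2, 1, 3])
```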
## About MTEB Human Evaluation
This project aims to establish human performance benchmarks for MTEB tasks, helping to understand the realistic "ceiling" for embedding model performance. |