---
title: Post-ASR LLM N-Best Transcription Correction
emoji: 🏢
colorFrom: yellow
colorTo: yellow
sdk: gradio
sdk_version: 5.21.0
app_file: app.py
pinned: false
license: mit
short_description: Generative Error Correction (GER) Task Baseline, WER
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Post-ASR Text Correction WER Leaderboard

This application displays a baseline Word Error Rate (WER) leaderboard for the test data in the [GenSEC-LLM/SLT-Task1-Post-ASR-Text-Correction](https://huggingface.co/datasets/GenSEC-LLM/SLT-Task1-Post-ASR-Text-Correction) dataset.

## Dataset Sources

The leaderboard reports WER for each speech recognition source as a column:

- CHiME4
- CORAAL
- CommonVoice
- LRS2
- LibriSpeech (Clean and Other)
- SwitchBoard
- Tedlium-3
- OVERALL (aggregate across all sources)

## Metrics

The leaderboard displays the following rows:

- **Count**: the number of examples in the test set for each source
- **No LM Baseline**: the Word Error Rate between the reference transcription and the 1-best ASR output, without language-model correction

## Baseline Calculation

Word Error Rate is calculated between:

- the reference transcription (the "transcription" field), and
- the 1-best ASR output (the "input1" field, or the first item of "hypothesis" when "input1" is unavailable).

Lower WER values indicate better transcription accuracy. A sketch of this computation is included at the end of this README.

## Table Structure

The leaderboard is displayed as a table with:

- **Rows**: "Number of Examples" and "Word Error Rate (WER)"
- **Columns**: the individual data sources (CHiME4, CORAAL, CommonVoice, etc.) plus OVERALL

Each cell shows the corresponding metric for that data source; the OVERALL column aggregates across all sources.
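
## Baseline Sketch

For illustration, here is a minimal sketch of the No LM Baseline computation, assuming the `datasets` and `jiwer` libraries. The `split` argument and the `one_best` helper are assumptions for this sketch, not the app's actual code:

```python
from datasets import load_dataset
import jiwer

def one_best(example: dict) -> str:
    """Return the 1-best ASR output: the 'input1' field if present,
    otherwise the first item of the 'hypothesis' list."""
    return example.get("input1") or example["hypothesis"][0]

# The split name here is an assumption; the dataset may expose
# per-source configs (one per column of the leaderboard) instead.
ds = load_dataset("GenSEC-LLM/SLT-Task1-Post-ASR-Text-Correction", split="test")

references = [ex["transcription"] for ex in ds]
hypotheses = [one_best(ex) for ex in ds]

# Corpus-level WER: total word-level edit distance divided by the
# total number of reference words. Lower is better.
wer = jiwer.wer(references, hypotheses)
print(f"No LM Baseline WER: {wer:.4f} over {len(references)} examples")
```

Computing WER over the whole corpus at once (rather than averaging per-utterance WERs) weights each reference word equally, which matches how the OVERALL column aggregates across sources.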