---
title: Post-ASR LLM N-Best Transcription Correction
emoji: 🏢
colorFrom: yellow
colorTo: yellow
sdk: gradio
sdk_version: 5.21.0
app_file: app.py
pinned: false
license: mit
short_description: Generative Error Correction (GER) Task Baseline, WER
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Post-ASR Text Correction WER Leaderboard

This application displays a baseline Word Error Rate (WER) leaderboard for the test data in the [GenSEC-LLM/SLT-Task1-Post-ASR-Text-Correction](https://huggingface.co/datasets/GenSEC-LLM/SLT-Task1-Post-ASR-Text-Correction) dataset.

## Dataset Sources

The leaderboard reports WER for each speech recognition source as a column:

- CHiME4
- CORAAL
- CommonVoice
- LRS2
- LibriSpeech (Clean and Other)
- SwitchBoard
- Tedlium-3
- OVERALL (aggregate across all sources)

## Metrics

The leaderboard displays the following rows:

- **Count**: the number of examples in the test set for each source
- **No LM Baseline**: the Word Error Rate between the reference transcription and the 1-best ASR output, without language-model correction

## Baseline Calculation

Word Error Rate is calculated between:

- the reference transcription (the "transcription" field), and
- the 1-best ASR output (the "input1" field, or the first item of "hypothesis" when "input1" is unavailable).

Lower WER values indicate better transcription accuracy. A sketch of this computation is included at the end of this README.

## Table Structure

The leaderboard is displayed as a table with:

- **Rows**: "Number of Examples" and "Word Error Rate (WER)"
- **Columns**: the individual data sources (CHiME4, CORAAL, CommonVoice, etc.) plus OVERALL

Each cell shows the corresponding metric for that data source; the OVERALL column aggregates across all sources.
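
## Baseline Sketch

For illustration, here is a minimal sketch of the No LM Baseline computation, assuming the `datasets` and `jiwer` libraries. The `split` argument and the `one_best` helper are assumptions for this sketch, not the app's actual code:

```python
from datasets import load_dataset
import jiwer

def one_best(example: dict) -> str:
    """Return the 1-best ASR output: the 'input1' field if present,
    otherwise the first item of the 'hypothesis' list."""
    return example.get("input1") or example["hypothesis"][0]

# The split name here is an assumption; the dataset may expose
# per-source configs (one per column of the leaderboard) instead.
ds = load_dataset("GenSEC-LLM/SLT-Task1-Post-ASR-Text-Correction", split="test")

references = [ex["transcription"] for ex in ds]
hypotheses = [one_best(ex) for ex in ds]

# Corpus-level WER: total word-level edit distance divided by the
# total number of reference words. Lower is better.
wer = jiwer.wer(references, hypotheses)
print(f"No LM Baseline WER: {wer:.4f} over {len(references)} examples")
```

Computing WER over the whole corpus at once (rather than averaging per-utterance WERs) weights each reference word equally, which matches how the OVERALL column aggregates across sources.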