Iqra’Eval Shared Task

Overview

Iqra’Eval is a shared task aimed at advancing automatic assessment of Qur’anic recitation pronunciation by leveraging computational methods to detect and diagnose pronunciation errors. The focus on Qur’anic recitation provides a standardized and well-defined context for evaluating Modern Standard Arabic (MSA) pronunciation.

Participants will develop systems capable of detecting mispronunciations (e.g., substitution, deletion, or insertion of phonemes).

Timeline

June 1, 2025: Official announcement
June 10, 2025: Release of training data, dev set, phonetizer, baselines
June 20, 2025: Opening Leaderboard
July 20, 2025: Registration deadline
July 24, 2025: QuranMB test data release
July 29, 2025: Test set submission closes
July 30, 2025: Final results released
August 15, 2025: System description papers due
August 22, 2025: Notification of acceptance
September 5, 2025: Camera-ready versions due

Task Description: Quranic Mispronunciation Detection System

Design a model to detect and provide detailed feedback on mispronunciations in Quranic recitations. Users read vowelized verses; the model predicts the spoken phoneme sequence and flags deviations. Evaluation is on the QuranMB.v2 dataset with human‐annotated errors.

Figure: Overview of the Mispronunciation Detection Workflow

1. Read the Verse

System shows a Reference Verse plus its Reference Phoneme Sequence.

Example:

Arabic: إِنَّ الصَّفَا وَالْمَرْوَةَ مِنْ شَعَائِرِ اللَّهِ
Phoneme: < i n n a SS A f aa w a l m a r w a t a m i n $ a E a a < i r i l l a h i

2. Save Recording

User recites; system captures and stores the audio waveform.

3. Mispronunciation Detection

The model predicts the phoneme sequence that was actually spoken; any deviation from the reference sequence indicates a mispronunciation.

Example of Mispronunciation:

Reference: < i n n a SS A f aa w a l m a r w a t a m i n $ a E a a < i r i l l a h i
Predicted: < i n n a SS A f aa w a l m a r w a t a m i n s a E a a < i r u l l a h i
Annotated: < i n n a SS A f aa w a l m a r w s a E a a < i r u l l a h i

Here the model detects the substitutions $→s and i→u, but the omission of ta present in the annotation goes undetected.
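The comparison between the Reference and Predicted lines above can be sketched with Python's standard difflib module. This is only an illustration of how deviations might be extracted from two space-separated phoneme strings; the helper name diff_phonemes is hypothetical and is not part of the official baselines or scoring scripts.

```python
from difflib import SequenceMatcher


def diff_phonemes(reference: str, hypothesis: str):
    """Return (operation, reference_phonemes, hypothesis_phonemes) per mismatch.

    Both inputs are space-separated phoneme strings. The operation is one of
    difflib's opcode tags: "replace", "delete", or "insert".
    """
    ref = reference.split()
    hyp = hypothesis.split()
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((tag, ref[i1:i2], hyp[j1:j2]))
    return edits


# Reference and Predicted sequences copied from the example above.
reference = "< i n n a SS A f aa w a l m a r w a t a m i n $ a E a a < i r i l l a h i"
predicted = "< i n n a SS A f aa w a l m a r w a t a m i n s a E a a < i r u l l a h i"

for op, ref_part, hyp_part in diff_phonemes(reference, predicted):
    print(op, ref_part, "->", hyp_part)
```

For the example pair this reports the two substitutions ($ replaced by s, and i replaced by u). difflib's alignment is a heuristic, so an edit-distance aligner may be preferable for real scoring.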

Training Dataset: Description

Hosted on Hugging Face:

Training: 79 hours of MSA speech augmented with Qur’anic recitations. Load with: load_dataset("IqraEval/Iqra_train", split="train")
Development: 3.4 hours as dev set. Load with: load_dataset("IqraEval/Iqra_train", split="dev")

Columns:

audio: waveform
sentence: original text (verse)
index: verse ID
tashkeel_sentence: fully diacritized text (verse)
phoneme: phoneme sequence (using phonetizer)
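As a minimal sketch, the column list above can be turned into a schema check that runs right after loading a split. The helper names missing_columns and load_and_check are hypothetical; the dataset ID and split names are the ones given above, and the datasets package is assumed to be installed.

```python
EXPECTED_COLUMNS = {"audio", "sentence", "index", "tashkeel_sentence", "phoneme"}


def missing_columns(column_names):
    """Return the expected columns that are absent from column_names."""
    return EXPECTED_COLUMNS - set(column_names)


def load_and_check(split="dev"):
    """Download one split and verify its schema (needs network access)."""
    # Imported lazily so missing_columns works without the third-party package.
    from datasets import load_dataset

    dataset = load_dataset("IqraEval/Iqra_train", split=split)
    missing = missing_columns(dataset.column_names)
    if missing:
        raise KeyError(f"missing columns: {sorted(missing)}")
    return dataset
```

Calling load_and_check("dev") would download the development split; the check itself only inspects column names and does not decode any audio.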

Training Dataset: TTS Data (Optional)

Auxiliary high-quality TTS corpus for augmentation: load_dataset("IqraEval/Iqra_TTS")

Test Dataset: QuranMB.v2

98 verses × 18 speakers ≈ 2 h, with deliberate errors and human annotations. Load with: load_dataset("IqraEval/Iqra_QuranMB_v2")

Submission Details (Draft)

Submit a UTF-8 CSV named teamID_submission.csv with two columns:

ID: audio filename (no extension)
Labels: predicted phoneme sequence (space-separated)

ID,Labels
0000_0001,i n n a m a a y a …
0000_0002,m a a n a n s a …
…

Note: no extra spaces, single CSV, no archives.
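One way to produce a file in this shape is with Python's csv module, sketched below. The function name write_submission and the sample predictions are placeholders for illustration only; the header and column order follow the format described above.

```python
import csv
import io


def write_submission(predictions, stream):
    """Write {audio_id: [phoneme, ...]} as a two-column ID,Labels CSV."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["ID", "Labels"])
    for audio_id, phonemes in predictions.items():
        # Join with single spaces and collapse any stray whitespace.
        labels = " ".join(" ".join(phonemes).split())
        writer.writerow([audio_id.strip(), labels])


# Placeholder predictions, not real model output.
sample = {
    "0000_0001": ["i", "n", "n", "a"],
    "0000_0002": ["m", "a", "a"],
}

buffer = io.StringIO()
write_submission(sample, buffer)
print(buffer.getvalue())
```

To write the actual file, pass a handle opened with open("teamID_submission.csv", "w", encoding="utf-8", newline="") in place of the in-memory buffer.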

Evaluation Criteria

IqraEval Leaderboard is based on phoneme-level F1-score. We use a hierarchical evaluation (detection + diagnostic) per MDD Overview.

What is said: annotated phoneme sequence
What is predicted: model output
What should have been said: reference sequence

From these we compute:

TA: correct phonemes accepted
TR: mispronunciations correctly detected
FR: correct phonemes flagged as errors
FA: mispronunciations missed
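The four counts can be illustrated with a short sketch. It assumes the three sequences have already been aligned to equal length (for example with gap tokens for insertions and deletions), which is a simplification and may differ from the official scorer. The function name mdd_counts and the toy sequences are hypothetical.

```python
from collections import Counter


def mdd_counts(reference, annotated, predicted):
    """Count TA, FR, FA, and TR over three pre-aligned phoneme lists."""
    if not (len(reference) == len(annotated) == len(predicted)):
        raise ValueError("sequences must be pre-aligned to the same length")
    counts = Counter(TA=0, FR=0, FA=0, TR=0)
    for ref, ann, pred in zip(reference, annotated, predicted):
        spoken_correctly = ann == ref   # the speaker said the reference phoneme
        accepted = pred == ref          # the model output the reference phoneme
        if spoken_correctly:
            counts["TA" if accepted else "FR"] += 1
        else:
            counts["FA" if accepted else "TR"] += 1
    return dict(counts)


# Toy sequences for illustration only.
reference = "$ a E i r i".split()
annotated = "s a E i r u".split()
predicted = "s a A i r i".split()
print(mdd_counts(reference, annotated, predicted))
```

In the toy example the first phoneme is a detected mispronunciation (TR), the third is a correct phoneme wrongly flagged (FR), and the last is a missed mispronunciation (FA).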

Rates:

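FRR (false rejection rate): FR/(TA+FR)
FAR (false acceptance rate): FA/(FA+TR)
DER (diagnostic error rate): DE/(CD+DE), where the detected mispronunciations (TR) are split into correct diagnoses (CD, predicted phoneme matches the annotation) and diagnostic errors (DE, it does not)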

Plus standard Precision, Recall, F1 for detection:

Precision = TR/(TR+FR)
Recall = TR/(TR+FA)
F1 = 2·P·R/(P+R)
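The formulas above can be collected into one small function, sketched here. It assumes, as is common in the mispronunciation detection literature, that CD and DE partition the true rejections so that TR = CD + DE. The function name mdd_metrics and the example counts are hypothetical.

```python
def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def mdd_metrics(ta, fr, fa, cd, de):
    """Compute FRR, FAR, DER, precision, recall, and F1 from raw counts."""
    tr = cd + de  # assumption: true rejections = correct + incorrect diagnoses
    precision = _ratio(tr, tr + fr)
    recall = _ratio(tr, tr + fa)
    return {
        "FRR": _ratio(fr, ta + fr),
        "FAR": _ratio(fa, fa + tr),
        "DER": _ratio(de, cd + de),
        "Precision": precision,
        "Recall": recall,
        "F1": _ratio(2 * precision * recall, precision + recall),
    }


# Made-up counts for illustration only.
metrics = mdd_metrics(ta=90, fr=10, fa=5, cd=12, de=3)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that Recall equals 1 − FAR by construction, which is a quick sanity check on any implementation.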

Suggested Research Directions

Advanced Mispronunciation Detection Models
Apply state-of-the-art self-supervised models (e.g., Wav2Vec2.0, HuBERT), using variants that are pre-trained/fine-tuned on Arabic speech. These models can then be fine-tuned on Quranic recitations to improve phoneme-level accuracy.
Data Augmentation Strategies
Create synthetic mispronunciation examples using pipelines like SpeechBlender. Augmenting limited Arabic/Quranic speech data helps mitigate data scarcity and improves model robustness.
Analysis of Common Mispronunciation Patterns
Perform statistical analysis on the QuranMB dataset to identify prevalent errors (e.g., substituting similar phonemes, swapping vowels). These insights can drive targeted training and tailored feedback rules.
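As a starting point for the pattern analysis suggested above, substitution pairs can be tallied across utterances. This is a rough sketch that aligns each reference/annotated pair with difflib and counts only one-to-one replacements; the function name substitution_counts and the toy pairs are hypothetical and not taken from QuranMB.

```python
from collections import Counter
from difflib import SequenceMatcher


def substitution_counts(pairs):
    """Count (reference_phoneme, spoken_phoneme) substitutions over many pairs.

    Each pair is (reference, annotated), both space-separated phoneme strings.
    Replace blocks of unequal length are skipped because they cannot be mapped
    phoneme-to-phoneme without a finer alignment.
    """
    counts = Counter()
    for reference, annotated in pairs:
        ref, ann = reference.split(), annotated.split()
        matcher = SequenceMatcher(a=ref, b=ann, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                counts.update(zip(ref[i1:i2], ann[j1:j2]))
    return counts


# Toy data for illustration only.
toy_pairs = [("$ a E", "s a E"), ("i r i", "i r u"), ("$ a m", "s a m")]
confusions = substitution_counts(toy_pairs)
for (expected, spoken), n in confusions.most_common():
    print(f"{expected} -> {spoken}: {n}")
```

Sorting by frequency gives a simple confusion ranking that could guide targeted augmentation or feedback rules.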

Registration

Teams and individual participants must register to gain access to the test set. Please complete the registration form using the link below:

Registration Form

Registration opens on June 10, 2025.

Future Updates

Further details on the open-set leaderboard submission will be posted on the shared task website (June 20, 2025). Stay tuned!

Contact and Support

For inquiries and support, reach out to the task coordinators at iqraeval@googlegroups.com.

References

El Kheir Y. et al., “SpeechBlender: Speech Augmentation Framework for Mispronunciation Data Generation,” arXiv:2211.00923, 2022.
Al Harere A. & Al Jallad K., “Mispronunciation Detection of Basic Quranic Recitation Rules using Deep Learning,” arXiv:2305.06429, 2023.
Aly S. A. et al., “ASMDD: Arabic Speech Mispronunciation Detection Dataset,” arXiv:2111.01136, 2021.
Moustafa A. & Aly S. A., “Efficient Voice Identification Using Wav2Vec2.0 and HuBERT…,” arXiv:2111.06331, 2021.
El Kheir Y. et al., “Automatic Pronunciation Assessment – A Review,” arXiv:2310.13974, 2023.