Iqra’Eval Shared Task

Overview

Iqra’Eval is a shared task aimed at advancing the automatic assessment of Qur’anic recitation pronunciation through computational methods for detecting and diagnosing pronunciation errors. The focus on Qur’anic recitation provides a standardized and well-defined context for evaluating Modern Standard Arabic (MSA) pronunciation, where precise articulation is not merely valued but essential for correctness according to established Tajweed rules.

Participants will develop systems capable of:

  1. Detecting pronunciation errors in a recited verse,
  2. Localizing each error within the phoneme sequence, and
  3. Diagnosing the type of error according to Tajweed rules.

Timeline

Task Description

The Iqra’Eval shared task focuses on automatic mispronunciation detection and diagnosis in Qur’anic recitation. Given:

  1. A speech segment (an audio clip of a Qur’anic verse recitation), and
  2. A fully vowelized reference transcript (the corresponding Qur’anic text, fully diacritized),

the goal is to identify any pronunciation errors, localize them within the phoneme sequence, and classify the type of error based on Tajweed rules.

Each participant’s system must predict the sequence of phonemes that the reciter actually produced. A standardized phonemizer (Nawar Halabi’s phonetizer) will be used to generate the “gold” phoneme sequence from the reference transcript for comparison.

Key subtasks:

  1. Mispronunciation detection: flag phonemes where the recitation deviates from the gold sequence.
  2. Mispronunciation diagnosis: predict the phoneme sequence the reciter actually produced and classify each deviation as a substitution, deletion, or insertion.

Example

Suppose the reference verse (fully vowelized) is:

إِنَّ اللَّهَ عَلَىٰ كُلِّ شَيْءٍ قَدِيرٌ
(inna l-lāha ʿalā kulli shay’in qadīrun)

The gold phoneme sequence (using the standard phonemizer) might be:

i n n a  l l aa h a  ʕ a l a  k u l l i  ʃ a y ’ i n  q a d i r u n

If a reciter mispronounces “قَدِيرٌ” (qadīrun) as “كَدِيرٌ” (kadīrun), that corresponds to a substitution at the very start of that word: phoneme /q/ → /k/.

A well-trained system should (see the sketch after this list):

  1. Flag the pronunciation of “قَدِيرٌ” as erroneous,
  2. Identify that the first phoneme in that word was substituted (“/q/” → “/k/”), and
  3. Classify the error type (here, a substitution of /q/ with /k/).

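As a concrete illustration, here is a minimal sketch in Python (not the official scoring code) that aligns the gold phoneme sequence produced by the phonetizer against a system's predicted sequence and reports each difference as a substitution, deletion, or insertion. The diagnose function and the use of difflib are illustrative choices, not part of the task infrastructure.

# A minimal sketch (not the official scorer): align a gold phoneme sequence
# against a predicted one and report substitutions, deletions, and insertions.
from difflib import SequenceMatcher

def diagnose(gold, pred):
    """Return a list of (operation, gold_span, predicted_span) differences."""
    errors = []
    matcher = SequenceMatcher(a=gold, b=pred, autojunk=False)
    for op, g1, g2, p1, p2 in matcher.get_opcodes():
        if op != "equal":
            errors.append((op, gold[g1:g2], pred[p1:p2]))
    return errors

# The example above: the reciter substitutes /q/ with /k/ in "qadiirun".
gold = "i n n a l l aa h a ʕ a l a k u l l i ʃ a y ’ i n q a d i r u n".split()
pred = "i n n a l l aa h a ʕ a l a k u l l i ʃ a y ’ i n k a d i r u n".split()
print(diagnose(gold, pred))  # [('replace', ['q'], ['k'])]
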
Pronunciation Assessment in Arabic

Figure: Example of a phoneme-level comparison between the reference and predicted phoneme sequences for an Arabic Qur’anic recitation.

Evaluation Criteria

Systems will be scored on their ability to detect and correctly classify phoneme-level errors.

A final Composite Error Score (CES) will be computed by combining:

  1. Boundary-aware detection accuracy (off-by-one index errors are penalized only lightly),
  2. Per-error-type classification F1-score (substitution, deletion, insertion), and
  3. Overall phoneme-sequence alignment score (Levenshtein-based alignment to reward correct sequences).

(Detailed evaluation weights and scripts will be made available on June 5, 2025.)
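
As a rough illustration of the third component only (the official weights and scripts are not yet released), the sketch below computes a normalized Levenshtein-based alignment score between gold and predicted phoneme sequences; the helper name alignment_score is hypothetical and the official definition may differ.

# Illustrative only: normalized Levenshtein-based alignment score between a
# gold phoneme sequence and a predicted one (1.0 = perfect match).
def alignment_score(gold, pred):
    m, n = len(gold), len(pred)
    dp = [[0] * (n + 1) for _ in range(m + 1)]  # dp[i][j] = edit distance of gold[:i] vs pred[:j]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if gold[i - 1] == pred[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return 1.0 - dp[m][n] / max(m, n, 1)

print(alignment_score("q a d i r u n".split(), "k a d i r u n".split()))  # 1 - 1/7 ≈ 0.857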

Submission Details (Draft)

Participants are required to submit a CSV file named submission.csv containing the predicted phoneme sequences for each audio sample. The file must have exactly two columns:

  1. ID: the audio filename without its extension, and
  2. Labels: the predicted phoneme sequence, with phonemes separated by spaces.

Below is a minimal example illustrating the required format:

ID,Labels
0000_0001, i n n a m a a y a k h a l l a h a m i n ʕ i b a a d i h u l ʕ u l a m
0000_0002, m a a n a n s a k h u m i n i ʕ a a y a t i n
0000_0003, y u k h i k u m u n n u ʔ a u ʔ a m a n a t a n m m i n h u
…  

The first column (ID) must exactly match the audio filename (without extension). The second column (Labels) is the predicted phoneme string.
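
For reference, here is a minimal sketch of writing predictions in this format with Python's csv module; the predictions dictionary is a hypothetical stand-in for whatever a system actually outputs.

import csv

# Hypothetical predictions: audio ID (filename without extension) -> space-separated phonemes.
predictions = {
    "0000_0001": "i n n a m a a y a k h a l l a h a m i n ʕ i b a a d i h u l ʕ u l a m",
    "0000_0002": "m a a n a n s a k h u m i n i ʕ a a y a t i n",
}

with open("submission.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "Labels"])  # exact column names required
    for audio_id, phonemes in predictions.items():
        writer.writerow([audio_id, phonemes])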

Important:

Dataset Description

All data are hosted on Hugging Face. Two main splits are provided: training (train) and development (dev); see Data Splits below for their sizes.

A sample submission file (sample_submission.csv) is also provided in the repository.

Column Definitions:

Data Splits:
• Training (train): 79 hours total
• Development (dev): 3.4 hours total
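
Both splits can be loaded with the Hugging Face datasets library. The dataset identifier and split names below are placeholders, not the official ones; use the exact repository name given in the GitHub README.

from datasets import load_dataset

# "IqraEval/iqraeval-data" is a placeholder dataset ID, and the split name "dev"
# is an assumption (it may be exposed as "validation"); check the README.
train = load_dataset("IqraEval/iqraeval-data", split="train")
dev = load_dataset("IqraEval/iqraeval-data", split="dev")

print(train)   # inspect the dataset features and size
print(dev[0])  # first development example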

TTS Data (Optional Use)

We also provide a high-quality TTS corpus for auxiliary experiments (e.g., data augmentation, synthetic pronunciation error simulation). This TTS set can be loaded in the same way as the main splits; see the GitHub README for the exact dataset identifier.

Researchers who wish to experiment with “synthetic mispronunciations” can use the TTS waveform + forced-alignment pipeline to generate various kinds of pronunciation errors in a controlled manner.
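
One possible (unofficial) way to create such synthetic errors is to perturb the gold phoneme sequence with a small table of confusable phoneme pairs before synthesis or after forced alignment; the confusion pairs and the corrupt helper below are purely illustrative.

import random

# Illustrative confusion pairs for simulating substitution errors,
# e.g. the /q/ -> /k/ confusion from the example above. Extend as needed.
CONFUSIONS = {"q": "k", "ʕ": "ʔ", "dˤ": "d"}

def corrupt(phonemes, rate=0.1, seed=0):
    """Randomly replace confusable phonemes to simulate mispronunciations."""
    rng = random.Random(seed)
    out, errors = [], []
    for i, p in enumerate(phonemes):
        if p in CONFUSIONS and rng.random() < rate:
            out.append(CONFUSIONS[p])
            errors.append((i, p, CONFUSIONS[p]))  # (index, gold phoneme, produced phoneme)
        else:
            out.append(p)
    return out, errors

corrupted, errors = corrupt("q a d i r u n".split(), rate=1.0)
print(corrupted, errors)  # ['k', 'a', 'd', 'i', 'r', 'u', 'n'] [(0, 'q', 'k')]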

Resources

For detailed instructions on data access, phonetizer installation, and baseline usage, please refer to the GitHub README.

Future Updates

Further details on evaluation criteria (exact scoring weights), submission templates, and any clarifications will be posted on the shared task website when test data are released (June 5, 2025). Stay tuned!