Iqra’Eval Shared Task

Overview

Iqra’Eval is a shared task aimed at advancing the automatic assessment of Qur’anic recitation pronunciation through computational methods for detecting and diagnosing pronunciation errors. The focus on Qur’anic recitation provides a standardized and well-defined context for evaluating Modern Standard Arabic (MSA) pronunciation, where precise articulation is not merely valued but essential for correctness according to established Tajweed rules.

Participants will develop systems capable of:

  1. Detecting pronunciation errors in a recited verse,
  2. Localizing each error within the phoneme sequence, and
  3. Diagnosing the type of error according to Tajweed rules.

Timeline

Task Description

The Iqra’Eval shared task focuses on automatic mispronunciation detection and diagnosis in Qur’anic recitation. Given:

  1. A speech segment (an audio clip of a Qur’anic verse recitation), and
  2. A fully vowelized reference transcript (the corresponding Qur’anic text, fully diacritized),

the goal is to identify any pronunciation errors, localize them within the phoneme sequence, and classify the type of error based on Tajweed rules.

Each participant’s system must predict the sequence of phonemes that the reciter actually produced. A standardized phonemizer (Nawar Halabi’s phonetizer) will be used to generate the “gold” phoneme sequence from the reference transcript for comparison.

Key subtasks:

  1. Mispronunciation detection: flag phonemes where the recitation deviates from the gold sequence.
  2. Mispronunciation diagnosis: predict the phoneme sequence the reciter actually produced and classify each deviation as a substitution, deletion, or insertion.

Example

Suppose the reference verse (fully vowelized) is:

إِنَّ اللَّهَ عَلَىٰ كُلِّ شَيْءٍ قَدِيرٌ
(inna l-lāha ʿalā kulli shay’in qadīrun)

The gold phoneme sequence (using the standard phonemizer) might be:

i n n a  l l aa h a  ʕ a l a  k u l l i  ʃ a y ’ i n  q a d i r u n

If a reciter mispronounces “قَدِيرٌ” (qadīrun) as “كَدِيرٌ” (kadīrun), that corresponds to a substitution at the very start of that word: phoneme /q/ → /k/.

A well-trained system should (see the sketch after this list):

  1. Flag the pronunciation of “قَدِيرٌ” as erroneous,
  2. Identify that the first phoneme in that word was substituted (“/q/” → “/k/”), and
  3. Classify the error type (here, a substitution of /q/ with /k/).

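As a concrete illustration, here is a minimal sketch in Python (not the official scoring code) that aligns the gold phoneme sequence produced by the phonetizer against a system's predicted sequence and reports each difference as a substitution, deletion, or insertion. The diagnose function and the use of difflib are illustrative choices, not part of the task infrastructure.

# A minimal sketch (not the official scorer): align a gold phoneme sequence
# against a predicted one and report substitutions, deletions, and insertions.
from difflib import SequenceMatcher

def diagnose(gold, pred):
    """Return a list of (operation, gold_span, predicted_span) differences."""
    errors = []
    matcher = SequenceMatcher(a=gold, b=pred, autojunk=False)
    for op, g1, g2, p1, p2 in matcher.get_opcodes():
        if op != "equal":
            errors.append((op, gold[g1:g2], pred[p1:p2]))
    return errors

# The example above: the reciter substitutes /q/ with /k/ in "qadiirun".
gold = "i n n a l l aa h a ʕ a l a k u l l i ʃ a y ’ i n q a d i r u n".split()
pred = "i n n a l l aa h a ʕ a l a k u l l i ʃ a y ’ i n k a d i r u n".split()
print(diagnose(gold, pred))  # [('replace', ['q'], ['k'])]
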
Pronunciation Assessment in Arabic

Figure: Example of a phoneme-level comparison between the reference and predicted phoneme sequences for an Arabic Qur’anic recitation.

Evaluation Criteria

Systems will be scored on their ability to detect and correctly classify phoneme-level errors.

A final Composite Error Score (CES) will be computed by combining:

  1. Boundary-aware detection accuracy (off-by-one index errors are penalized only lightly),
  2. Per-error-type classification F1-score (substitution, deletion, insertion), and
  3. Overall phoneme-sequence alignment score (Levenshtein-based alignment to reward correct sequences).

(Detailed evaluation weights and scripts will be made available on June 5, 2025.)
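
As a rough illustration of the third component only (the official weights and scripts are not yet released), the sketch below computes a normalized Levenshtein-based alignment score between gold and predicted phoneme sequences; the helper name alignment_score is hypothetical and the official definition may differ.

# Illustrative only: normalized Levenshtein-based alignment score between a
# gold phoneme sequence and a predicted one (1.0 = perfect match).
def alignment_score(gold, pred):
    m, n = len(gold), len(pred)
    dp = [[0] * (n + 1) for _ in range(m + 1)]  # dp[i][j] = edit distance of gold[:i] vs pred[:j]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if gold[i - 1] == pred[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return 1.0 - dp[m][n] / max(m, n, 1)

print(alignment_score("q a d i r u n".split(), "k a d i r u n".split()))  # 1 - 1/7 ≈ 0.857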

Submission Details (Draft)

Participants are required to submit a CSV file named submission.csv containing the predicted phoneme sequences for each audio sample. The file must have exactly two columns:

  1. ID: the audio filename without its extension, and
  2. Labels: the predicted phoneme sequence, with phonemes separated by spaces.

Below is a minimal example illustrating the required format:

ID,Labels
0000_0001, i n n a m a a y a k h a l l a h a m i n ʕ i b a a d i h u l ʕ u l a m
0000_0002, m a a n a n s a k h u m i n i ʕ a a y a t i n
0000_0003, y u k h i k u m u n n u ʔ a u ʔ a m a n a t a n m m i n h u
…  

The first column (ID) must exactly match the audio filename (without extension). The second column (Labels) is the predicted phoneme string.
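
For reference, here is a minimal sketch of writing predictions in this format with Python's csv module; the predictions dictionary is a hypothetical stand-in for whatever a system actually outputs.

import csv

# Hypothetical predictions: audio ID (filename without extension) -> space-separated phonemes.
predictions = {
    "0000_0001": "i n n a m a a y a k h a l l a h a m i n ʕ i b a a d i h u l ʕ u l a m",
    "0000_0002": "m a a n a n s a k h u m i n i ʕ a a y a t i n",
}

with open("submission.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "Labels"])  # exact column names required
    for audio_id, phonemes in predictions.items():
        writer.writerow([audio_id, phonemes])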

Important:

Dataset Description

All data are hosted on Hugging Face. Two main splits are provided: training (train) and development (dev); see Data Splits below for their sizes.

A sample submission file (sample_submission.csv) is also provided in the repository.

Column Definitions:

Data Splits:
• Training (train): 79 hours total
• Development (dev): 3.4 hours total
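
Both splits can be loaded with the Hugging Face datasets library. The dataset identifier and split names below are placeholders, not the official ones; use the exact repository name given in the GitHub README.

from datasets import load_dataset

# "IqraEval/iqraeval-data" is a placeholder dataset ID, and the split name "dev"
# is an assumption (it may be exposed as "validation"); check the README.
train = load_dataset("IqraEval/iqraeval-data", split="train")
dev = load_dataset("IqraEval/iqraeval-data", split="dev")

print(train)   # inspect the dataset features and size
print(dev[0])  # first development example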

TTS Data (Optional Use)

We also provide a high-quality TTS corpus for auxiliary experiments (e.g., data augmentation, synthetic pronunciation error simulation). This TTS set can be loaded in the same way as the main splits; see the GitHub README for the exact dataset identifier.

Researchers who wish to experiment with “synthetic mispronunciations” can use the TTS waveform + forced-alignment pipeline to generate various kinds of pronunciation errors in a controlled manner.
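
One possible (unofficial) way to create such synthetic errors is to perturb the gold phoneme sequence with a small table of confusable phoneme pairs before synthesis or after forced alignment; the confusion pairs and the corrupt helper below are purely illustrative.

import random

# Illustrative confusion pairs for simulating substitution errors,
# e.g. the /q/ -> /k/ confusion from the example above. Extend as needed.
CONFUSIONS = {"q": "k", "ʕ": "ʔ", "dˤ": "d"}

def corrupt(phonemes, rate=0.1, seed=0):
    """Randomly replace confusable phonemes to simulate mispronunciations."""
    rng = random.Random(seed)
    out, errors = [], []
    for i, p in enumerate(phonemes):
        if p in CONFUSIONS and rng.random() < rate:
            out.append(CONFUSIONS[p])
            errors.append((i, p, CONFUSIONS[p]))  # (index, gold phoneme, produced phoneme)
        else:
            out.append(p)
    return out, errors

corrupted, errors = corrupt("q a d i r u n".split(), rate=1.0)
print(corrupted, errors)  # ['k', 'a', 'd', 'i', 'r', 'u', 'n'] [(0, 'q', 'k')]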

Resources

For detailed instructions on data access, phonetizer installation, and baseline usage, please refer to the GitHub README.

Future Updates

Further details on evaluation criteria (exact scoring weights), submission templates, and any clarifications will be posted on the shared task website when test data are released (June 5, 2025). Stay tuned!