Spaces: IqraEval/SharedTask_ArabicNLP2025

01Yassine committed 23 days ago

Commit 609e630 (verified) · 1 Parent(s): ad22150

Update index.html

Files changed (1)

index.html +1 -64

@@ -56,69 +56,6 @@
             This task helps diagnose and localize pronunciation errors, enabling educational feedback in applications like Qur’anic tutoring or speech evaluation tools.
         </p>
-        <!-- <h2>Task Description</h2>
-        <p>
-            The Iqra’Eval shared task focuses on automatic mispronunciation detection and diagnosis in Qur’anic recitation. Given:
-        </p>
-        <ol>
-            <li>A speech segment (an audio clip of a Qur’anic verse recitation), and</li>
-            <li>A fully vowelized reference transcript (the corresponding Qur’anic text, fully diacritized),</li>
-        </ol>
-        <p>
-            the goal is to identify any pronunciation errors, localize them within the phoneme sequence, and classify the type of error based on Tajweed rules.
-        </p>
-        <p>
-            Each participant’s system must predict the sequence of phonemes that the reciter actually produced. A standardized phonemizer (Nawar Halabi’s phonetizer) will be used to generate the “gold” phoneme sequence from the reference transcript for comparison.
-        </p>
-        <p>
-            <strong>Key subtasks:</strong>
-        </p>
-        <ul>
-            <li>Compare predicted phoneme sequence vs. gold reference.</li>
-            <li>Detect substitutions (e.g., pronouncing /q/ as /k/), deletions (e.g., dropping a hamza), or insertions (e.g., adding an extra vowel) of phonemes.</li>
-            <li>Localize the error to a specific phoneme index in the utterance.</li>
-            <li>Classify what type of mistake occurred based on Tajweed (e.g., madd errors, ikhfa, idgham, etc.).</li>
-        </ul> -->
-        <!-- Example & Illustration -->
-        <!-- <h2>Example</h2>
-        <p>
-            Suppose the reference verse (fully vowelized) is:
-        </p>
-        <blockquote>
-            <p>
-                إِنَّ اللَّهَ عَلَىٰ كُلِّ شَيْءٍ قَدِيرٌ
-                <br />
-                (inna l-lāha ʿalā kulli shay’in qadīrun)
-            </p>
-        </blockquote>
-        <p>
-            The gold phoneme sequence (using the standard phonemizer) might be:
-        </p>
-        <pre>
-inna l l aa h a  ʕ a l a  k u l l i  ʃ a y ’ i n  q a d i r u n
-        </pre>
-        <p>
-            If a reciter mispronounces “قَدِيرٌ” (qadīrun) as “كَدِيرٌ” (kadīrun), that corresponds to a substitution at the very start of that word: phoneme /q/ → /k/.
-        </p>
-        <p>
-            A well-trained system should:
-        </p>
-        <ol>
-            <li>Flag the pronunciation of “قَدِيرٌ” as erroneous,</li>
-            <li>Identify that the first phoneme in that word was substituted (“/q/” → “/k/”), and</li>
-            <li>Classify it under the Tajweed error category “Ghunnah/Qaf vs. Kaf error.”</li>
-        </ol>
-        <div style="text-align: center; margin: 1em 0;">
-            <img src="images/pronunciation_assessment_arabic.png" alt="Pronunciation Assessment in Arabic" style="max-width: 100%; height: auto;" />
-            <p style="font-size: 0.9em; color: #555;">
-                <em>Figure: Example of a phoneme-level comparison between reference vs. predicted for an Arabic Qur’anic recitation.</em>
-            </p>
-        </div> -->
-        <!-- Evaluation Criteria -->
-        <!-- Dataset Description -->
         <h2>Dataset Description</h2>
         <p>
             All data are hosted on Hugging Face. Two main splits are provided:
@@ -163,7 +100,7 @@ inna l l aa h a  ʕ a l a  k u l l i  ʃ a y ’ i n  q a d i r u n
       <h2>Test Data QuranMB</h2>
         <p>
-          To construct a reliable test set, we select 98 verses from the Qur’an, which are read aloud by 18 native Arabic speakers (14 females, 4 males), resulting in approximately 2 hours of recorded speech. The speakers were instructed to read the text in MSA at their normal tempo, disregarding Qur’anic tajweed rules, while deliberately producing the specified pronunciation errors. To ensure consistency in error production, we developed a custom recording tool that highlighted the modified text and displayed additional instructions specifying the type of error (Figure <em>fig:recording</em>). Before recording, speakers were required to silently read each sentence to familiarize themselves with the intended errors before reading them aloud. After recording, three linguistic annotators verified and corrected the transcriptions, and flagged all pronunciation errors for evaluation.
+          To construct a reliable test set, we select 98 verses from the Qur’an, which are read aloud by 18 native Arabic speakers (14 females, 4 males), resulting in approximately 2 hours of recorded speech. The speakers were instructed to read the text in MSA at their normal tempo, disregarding Qur’anic tajweed rules, while deliberately producing the specified pronunciation errors. To ensure consistency in error production, we developed a custom recording tool that highlighted the modified text and displayed additional instructions specifying the type of error. Before recording, speakers were required to silently read each sentence to familiarize themselves with the intended errors before reading them aloud. After recording, three linguistic annotators verified and corrected the phoneme sequence and flagged all pronunciation errors for evaluation.
         </p>
         <ul>
           <li><code>df_test = load_dataset("IqraEval/Iqra_QuranMB_v2")</code></li>
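
The commented-out Task Description removed in this commit describes comparing a predicted phoneme sequence against a gold reference to find substitutions, deletions, and insertions at a specific phoneme index. A minimal sketch of that comparison follows, assuming a plain Levenshtein alignment over space-separated phonemes. The function name `align_phonemes` and the output tuple layout are hypothetical, chosen for illustration; this is not the shared task's official scoring code, and it does not attempt the Tajweed error classification.

```python
from typing import List, Tuple

# (operation, index in gold sequence, gold phoneme, predicted phoneme)
PhonemeError = Tuple[str, int, str, str]


def align_phonemes(gold: List[str], pred: List[str]) -> List[PhonemeError]:
    """Hypothetical helper: align two phoneme lists with edit distance
    and return the substitutions, deletions, and insertions found."""
    n, m = len(gold), len(pred)
    # dp[i][j] = edit distance between gold[:i] and pred[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if gold[i - 1] == pred[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,  # match or substitution
                dp[i - 1][j] + 1,         # deletion (gold phoneme missing)
                dp[i][j - 1] + 1,         # insertion (extra predicted phoneme)
            )

    # Walk back from the bottom-right corner to recover the operations.
    errors: List[PhonemeError] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if gold[i - 1] == pred[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                if cost == 1:
                    errors.append(("substitution", i - 1, gold[i - 1], pred[j - 1]))
                i -= 1
                j -= 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            errors.append(("deletion", i - 1, gold[i - 1], ""))
            i -= 1
        else:
            # For insertions the index is the gold position the extra phoneme precedes.
            errors.append(("insertion", i, "", pred[j - 1]))
            j -= 1
    errors.reverse()
    return errors


# The /q/ -> /k/ example from the removed block, restricted to the last word.
gold = "q a d i r u n".split()
pred = "k a d i r u n".split()
print(align_phonemes(gold, pred))  # [('substitution', 0, 'q', 'k')]
```

When several alignments have the same edit distance, this sketch prefers a substitution over a deletion, and a deletion over an insertion; a real evaluation script may break ties differently.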