IqraEval/SharedTask_ArabicNLP2025

01Yassine committed 25 days ago

Commit 3fb7e9e (verified), 1 parent: 668602f

Update index.html


Files changed (1): index.html +59 -59

@@ -17,21 +17,17 @@
         <!-- Overview Section -->
         <h2>Overview</h2>
         <p>
-            <strong>Iqra’Eval</strong> is a shared task aimed at advancing <strong>automatic assessment of Qur’anic recitation pronunciation</strong> by leveraging computational methods to detect and diagnose pronunciation errors. The focus on Qur’anic recitation provides a standardized and well-defined context for evaluating Modern Standard Arabic (MSA) pronunciation, where precise articulation is not only valued but essential for correctness according to established Tajweed rules.
         </p>
         <p>
-            Participants will develop systems capable of:
         </p>
-        <ul>
-            <li>Detecting whether a segment of Qur’anic recitation contains pronunciation errors.</li>
-            <li>Diagnosing the nature of the error (e.g., substitution, deletion, or insertion of phonemes).</li>
-        </ul>
         <!-- Timeline Section -->
         <h2>Timeline</h2>
         <ul>
             <li><strong>June 1, 2025</strong>: Official announcement of the shared task</li>
-            <li><strong>June 8, 2025</strong>: Release of training data, development set (QuranMB), phonetizer script, and baseline systems</li>
             <li><strong>July 24, 2025</strong>: Registration deadline and release of test data</li>
             <li><strong>July 27, 2025</strong>: End of evaluation cycle (test set submission closes)</li>
             <li><strong>July 30, 2025</strong>: Final results released</li>
@@ -93,27 +89,8 @@
           <p>
             The annotated phoneme sequence indicates that the phoneme <code>ta</code> was omitted, but the model failed to detect it.
           </p>
-          <h2>Potential Research Directions</h2>
-          <ol>
-            <li>
-              <strong>Advanced Mispronunciation Detection Models</strong><br>
-              Apply state-of-the-art self-supervised models (e.g.,
-              <a href="https://arxiv.org/abs/2111.06331" target="_blank">Wav2Vec2.0</a>, HuBERT)
-              pre-trained on Arabic speech. These models can be fine-tuned on Quranic recitations to improve phoneme-level accuracy.
-            </li>
-            <li>
-              <strong>Data Augmentation Strategies</strong><br>
-              Create synthetic mispronunciation examples using pipelines like
-              <a href="https://arxiv.org/abs/2211.00923" target="_blank">SpeechBlender</a>.
-              Augmenting limited Arabic/Quranic speech data helps mitigate data scarcity and improves model robustness.
-            </li>
-            <li>
-              <strong>Analysis of Common Mispronunciation Patterns</strong><br>
-              Perform statistical analysis on the QuranMB dataset to identify prevalent errors (e.g., substituting similar phonemes, swapping vowels).
-              These insights can drive targeted training and tailored feedback rules.
-            </li>
-          </ol>
         <h2>Training Dataset: Description</h2>
         <p>
@@ -189,7 +166,38 @@
                 For detailed instructions on data access, phonetizer installation, and baseline usage, please refer to the GitHub README.
             </em>
         </p>
         <h2>Evaluation Criteria</h2>
         <p>
           The primary evaluation metric for the IqraEval system is the <strong>F1-score</strong> at the phoneme level. In addition, we adopt the hierarchical evaluation structure described in the <a href="https://arxiv.org/pdf/2310.13974" target="_blank">MDD Overview</a>, which breaks performance down into detection and diagnostic phases.
@@ -253,37 +261,29 @@
           </ul>
         </p>
-        <!-- Submission Details -->
-        <h2>Submission Details (Draft)</h2>
-        <p>
-            Participants are required to submit a CSV file named <code>submission.csv</code> containing the predicted phoneme sequences for each audio sample. The file must have exactly two columns:
-        </p>
-        <ul>
-            <li><strong>ID:</strong> Unique identifier of the audio sample.</li>
-            <li><strong>Labels:</strong> The predicted phoneme sequence, with each phoneme separated by a single space.</li>
-        </ul>
-        <p>
-            Below is a minimal example illustrating the required format:
-        </p>
-        <pre>
-ID,Labels
-0000_0001, i n n a m a a y a k h a l l a h a m i n ʕ i b a a d i h u l ʕ u l a m
-0000_0002, m a a n a n s a k h u m i n i ʕ a a y a t i n
-0000_0003, y u k h i k u m u n n u ʔ a u ʔ a m a n a t a n m m i n h u
-…
-        </pre>
-        <p>
-            The first column (ID) should match exactly the audio filenames (without extension). The second column (Labels) is the predicted phoneme string.
-        </p>
-        <p>
-            <strong>Important:</strong>
-            <ul>
-                <li>Use UTF-8 encoding.</li>
-                <li>Do not include extra spaces at the start or end of each line.</li>
-                <li>Submit a single CSV file (no archives). Filename must be <code>teamID_submission.csv</code>.</li>
-            </ul>
-        </p>
         <!-- Placeholder for Future Details -->
         <h2>Future Updates</h2>

         <!-- Overview Section -->
         <h2>Overview</h2>
         <p>
+            <strong>Iqra’Eval</strong> is a shared task aimed at advancing <strong>automatic assessment of Qur’anic recitation pronunciation</strong> by leveraging computational methods to detect and diagnose pronunciation errors. The focus on Qur’anic recitation provides a standardized and well-defined context for evaluating Modern Standard Arabic (MSA) pronunciation.
         </p>
         <p>
+            Participants will develop systems capable of detecting mispronunciations (e.g., substitution, deletion, or insertion of phonemes).
         </p>
         <!-- Timeline Section -->
         <h2>Timeline</h2>
         <ul>
             <li><strong>June 1, 2025</strong>: Official announcement of the shared task</li>
+            <li><strong>June 10, 2025</strong>: Release of training data, development set (QuranMB), phonetizer script, and baseline systems</li>
             <li><strong>July 24, 2025</strong>: Registration deadline and release of test data</li>
             <li><strong>July 27, 2025</strong>: End of evaluation cycle (test set submission closes)</li>
             <li><strong>July 30, 2025</strong>: Final results released</li>
           <p>
             The annotated phoneme sequence indicates that the phoneme <code>ta</code> was omitted, but the model failed to detect it.
           </p>
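The omission above is a deletion relative to the canonical sequence. The following is a minimal sketch of how a standard Levenshtein alignment can label each position as a match, substitution, deletion, or insertion. It assumes phonemes are space-separated tokens. `align_phonemes` is a hypothetical helper and is not part of the official IqraEval scripts. The toy sequences are illustrative and do not come from QuranMB.

```python
from typing import List, Optional, Tuple

# (operation, canonical phoneme, annotated phoneme); None marks a gap.
Op = Tuple[str, Optional[str], Optional[str]]


def align_phonemes(canonical: List[str], annotated: List[str]) -> List[Op]:
    """Levenshtein-align two phoneme sequences (hypothetical helper) and label
    each position as 'match', 'sub', 'del' (missing) or 'ins' (extra)."""
    n, m = len(canonical), len(annotated)
    # cost[i][j] = edit distance between canonical[:i] and annotated[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (canonical[i - 1] != annotated[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (
            canonical[i - 1] != annotated[j - 1]
        ):
            op = "match" if canonical[i - 1] == annotated[j - 1] else "sub"
            ops.append((op, canonical[i - 1], annotated[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", canonical[i - 1], None))  # phoneme omitted
            i -= 1
        else:
            ops.append(("ins", None, annotated[j - 1]))  # extra phoneme
            j -= 1
    ops.reverse()
    return ops


if __name__ == "__main__":
    ops = align_phonemes("b i s m i".split(), "b i m i".split())
    print([op for op in ops if op[0] != "match"])  # [('del', 's', None)]
```

When several alignments have the same cost, the traceback prefers the diagonal (match or substitution) step, so ties are resolved deterministically but not uniquely.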
         <h2>Training Dataset: Description</h2>
         <p>
                 For detailed instructions on data access, phonetizer installation, and baseline usage, please refer to the GitHub README.
             </em>
         </p>
+        <!-- Submission Details -->
+        <h2>Submission Details (Draft)</h2>
+        <p>
+            Participants are required to submit a CSV file containing the predicted phoneme sequences for each audio sample. The file must have exactly two columns:
+        </p>
+        <ul>
+            <li><strong>ID:</strong> Unique identifier of the audio sample.</li>
+            <li><strong>Labels:</strong> The predicted phoneme sequence, with each phoneme separated by a single space.</li>
+        </ul>
+        <p>
+            Below is a minimal example illustrating the required format:
+        </p>
+        <pre>
+ID,Labels
+0000_0001, i n n a m a a y a k h a l l a h a m i n ʕ i b a a d i h u l ʕ u l a m
+0000_0002, m a a n a n s a k h u m i n i ʕ a a y a t i n
+0000_0003, y u k h i k u m u n n u ʔ a u ʔ a m a n a t a n m m i n h u
+…
+        </pre>
+        <p>
+            The first column (ID) must exactly match the audio filename (without extension). The second column (Labels) is the predicted phoneme string.
+        </p>
+        <p>
+            <strong>Important:</strong>
+        </p>
+        <ul>
+            <li>Use UTF-8 encoding.</li>
+            <li>Do not include extra spaces at the start or end of each line.</li>
+            <li>Submit a single CSV file (no archives). The filename must be <code>teamID_submission.csv</code>.</li>
+        </ul>
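The format rules above can be checked locally before submitting. The following is a minimal sketch using only Python's standard `csv` module. `write_submission` and `check_submission` are hypothetical helpers, not official tools. The checks cover only the rules stated above (UTF-8, an `ID,Labels` header, two columns, single-space-separated phonemes, and no leading or trailing whitespace on a line), plus an assumed rule that IDs are unique.

```python
import csv
from typing import Dict, List


def write_submission(predictions: Dict[str, List[str]], path: str) -> None:
    """Write {audio_id: [phonemes]} as a UTF-8 CSV with columns ID,Labels."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ID", "Labels"])
        for audio_id, phonemes in predictions.items():
            # Drop empty tokens so phonemes are joined by exactly one space.
            tokens = [p.strip() for p in phonemes if p.strip()]
            writer.writerow([audio_id.strip(), " ".join(tokens)])


def check_submission(path: str) -> List[str]:
    """Return format problems found in a submission (empty list if none)."""
    problems: List[str] = []
    # Raises UnicodeDecodeError if the file is not valid UTF-8.
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    for n, line in enumerate(lines, start=1):
        if line != line.strip():
            problems.append(f"line {n}: leading or trailing whitespace")
    # Assumes one record per line (no quoted multi-line fields).
    rows = list(csv.reader(lines))
    if not rows or rows[0] != ["ID", "Labels"]:
        problems.append("line 1: header must be exactly 'ID,Labels'")
    seen = set()
    for n, row in enumerate(rows[1:], start=2):
        if len(row) != 2:
            problems.append(f"line {n}: expected 2 columns, got {len(row)}")
            continue
        audio_id, labels = row
        if audio_id in seen:
            problems.append(f"line {n}: duplicate ID {audio_id}")
        seen.add(audio_id)
        if not labels.strip() or "  " in labels:
            problems.append(f"line {n}: Labels must be phonemes separated by single spaces")
    return problems
```

For example, `write_submission(preds, "teamID_submission.csv")` followed by `check_submission("teamID_submission.csv")` should return an empty list. Here `preds` is your own `{ID: [phonemes]}` dictionary.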
         <h2>Evaluation Criteria</h2>
         <p>
           The primary evaluation metric for the IqraEval system is the <strong>F1-score</strong> at the phoneme level. In addition, we adopt the hierarchical evaluation structure described in the <a href="https://arxiv.org/pdf/2310.13974" target="_blank">MDD Overview</a>, which breaks performance down into detection and diagnostic phases.
           </ul>
         </p>
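The following is a minimal sketch of the detection and diagnosis counts commonly used in MDD evaluation:

- true acceptance (TA)
- false rejection (FR)
- false acceptance (FA)
- true rejection (TR), split into correct diagnosis (CD) and diagnosis error (DE)

The sketch also uses the conventional definitions precision = TR / (TR + FR) and recall = TR / (TR + FA). It assumes the canonical, annotated, and predicted sequences have already been aligned position by position, with a gap symbol such as `-`. The names are hypothetical, and this is not the official IqraEval scoring script.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class MDDCounts:
    ta: int = 0  # true acceptance: pronounced correctly, not flagged
    fr: int = 0  # false rejection: pronounced correctly, but flagged
    fa: int = 0  # false acceptance: mispronounced, but not flagged
    cd: int = 0  # true rejection, correct diagnosis (predicted == annotated)
    de: int = 0  # true rejection, diagnosis error (predicted != annotated)

    @property
    def tr(self) -> int:
        """True rejections: mispronunciations that were flagged."""
        return self.cd + self.de

    def precision(self) -> float:
        denom = self.tr + self.fr
        return self.tr / denom if denom else 0.0

    def recall(self) -> float:
        denom = self.tr + self.fa
        return self.tr / denom if denom else 0.0

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if p + r else 0.0

    def diagnosis_error_rate(self) -> float:
        return self.de / self.tr if self.tr else 0.0


def count_mdd(triples: Iterable[Tuple[str, str, str]]) -> MDDCounts:
    """Each triple is (canonical, annotated, predicted) at one aligned
    position; a gap symbol such as "-" stands for a deleted phoneme."""
    c = MDDCounts()
    for canonical, annotated, predicted in triples:
        mispronounced = annotated != canonical
        flagged = predicted != canonical
        if not mispronounced:
            if flagged:
                c.fr += 1
            else:
                c.ta += 1
        elif not flagged:
            c.fa += 1
        elif predicted == annotated:
            c.cd += 1
        else:
            c.de += 1
    return c


if __name__ == "__main__":
    # Toy aligned triples (hypothetical, not QuranMB data).
    counts = count_mdd([
        ("b", "b", "b"),  # TA
        ("i", "i", "a"),  # FR
        ("s", "-", "s"),  # FA: a missed deletion
        ("m", "n", "n"),  # TR, correct diagnosis
        ("a", "i", "u"),  # TR, diagnosis error
    ])
    print(round(counts.f1(), 3))  # 0.667
```

Consult the organizers' evaluation code for the exact alignment procedure and metric definitions used in the shared task.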
+          <h2>Potential Research Directions</h2>
+          <ol>
+            <li>
+              <strong>Advanced Mispronunciation Detection Models</strong><br>
+              Apply state-of-the-art self-supervised models (e.g.,
+              <a href="https://arxiv.org/abs/2111.06331" target="_blank">Wav2Vec2.0</a>, HuBERT)
+              pre-trained on Arabic speech. These models can be fine-tuned on Quranic recitations to improve phoneme-level accuracy.
+            </li>
+            <li>
+              <strong>Data Augmentation Strategies</strong><br>
+              Create synthetic mispronunciation examples using pipelines like
+              <a href="https://arxiv.org/abs/2211.00923" target="_blank">SpeechBlender</a>.
+              Augmenting limited Arabic/Quranic speech data can help mitigate data scarcity and improve model robustness.
+            </li>
+            <li>
+              <strong>Analysis of Common Mispronunciation Patterns</strong><br>
+              Perform statistical analysis on the QuranMB dataset to identify prevalent errors (e.g., substituting similar phonemes, swapping vowels).
+              These insights can drive targeted training and tailored feedback rules.
+            </li>
+          </ol>
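For the third direction, the counting step could look like the sketch below. It assumes canonical and annotated phoneme sequences are available as token lists. How they are loaded from QuranMB is not shown, and the toy pairs are hypothetical. The sketch uses Python's `difflib.SequenceMatcher`, which gives a quick heuristic alignment rather than a guaranteed minimum-edit one. Pairing phonemes position by position inside a `replace` block is also a simplifying assumption.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple


def error_patterns(pairs: Iterable[Tuple[List[str], List[str]]]) -> Counter:
    """Tally substitution, deletion and insertion patterns over
    (canonical, annotated) phoneme-sequence pairs."""
    counts: Counter = Counter()
    for canonical, annotated in pairs:
        sm = SequenceMatcher(a=canonical, b=annotated, autojunk=False)
        for tag, i1, i2, j1, j2 in sm.get_opcodes():
            if tag == "replace":
                src, dst = canonical[i1:i2], annotated[j1:j2]
                # Pair phonemes position by position; surplus ones become del/ins.
                for k in range(max(len(src), len(dst))):
                    if k < len(src) and k < len(dst):
                        counts[("sub", src[k], dst[k])] += 1
                    elif k < len(src):
                        counts[("del", src[k], None)] += 1
                    else:
                        counts[("ins", None, dst[k])] += 1
            elif tag == "delete":
                for p in canonical[i1:i2]:
                    counts[("del", p, None)] += 1
            elif tag == "insert":
                for p in annotated[j1:j2]:
                    counts[("ins", None, p)] += 1
    return counts


if __name__ == "__main__":
    pairs = [
        ("q a l b".split(), "k a l b".split()),
        ("q u l".split(), "k u l".split()),
        ("s a b".split(), "s b".split()),
    ]
    print(error_patterns(pairs).most_common(1))  # [(('sub', 'q', 'k'), 2)]
```

Sorting the resulting counter with `most_common()` surfaces the most frequent confusions. These could then guide targeted training or feedback rules.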
         <!-- Placeholder for Future Details -->
         <h2>Future Updates</h2>