Spaces: IqraEval/SharedTask_ArabicNLP2025

01Yassine committed on Jun 5

Commit 9847294 (verified) · 1 Parent(s): 5756d9d

add QuranMB

Files changed (1)

index.html +9 -1

@@ -27,7 +27,7 @@
         <h2>Timeline</h2>
         <ul>
             <li><strong>June 1, 2025</strong>: Official announcement of the shared task</li>
-            <li><strong>June 5, 2025</strong>: Release of training data, development set (QuranMB), phonetizer script, and baseline systems</li>
+            <li><strong>June 8, 2025</strong>: Release of training data, development set (QuranMB), phonetizer script, and baseline systems</li>
             <li><strong>July 24, 2025</strong>: Registration deadline and release of test data</li>
             <li><strong>July 27, 2025</strong>: End of evaluation cycle (test set submission closes)</li>
             <li><strong>July 30, 2025</strong>: Final results released</li>
@@ -161,6 +161,14 @@ inna l l aa h a  ʕ a l a  k u l l i  ʃ a y ’ i n  q a d i r u n
             <li><code>df_tts = load_dataset("IqraEval/Iqra_TTS")</code></li>
         </ul>
+      <h2>Test Data QuranMB</h2>
+        <p>
+          To construct a reliable test set, we selected 98 verses from the Qur’an, which were read aloud by 18 native Arabic speakers (14 female, 4 male), resulting in approximately 2 hours of recorded speech. The speakers were instructed to read the text in Modern Standard Arabic (MSA) at their normal tempo, disregarding Qur’anic tajweed rules, while deliberately producing the specified pronunciation errors. To ensure consistency in error production, we developed a custom recording tool that highlighted the modified text and displayed additional instructions specifying the type of error (Figure <em>fig:recording</em>). Speakers were required to read each sentence silently first, to familiarize themselves with the intended errors, and only then to read it aloud. After recording, three linguistic annotators verified and corrected the transcriptions and flagged all pronunciation errors for evaluation.
+        </p>
+        <ul>
+          <li><code>df_test = load_dataset("IqraEval/Iqra_QuranMB_v2")</code></li>
+        </ul>
         <!-- Resources & Links -->
         <h2>Resources</h2>
         <ul>
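
As a rough sketch of how the QuranMB loading line added in this commit might be used: the snippet below loads the dataset with the Hugging Face `datasets` library and reports the number of rows in each split. The helper name `describe_splits` is hypothetical, and the page does not state the split names or column layout, so none are assumed here. Running the `__main__` part requires network access and the `datasets` package.

```python
def describe_splits(dataset_dict):
    """Return {split_name: number_of_rows} for a DatasetDict-like mapping.

    Hypothetical helper for illustration; it only relies on the mapping
    interface (.items()) and len() of each split.
    """
    return {name: len(split) for name, split in dataset_dict.items()}


if __name__ == "__main__":
    # Imported here so the helper above can be used without `datasets` installed.
    from datasets import load_dataset

    # Same call as in the page; without a `split` argument this returns
    # a mapping from split name to dataset.
    df_test = load_dataset("IqraEval/Iqra_QuranMB_v2")
    print(describe_splits(df_test))
```

Printing the split sizes first is a cheap way to confirm which splits the repository actually exposes before writing evaluation code against them.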