Spaces: IqraEval/SharedTask_ArabicNLP2025
01Yassine committed on Jun 10

Commit e54837d (verified) · 1 parent: b64074b

Update index.html

Files changed (1): index.html

@@ -164,6 +164,13 @@
       Here, <code>$</code>→<code>s</code> and <code>i</code>→<code>u</code>; omission of <code>ta</code> went undetected.
     </p>
+    <h2>Phoneme Set Description</h2>
+    <p>
+      The phoneme set used in this work is based on a specialized phonetizer for vowelized Modern Standard Arabic (MSA) developed by Nawar Halabi. It comprises 68 phonemes designed to capture key phonetic and prosodic features of Qur’an recitation, such as stress, pausing, intonation, emphasis, and, notably, gemination. Gemination, the doubling of a consonant sound, is represented explicitly by duplicating the consonant symbol (e.g., <code>/b/</code> becomes <code>/bb/</code>).
+      While the phonetizer distinguishes vowels that follow emphatic consonants from those that follow non-emphatic ones, our approach merges the two, in line with MSA pronunciation norms, where the difference does not affect meaning. The resulting phoneme set gives a detailed yet practical representation of the speech sounds relevant to mispronunciation detection in Qur’anic recitation.
+      For further details, including the full phoneme inventory, see <a href="https://huggingface.co/spaces/IqraEval/ArabicPhoneme">Phoneme Inventory</a>.
+    </p>
     <h2>Training Dataset: Description</h2>
     <p>
       Hosted on Hugging Face: