<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width" />
<title>Iqra’Eval Shared Task</title>
<style>
  /* Color Palette */
  :root {
    --navy-blue: #001f4d;
    --coral: #ff6f61;
    --light-gray: #f5f7fa;
    --text-dark: #222;
  }

  body {
    font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
    background-color: var(--light-gray);
    color: var(--text-dark);
    margin: 20px;
    line-height: 1.6;
  }

  h1, h2, h3 {
    color: var(--navy-blue);
    font-weight: 700;
    margin-top: 1.2em;
  }

  h1 {
    text-align: center;
    font-size: 2.8rem;
    margin-bottom: 0.3em;
  }

  h2 {
    border-bottom: 3px solid var(--coral);
    padding-bottom: 0.3em;
  }

  h3 {
    color: var(--coral);
    margin-top: 1em;
  }

  p {
    max-width: 900px;
    margin: 0.8em auto;
  }

  strong {
    color: var(--navy-blue);
  }

  ul {
    max-width: 900px;
    margin: 0.5em auto 1.5em auto;
    padding-left: 1.2em;
  }

  ul li {
    margin: 0.4em 0;
  }

  code {
    background-color: #eef4f8;
    color: var(--navy-blue);
    padding: 2px 6px;
    border-radius: 4px;
    font-family: Consolas, monospace;
    font-size: 0.9em;
  }

  pre {
    max-width: 900px;
    background-color: #eef4f8;
    color: var(--navy-blue);
    padding: 1em;
    border-radius: 8px;
    overflow-x: auto;
    font-family: Consolas, monospace;
    font-size: 0.95em;
    margin: 0.8em auto;
  }

  a {
    color: var(--coral);
    text-decoration: none;
  }

  a:hover {
    text-decoration: underline;
  }

  .card {
    max-width: 1200px;
    background: white;
    margin: 0 auto 40px auto;
    padding: 2em 2.5em;
    box-shadow: 0 4px 14px rgba(0,0,0,0.1);
    border-radius: 12px;
  }

  /* Centering images and captions */
  div img {
    display: block;
    margin: 20px auto;
    max-width: 100%;
    height: auto;
    border-radius: 8px;
    box-shadow: 0 4px 8px rgba(0,31,77,0.15);
  }

  .centered p {
    text-align: center;
    font-style: italic;
    color: var(--navy-blue);
    margin-top: 0.4em;
  }

  .highlight {
    color: var(--coral);
    font-weight: 700;
  }

  /* Lists inside paragraphs */
  p > ul {
    margin-top: 0.3em;
  }

</style>
</head>
<body>
<div class="card">
    <h1>Iqra’Eval Shared Task</h1>

    <div>
      <img src="IqraEval.png" alt="IqraEval Logo" />
    </div>

    <!-- Overview Section -->
    <h2>Overview</h2>
    <p>
        <strong>Iqra’Eval</strong> is a shared task aimed at advancing the <strong>automatic assessment of Qur’anic recitation pronunciation</strong>, using computational methods to detect and diagnose pronunciation errors. The focus on Qur’anic recitation provides a standardized, well-defined context for evaluating Modern Standard Arabic (MSA) pronunciation.
    </p>
    <p>
        Participants will develop systems capable of detecting mispronunciations (e.g., substitution, deletion, or insertion of phonemes).
    </p>

    <!-- Timeline Section -->
    <h2>Timeline</h2>
    <ul>
        <li><strong>June 1, 2025</strong>: Official announcement of the shared task</li>
        <li><strong>June 10, 2025</strong>: Release of training data, development set (QuranMB), phonetizer script, and baseline systems</li>
        <li><strong>July 24, 2025</strong>: Registration deadline and release of test data</li>
        <li><strong>July 27, 2025</strong>: End of evaluation cycle (test set submission closes)</li>
        <li><strong>July 30, 2025</strong>: Final results released</li>
        <li><strong>August 15, 2025</strong>: System description paper submissions due</li>
        <li><strong>August 22, 2025</strong>: Notification of acceptance</li>
        <li><strong>September 5, 2025</strong>: Camera-ready versions due</li>
    </ul>

    <!-- Task Description -->
    <h2>Task Description: Quranic Mispronunciation Detection System</h2>
    <p>
      The goal is to design a model that detects mispronunciations in Quranic recitations and provides detailed feedback on them.
      Users read vowelized Quranic verses aloud, and the model predicts the phoneme sequence the speaker actually uttered, which may contain mispronunciations.
      Models are evaluated on the <strong>QuranMB.v2</strong> dataset, which contains human-annotated mispronunciations.
    </p>

    <div class="centered">
      <img src="task.png" alt="System Overview" />
      <p>Figure: Overview of the Mispronunciation Detection Workflow</p>
    </div>

    <h3>1. Read the Verse</h3>
    <p>
      The user is shown a <strong>Reference Verse</strong> (what should have been said) in Arabic script, along with its corresponding <strong>Reference Phoneme Sequence</strong>.
    </p>
    <p><strong>Example:</strong></p>
    <ul>
      <li><strong>Arabic:</strong> إِنَّ الصَّفَا وَالْمَرْوَةَ مِنْ شَعَائِرِ اللَّهِ</li>
      <li>
        <strong>Phoneme sequence:</strong>
        <code>&lt; i n n a SS A f aa w a l m a r w a t a m i n $ a E a a &lt; i r i l l a h i</code>
      </li>
    </ul>
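    <p>
      Phonemes in these sequences are separated by spaces, so multi-character symbols such as <code>SS</code> and <code>aa</code> each count as a single phoneme. The Python sketch below splits a sequence into phoneme tokens. It assumes whitespace is the only delimiter, as in the examples on this page; the helper name <code>tokenize_phonemes</code> is hypothetical and not part of the official phonetizer script.
    </p>
    <pre><code class="language-python">
```python
# Minimal sketch, assuming phonemes are whitespace-separated as in the
# examples on this page. tokenize_phonemes is a hypothetical helper,
# not part of the official phonetizer script.


def tokenize_phonemes(sequence: str) -> list[str]:
    """Split a space-separated phoneme string into a list of phonemes."""
    return sequence.split()


# An excerpt of the reference sequence above (leading hamza symbol omitted).
tokens = tokenize_phonemes("i n n a SS A f aa")
print(tokens)       # ['i', 'n', 'n', 'a', 'SS', 'A', 'f', 'aa']
print(len(tokens))  # 8
```
    </code></pre>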

    <h3>2. Save Recording</h3>
    <p>
      The user recites the verse aloud; the system captures and stores the audio waveform for subsequent analysis.
    </p>
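    <p>
      The task description does not specify the audio format. As a minimal sketch, assuming 16 kHz, mono, 16-bit PCM WAV (a common choice for speech models, not a confirmed requirement of this task), the Python standard library's <code>wave</code> module can store a captured waveform. The function <code>save_recording</code> and the file name <code>recitation.wav</code> are hypothetical, and a synthetic tone stands in for a real recitation.
    </p>
    <pre><code class="language-python">
```python
import math
import wave

# Assumed format: 16 kHz, mono, 16-bit PCM (not confirmed by the task).
SAMPLE_RATE = 16_000


def save_recording(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1.0, 1.0] to a mono 16-bit PCM WAV file."""
    # Clip each sample to [-1.0, 1.0] and scale it to a little-endian
    # signed 16-bit integer, the byte layout WAV files use.
    frames = b"".join(
        int(max(-1.0, min(1.0, s)) * 32767).to_bytes(2, "little", signed=True)
        for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


# Usage: one second of a 440 Hz tone standing in for a recorded recitation.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
save_recording("recitation.wav", tone)
```
    </code></pre>
    <p>
      Whatever format the official data uses, resampling recordings to match it before they reach the detection model avoids a silent mismatch between training and inference audio.
    </p>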

    <h3>3. Mispronunciation Detection</h3>
    <p>
      The stored audio is fed into a <strong>Mispronunciation Detection Model</strong>. 
      This model predicts the phoneme sequence uttered by the speaker, which may contain mispronunciations.
    </p>
    <p><strong>Example of Mispronunciation:</strong></p>
    <ul>
      <li><strong>Reference Phoneme Sequence (What should have been said):</strong> <code>&lt; i n n a SS A f aa w a l m a r w a t a m i n $ a E a a &lt; i r i l l a h i</code></li>
      <li><strong>Model Phoneme Prediction (What is predicted):</strong> <code>&lt; i n n a SS A f aa w a l m a r w a t a m i n s a E a a &lt; i r u l l a h i</code></li>
      <li>
        <strong>Annotated Phoneme Sequence (What is said):</strong> 
        <code>&lt; i n n a SS A f aa w a l m a r w a m i n <span class="highlight">s</span> a E a a &lt; i <span class="highlight">r u</span> l l a h i</code>
      </li>
    </ul>
    <p>
      In this case, the speaker mispronounced <code>$</code> as <code>s</code> and <code>i</code> as <code>u</code>; the model's prediction captures both substitutions.
    </p>
    <p>
      The annotated sequence also shows that the speaker omitted the phonemes <code>t a</code>, but the model failed to detect this deletion: its prediction still contains <code>t a</code>.
    </p>
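    <p>
      One way to read off substitutions, deletions and insertions is to align two phoneme sequences with a standard Levenshtein (edit-distance) alignment. The Python sketch below uses that technique as an assumption; it is not the task's official scoring method, and the helpers <code>align</code> and <code>errors</code> are hypothetical. It compares an excerpt of the reference with the annotated sequence (with <code>t a</code> omitted, as described above) and with the model prediction.
    </p>
    <pre><code class="language-python">
```python
# Sketch: Levenshtein alignment of phoneme sequences (assumed technique; the
# official scoring scripts may differ). align/errors are hypothetical helpers.


def align(ref, hyp):
    """Return one minimum-cost alignment of two phoneme lists.

    Each operation is (op, ref_phoneme, hyp_phoneme) with op in
    "match", "sub", "del" (missing from hyp) or "ins" (extra in hyp).
    Ties are broken in favor of match/substitution, then deletion.
    """
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match / substitution
                             dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1)         # insertion
    # Trace back from the bottom-right cell to recover the operations.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                ops.append(("match" if cost == 0 else "sub", ref[i - 1], hyp[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    return ops[::-1]


def errors(ref_str, hyp_str):
    """Keep only the non-matching operations between two phoneme strings."""
    return [op for op in align(ref_str.split(), hyp_str.split()) if op[0] != "match"]


# Excerpts of the example above (segments with the hamza symbol left out).
reference = "m a r w a t a m i n $ a E a a"
annotated = "m a r w a m i n s a E a a"      # what was said ("t a" omitted, $ -> s)
predicted = "m a r w a t a m i n s a E a a"  # model prediction

print(errors(reference, annotated))
print(errors(reference, predicted))  # [('sub', '$', 's')]
```
    </code></pre>
    <p>
      Dropping <code>t a</code> or <code>a t</code> from <code>w a t a</code> costs the same, so the exact pair reported as deleted depends on how ties are broken; the counts (two deletions and one substitution against the annotation, one substitution against the prediction) do not. The second comparison shows why the omission goes undetected: the prediction still contains <code>t a</code>.
    </p>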

    <h2>Dataset Description</h2>
    <p>
        All data are hosted on Hugging Face. Two main splits are provided:
    </p>
    <ul>
        <li>
            <strong>Training set:</strong> 79 hours of Modern Standard Arabic (MSA) Quran recitations (5,167 audio files)
        </li>
        <li>
            <strong>Evaluation set:</strong> QuranMB.v2 dataset with phoneme-level mispronunciation annotations, which includes:
            <ul>
                <li>QuranMB-Train: 9 hours (1,218 files) for development</li>
                <li>QuranMB-Test: 8 hours (1,018 files) for evaluation</li>
            </ul>
        </li>
    </ul>

    <h2>Submission Guidelines</h2>
    <p>
      Participants should submit their predicted phoneme sequences for the test set by the deadline (July 27, 2025). Submissions will be evaluated automatically with the official scoring scripts.
    </p>

    <h2>Evaluation Metrics</h2>
    <p>
      Systems will be evaluated by phoneme error rate (PER) computed over the test set, which measures how accurately they detect and localize mispronunciations.
    </p>
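    <p>
      The exact PER formula used by the official scoring scripts is not given here. A common definition is PER = (S + D + I) / N: the substitutions, deletions and insertions needed to turn the reference into the prediction, divided by the number of reference phonemes, with both counts summed over the whole test set. The Python sketch below implements that textbook definition; treating the human-annotated sequence as the reference is an assumption, and <code>edit_distance</code>, <code>corpus_per</code> and the example pairs are hypothetical.
    </p>
    <pre><code class="language-python">
```python
# Sketch of corpus-level phoneme error rate, PER = (S + D + I) / N.
# Assumption: each pair is (human-annotated sequence, model prediction).


def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme lists (S + D + I)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j - 1] + (r != h),  # match / substitution
                            prev[j] + 1,             # deletion
                            curr[j - 1] + 1))        # insertion
        prev = curr
    return prev[-1]


def corpus_per(pairs):
    """Total edits over all utterances divided by total reference phonemes."""
    edits = sum(edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs)
    total = sum(len(ref.split()) for ref, _ in pairs)
    return edits / total


# Excerpts from the example: annotated ("t a" omitted) vs. predicted.
pairs = [
    ("m a r w a m i n s a E a a", "m a r w a t a m i n s a E a a"),
    ("r u l l a h i", "r u l l a h i"),
]
print(f"PER = {corpus_per(pairs):.3f}")  # PER = 0.100
```
    </code></pre>
    <p>
      Pooling edits and reference lengths across the corpus weights long utterances more heavily than averaging per-utterance PER values would; the two can differ when utterance lengths vary.
    </p>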

    <h2>Contact and Support</h2>
    <p>
      For inquiries and support, reach out to the task coordinators at 
      <a href="mailto:support@iqraeval.org">support@iqraeval.org</a>.
    </p>

</div>
</body>
</html>