01Yassine committed · Commit 3fb7e9e · verified · 1 Parent(s): 668602f

Update index.html

Files changed (1)
  1. index.html +59 -59
index.html CHANGED
@@ -17,21 +17,17 @@
   <!-- Overview Section -->
   <h2>Overview</h2>
   <p>
- <strong>Iqra’Eval</strong> is a shared task aimed at advancing <strong>automatic assessment of Qur’anic recitation pronunciation</strong> by leveraging computational methods to detect and diagnose pronunciation errors. The focus on Qur’anic recitation provides a standardized and well-defined context for evaluating Modern Standard Arabic (MSA) pronunciation, where precise articulation is not only valued but essential for correctness according to established Tajweed rules.
+ <strong>Iqra’Eval</strong> is a shared task aimed at advancing <strong>automatic assessment of Qur’anic recitation pronunciation</strong> by leveraging computational methods to detect and diagnose pronunciation errors. The focus on Qur’anic recitation provides a standardized and well-defined context for evaluating Modern Standard Arabic (MSA) pronunciation.
   </p>
   <p>
- Participants will develop systems capable of:
+ Participants will develop systems capable of detecting mispronunciations (e.g., substitution, deletion, or insertion of phonemes).
   </p>
- <ul>
- <li>Detecting whether a segment of Qur’anic recitation contains pronunciation errors.</li>
- <li>Diagnosing the nature of the error (e.g., substitution, deletion, or insertion of phonemes).</li>
- </ul>

   <!-- Timeline Section -->
   <h2>Timeline</h2>
   <ul>
   <li><strong>June 1, 2025</strong>: Official announcement of the shared task</li>
- <li><strong>June 8, 2025</strong>: Release of training data, development set (QuranMB), phonetizer script, and baseline systems</li>
+ <li><strong>June 10, 2025</strong>: Release of training data, development set (QuranMB), phonetizer script, and baseline systems</li>
   <li><strong>July 24, 2025</strong>: Registration deadline and release of test data</li>
   <li><strong>July 27, 2025</strong>: End of evaluation cycle (test set submission closes)</li>
   <li><strong>July 30, 2025</strong>: Final results released</li>
@@ -93,27 +89,8 @@
   <p>
   The annotated phoneme sequence indicates that the phoneme <code>ta</code> was omitted, but the model failed to detect it.
   </p>
-
- <h2>Potential Research Directions</h2>
- <ol>
- <li>
- <strong>Advanced Mispronunciation Detection Models</strong><br>
- Apply state-of-the-art self-supervised models (e.g.,
- <a href="https://arxiv.org/abs/2111.06331" target="_blank">Wav2Vec2.0</a>, HuBERT)
- pre-trained on Arabic speech. These models can be fine-tuned on Quranic recitations to improve phoneme-level accuracy.
- </li>
- <li>
- <strong>Data Augmentation Strategies</strong><br>
- Create synthetic mispronunciation examples using pipelines like
- <a href="https://arxiv.org/abs/2211.00923" target="_blank">SpeechBlender</a>.
- Augmenting limited Arabic/Quranic speech data helps mitigate data scarcity and improves model robustness.
- </li>
- <li>
- <strong>Analysis of Common Mispronunciation Patterns</strong><br>
- Perform statistical analysis on the QuranMB dataset to identify prevalent errors (e.g., substituting similar phonemes, swapping vowels).
- These insights can drive targeted training and tailored feedback rules.
- </li>
- </ol>
+
+

   <h2>Training Dataset: Description</h2>
   <p>
@@ -189,7 +166,38 @@
   For detailed instructions on data access, phonetizer installation, and baseline usage, please refer to the GitHub README.
   </em>
   </p>
-
+
+ <!-- Submission Details -->
+ <h2>Submission Details (Draft)</h2>
+ <p>
+ Participants are required to submit a CSV file named <code>submission.csv</code> containing the predicted phoneme sequences for each audio sample. The file must have exactly two columns:
+ </p>
+ <ul>
+ <li><strong>ID:</strong> Unique identifier of the audio sample.</li>
+ <li><strong>Labels:</strong> The predicted phoneme sequence, with each phoneme separated by a single space.</li>
+ </ul>
+ <p>
+ Below is a minimal example illustrating the required format:
+ </p>
+ <pre>
+ ID,Labels
+ 0000_0001, i n n a m a a y a k h a l l a h a m i n ʕ i b a a d i h u l ʕ u l a m
+ 0000_0002, m a a n a n s a k h u m i n i ʕ a a y a t i n
+ 0000_0003, y u k h i k u m u n n u ʔ a u ʔ a m a n a t a n m m i n h u
+
+ </pre>
+ <p>
+ The first column (ID) should match exactly the audio filenames (without extension). The second column (Labels) is the predicted phoneme string.
+ </p>
+ <p>
+ <strong>Important:</strong>
+ <ul>
+ <li>Use UTF-8 encoding.</li>
+ <li>Do not include extra spaces at the start or end of each line.</li>
+ <li>Submit a single CSV file (no archives). Filename must be <code>teamID_submission.csv</code>.</li>
+ </ul>
+ </p>
+
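Note: a minimal sketch of producing the file described above with Python's standard csv module; the sample IDs and phoneme strings are placeholders, not real predictions, and the format rules in the diff take precedence.

```python
# Minimal sketch: write predictions in the required ID,Labels format.
# The IDs and phoneme strings here are illustrative placeholders.
import csv

# Maps each audio filename (without extension) to the predicted
# space-separated phoneme sequence.
predictions = {
    "0000_0001": "i n n a m a a",
    "0000_0002": "m a a n a n s a",
}

with open("teamID_submission.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "Labels"])
    for sample_id, phonemes in sorted(predictions.items()):
        # strip() guards against leading/trailing spaces, which the rules forbid
        writer.writerow([sample_id, phonemes.strip()])
```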
   <h2>Evaluation Criteria</h2>
   <p>
   The primary evaluation metric for the IqraEval system is the <strong>F1-score</strong> at the phoneme level. In addition, we adopt a hierarchical evaluation structure, <a href="https://arxiv.org/pdf/2310.13974" target="_blank">MDD Overview</a>, that breaks down performance into detection and diagnostic phases.
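Note: a hedged sketch of the detection-phase F1 computation, assuming the canonical, annotated, and predicted phoneme sequences are already aligned one-to-one; the official scorer may add an explicit alignment step and the diagnostic breakdown described in the MDD Overview.

```python
# Hedged sketch of detection-phase F1 at the phoneme level. Assumes the
# canonical text phonemes, the annotated (actually uttered) phonemes, and
# the model's predicted phonemes are aligned position-by-position.

def detection_f1(canonical, annotated, predicted):
    tp = fp = fn = 0
    for ref, ann, hyp in zip(canonical, annotated, predicted):
        actual_error = ann != ref    # the speaker really mispronounced here
        flagged_error = hyp != ref   # the model claims a mispronunciation here
        if actual_error and flagged_error:
            tp += 1
        elif flagged_error:
            fp += 1
        elif actual_error:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy check: the speaker substitutes "i" for "a" and the model catches it.
print(detection_f1(["t", "a", "b"], ["t", "i", "b"], ["t", "i", "b"]))  # -> 1.0
```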
@@ -253,37 +261,29 @@
   </ul>
   </p>


- <!-- Submission Details -->
- <h2>Submission Details (Draft)</h2>
- <p>
- Participants are required to submit a CSV file named <code>submission.csv</code> containing the predicted phoneme sequences for each audio sample. The file must have exactly two columns:
- </p>
- <ul>
- <li><strong>ID:</strong> Unique identifier of the audio sample.</li>
- <li><strong>Labels:</strong> The predicted phoneme sequence, with each phoneme separated by a single space.</li>
- </ul>
- <p>
- Below is a minimal example illustrating the required format:
- </p>
- <pre>
- ID,Labels
- 0000_0001, i n n a m a a y a k h a l l a h a m i n ʕ i b a a d i h u l ʕ u l a m
- 0000_0002, m a a n a n s a k h u m i n i ʕ a a y a t i n
- 0000_0003, y u k h i k u m u n n u ʔ a u ʔ a m a n a t a n m m i n h u
-
- </pre>
- <p>
- The first column (ID) should match exactly the audio filenames (without extension). The second column (Labels) is the predicted phoneme string.
- </p>
- <p>
- <strong>Important:</strong>
- <ul>
- <li>Use UTF-8 encoding.</li>
- <li>Do not include extra spaces at the start or end of each line.</li>
- <li>Submit a single CSV file (no archives). Filename must be <code>teamID_submission.csv</code>.</li>
- </ul>
- </p>
+
+ <h2>Potential Research Directions</h2>
+ <ol>
+ <li>
+ <strong>Advanced Mispronunciation Detection Models</strong><br>
+ Apply state-of-the-art self-supervised models (e.g.,
+ <a href="https://arxiv.org/abs/2111.06331" target="_blank">Wav2Vec2.0</a>, HuBERT)
+ pre-trained on Arabic speech. These models can be fine-tuned on Quranic recitations to improve phoneme-level accuracy.
+ </li>
+ <li>
+ <strong>Data Augmentation Strategies</strong><br>
+ Create synthetic mispronunciation examples using pipelines like
+ <a href="https://arxiv.org/abs/2211.00923" target="_blank">SpeechBlender</a>.
+ Augmenting limited Arabic/Quranic speech data helps mitigate data scarcity and improves model robustness.
+ </li>
+ <li>
+ <strong>Analysis of Common Mispronunciation Patterns</strong><br>
+ Perform statistical analysis on the QuranMB dataset to identify prevalent errors (e.g., substituting similar phonemes, swapping vowels).
+ These insights can drive targeted training and tailored feedback rules.
+ </li>
+ </ol>
+
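Note: a hedged sketch of direction (1) above, fine-tuning a self-supervised encoder with a CTC head for phoneme recognition. The checkpoint name and phoneme inventory are illustrative assumptions; an Arabic-pretrained checkpoint and the task phonetizer's symbol set would replace them in practice.

```python
# Hedged sketch of research direction (1): fine-tune a self-supervised
# encoder with a CTC head for phoneme recognition. Checkpoint and phoneme
# inventory are placeholders, not the task's official resources.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForCTC

PHONEMES = ["<pad>", "a", "i", "u", "t", "n", "m"]  # toy inventory

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-base",      # placeholder; swap in an Arabic checkpoint
    vocab_size=len(PHONEMES),
    ctc_loss_reduction="mean",
)
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")

# One dummy training step on random audio to show the fine-tuning loop shape.
audio = torch.randn(16000)                           # 1 second at 16 kHz
inputs = extractor(audio.numpy(), sampling_rate=16000, return_tensors="pt")
labels = torch.tensor([[1, 2, 4]])                   # placeholder phoneme IDs
loss = model(inputs.input_values, labels=labels).loss
loss.backward()                                      # gradients for an optimizer step
```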
 
   <!-- Placeholder for Future Details -->
   <h2>Future Updates</h2>