01Yassine committed on
Commit 0469dbf · verified · 1 Parent(s): 18d0f01

Update index.html

Files changed (1)
  1. index.html +66 -10
index.html CHANGED
@@ -57,7 +57,7 @@
 
  <h3>1. Read the Verse</h3>
  <p>
- The user is shown a <strong>Reference Verse</strong> in Arabic script along with its corresponding <strong>Reference Phoneme Sequence</strong>.
  </p>
  <p><strong>Example:</strong></p>
  <ul>
@@ -80,10 +80,10 @@
  </p>
  <p><strong>Example of Mispronunciation:</strong></p>
  <ul>
- <li><strong>Reference Phoneme Sequence:</strong> <code>&lt; i n n a SS A f aa w a l m a r w a t a m i n $ a E a a &lt; i r i l l a h i</code></li>
- <li><strong>Model Phoneme Prediction:</strong> <code>&lt; i n n a SS A f aa w a l m a r w a t a m i n s a E a a &lt; i r u l l a h i</code></li>
  <li>
- <strong>Annotated Phoneme Sequence:</strong>
  <code>&lt; i n n a SS A f aa w a l m a r w a m i n <span class="highlight">s</span> a E a a &lt; i <span class="highlight">r u</span> l l a h i</code>
  </li>
  </ul>
@@ -189,18 +189,74 @@
  For detailed instructions on data access, phonetizer installation, and baseline usage, please refer to the GitHub README.
  </em>
  </p>
-
- <h2>Evaluation Criteria</h2>
  <p>
- Systems will be scored on their ability to detect and correctly classify phoneme-level errors:
  </p>
  <ul>
- <li><strong>Detection accuracy:</strong> Did the system spot that a phoneme-level error occurred in the segment?</li>
- <li><strong>Classification F1-score:</strong> Mispronunciation Detection F1-score</li>
  </ul>
  <p>
- <em>(Detailed evaluation weights and scripts will be made available on June 5, 2025.)</em>
  </p>

  <!-- Submission Details -->
  <h2>Submission Details (Draft)</h2>
 
  <h3>1. Read the Verse</h3>
  <p>
+ The user is shown a <strong>Reference Verse</strong> (What should have been said) in Arabic script along with its corresponding <strong>Reference Phoneme Sequence</strong>.
  </p>
  <p><strong>Example:</strong></p>
  <ul>
 
  </p>
  <p><strong>Example of Mispronunciation:</strong></p>
  <ul>
+ <li><strong>Reference Phoneme Sequence (What should have been said):</strong> <code>&lt; i n n a SS A f aa w a l m a r w a t a m i n $ a E a a &lt; i r i l l a h i</code></li>
+ <li><strong>Model Phoneme Prediction (What is predicted):</strong> <code>&lt; i n n a SS A f aa w a l m a r w a t a m i n s a E a a &lt; i r u l l a h i</code></li>
  <li>
+ <strong>Annotated Phoneme Sequence (What is said):</strong>
  <code>&lt; i n n a SS A f aa w a l m a r w a m i n <span class="highlight">s</span> a E a a &lt; i <span class="highlight">r u</span> l l a h i</code>
  </li>
  </ul>
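<p>
<em>Illustration only (not part of the challenge tooling):</em> the highlighted deviations above can be recovered automatically by aligning the annotated sequence against the reference. A minimal Python sketch, assuming both sequences use the same space-separated phoneme symbols as in the example:
</p>
<pre><code class="language-python">
from difflib import SequenceMatcher

# Phoneme strings copied from the example above (space-separated symbols).
reference = "&lt; i n n a SS A f aa w a l m a r w a t a m i n $ a E a a &lt; i r i l l a h i".split()
annotated = "&lt; i n n a SS A f aa w a l m a r w a m i n s a E a a &lt; i r u l l a h i".split()

# Align the two phoneme lists and report every stretch that differs;
# the non-matching stretches correspond to the highlighted phonemes above.
matcher = SequenceMatcher(a=reference, b=annotated, autojunk=False)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(tag, "reference:", reference[i1:i2], "annotated:", annotated[j1:j2])
</code></pre>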
 
  For detailed instructions on data access, phonetizer installation, and baseline usage, please refer to the GitHub README.
  </em>
  </p>
+
+ <h2>Evaluation Criteria</h2>
+ <p>
+ The primary evaluation metric for the IqraEval system is the <strong>F1-score</strong> at the phoneme level. In addition, we adopt a hierarchical evaluation structure that breaks down performance into detection and diagnostic phases.
+ </p>
  <p>
+ <strong>Hierarchical Evaluation Structure:</strong>
+ The hierarchical mispronunciation detection process relies on three sequences:
+ <ul>
+ <li><em>What is said</em> (the <strong>annotated phoneme sequence</strong> from human annotation),</li>
+ <li><em>What is predicted</em> (the <strong>model’s phoneme output</strong>),</li>
+ <li><em>What should have been said</em> (the <strong>reference phoneme sequence</strong>).</li>
+ </ul>
+ By comparing these three sequences, we compute the following counts (see the counting sketch after this list):
  </p>
  <ul>
+ <li><strong>True Acceptance (TA):</strong>
+ Number of phonemes that are annotated as correct and also recognized as correct by the model.
+ </li>
+ <li><strong>True Rejection (TR):</strong>
+ Number of phonemes that are annotated as mispronunciations and correctly predicted as mispronunciations.
+ (These labels are further used to measure diagnostic errors by comparing the prediction to the canonical reference.)
+ </li>
+ <li><strong>False Rejection (FR):</strong>
+ Number of phonemes that are annotated as correct but wrongly predicted as mispronunciations.
+ </li>
+ <li><strong>False Acceptance (FA):</strong>
+ Number of phonemes that are annotated as mispronunciations but misclassified as correct pronunciations.
+ </li>
  </ul>
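<p>
<em>Illustration only (not part of the official scoring script):</em> a minimal counting sketch, assuming the three sequences have already been aligned one-to-one at the phoneme level (a full implementation would also have to handle insertions and deletions):
</p>
<pre><code class="language-python">
def count_outcomes(annotated, predicted, reference):
    """Tally TA, TR, FR, FA over position-aligned phoneme sequences."""
    ta = tr = fr = fa = 0
    for said, pred, ref in zip(annotated, predicted, reference):
        actually_correct = (said == ref)    # human annotation agrees with the reference
        predicted_correct = (pred == ref)   # model prediction agrees with the reference
        if actually_correct and predicted_correct:
            ta += 1   # True Acceptance
        elif actually_correct and not predicted_correct:
            fr += 1   # False Rejection
        elif not actually_correct and not predicted_correct:
            tr += 1   # True Rejection
        else:
            fa += 1   # False Acceptance
    return ta, tr, fr, fa

# Toy usage with three hypothetical, already-aligned sequences:
print(count_outcomes(["a", "s", "u"], ["a", "s", "i"], ["a", "$", "i"]))  # -> (1, 1, 0, 1)
</code></pre>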
  <p>
+ From these counts, we derive three rates (a worked example follows this list):
+ <ul>
+ <li><strong>False Rejection Rate (FRR):</strong>
+ \( \displaystyle \text{FRR} = \frac{\text{FR}}{\text{TA} + \text{FR}} \)
+ (Proportion of correctly pronounced phonemes that were mistakenly flagged as errors.)
+ </li>
+ <li><strong>False Acceptance Rate (FAR):</strong>
+ \( \displaystyle \text{FAR} = \frac{\text{FA}}{\text{FA} + \text{TR}} \)
+ (Proportion of mispronounced phonemes that were mistakenly classified as correct.)
+ </li>
+ <li><strong>Diagnostic Error Rate (DER):</strong>
+ \( \displaystyle \text{DER} = \frac{\text{DE}}{\text{CD} + \text{DE}} \)
+ where DE is the number of misdiagnosed phonemes and CD is the number of correctly diagnosed ones.
+ </li>
+ </ul>
  </p>
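<p>
<em>Worked example (hypothetical counts):</em> with TA = 90, FR = 10, TR = 45, and FA = 5, we get
\( \text{FRR} = \frac{10}{90 + 10} = 0.10 \) and
\( \text{FAR} = \frac{5}{5 + 45} = 0.10 \);
if 36 of the 45 detected mispronunciations also receive the correct diagnosis (CD = 36, DE = 9), then
\( \text{DER} = \frac{9}{36 + 9} = 0.20 \).
</p>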
+ <p>
+ In addition to these hierarchical measures, we compute the standard <strong>Precision</strong>, <strong>Recall</strong>, and <strong>F-measure</strong> for mispronunciation detection:
+ <ul>
+ <li><strong>Precision:</strong>
+ \( \displaystyle \text{Precision} = \frac{\text{TR}}{\text{TR} + \text{FR}} \)
+ (Of all phonemes predicted as mispronounced, how many were actually mispronounced?)
+ </li>
+ <li><strong>Recall:</strong>
+ \( \displaystyle \text{Recall} = \frac{\text{TR}}{\text{TR} + \text{FA}} \;=\; 1 - \text{FAR} \)
+ (Of all truly mispronounced phonemes, how many did we correctly detect?)
+ </li>
+ <li><strong>F-measure (F1):</strong>
+ \( \displaystyle F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \)
+ (Harmonic mean of Precision and Recall.)
+ </li>
+ </ul>
+ </p>
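<p>
<em>Illustration only (not part of the official scoring script):</em> a short sketch that turns the four counts into the rates and detection scores defined above, reusing the hypothetical numbers from the worked example:
</p>
<pre><code class="language-python">
def detection_metrics(ta, tr, fr, fa):
    """Compute FRR, FAR, Precision, Recall, and F1 from phoneme-level counts."""
    frr = fr / (ta + fr)
    far = fa / (fa + tr)
    precision = tr / (tr + fr)
    recall = tr / (tr + fa)   # equivalently: 1 - FAR
    f1 = 2 * precision * recall / (precision + recall)
    return {"FRR": frr, "FAR": far, "Precision": precision, "Recall": recall, "F1": f1}

# Hypothetical counts: TA=90, TR=45, FR=10, FA=5
print(detection_metrics(ta=90, tr=45, fr=10, fa=5))
# FRR=0.10, FAR=0.10, Precision≈0.818, Recall=0.90, F1≈0.857
</code></pre>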
+ <p>
+ <em>(Detailed evaluation weights and scripts will be made available on June 5, 2025.)</em>
+ </p>
+

  <!-- Submission Details -->
  <h2>Submission Details (Draft)</h2>