Update index.html
index.html CHANGED (+66 -10)
@@ -57,7 +57,7 @@
|
|
57 |
|
58 |
<h3>1. Read the Verse</h3>
|
59 |
<p>
|
60 |
-
The user is shown a <strong>Reference Verse</strong> in Arabic script along with its corresponding <strong>Reference Phoneme Sequence</strong>.
|
61 |
</p>
|
62 |
<p><strong>Example:</strong></p>
|
63 |
<ul>
|
@@ -80,10 +80,10 @@
|
|
80 |
</p>
|
81 |
<p><strong>Example of Mispronunciation:</strong></p>
|
82 |
<ul>
|
83 |
-
<li><strong>Reference Phoneme Sequence:</strong> <code>< i n n a SS A f aa w a l m a r w a t a m i n $ a E a a < i r i l l a h i</code></li>
|
84 |
-
<li><strong>Model Phoneme Prediction:</strong> <code>< i n n a SS A f aa w a l m a r w a t a m i n s a E a a < i r u l l a h i</code></li>
|
85 |
<li>
|
86 |
-
<strong>Annotated Phoneme Sequence:</strong>
|
87 |
<code>< i n n a SS A f aa w a l m a r w a m i n <span class="highlight">s</span> a E a a < i <span class="highlight">r u</span> l l a h i</code>
|
88 |
</li>
|
89 |
</ul>
|
@@ -189,18 +189,74 @@
|
|
189 |
For detailed instructions on data access, phonetizer installation, and baseline usage, please refer to the GitHub README.
|
190 |
</em>
|
191 |
</p>
|
192 |
-
|
193 |
-
|
194 |
<p>
|
195 |
-
|
196 |
</p>
|
197 |
<ul>
|
198 |
-
|
199 |
-
|
200 |
</ul>
|
201 |
<p>
|
202 |
-
|
203 |
</p>
|
204 |
|
205 |
<!-- Submission Details -->
|
206 |
<h2>Submission Details (Draft)</h2>
|
|
|
57 |
|
58 |
<h3>1. Read the Verse</h3>
|
59 |
<p>
|
60 |
+
The user is shown a <strong>Reference Verse</strong> (what should have been said) in Arabic script along with its corresponding <strong>Reference Phoneme Sequence</strong>.
|
61 |
</p>
|
62 |
<p><strong>Example:</strong></p>
|
63 |
<ul>
|
|
|
80 |
</p>
|
81 |
<p><strong>Example of Mispronunciation:</strong></p>
|
82 |
<ul>
|
83 |
+
<li><strong>Reference Phoneme Sequence (What should have been said):</strong> <code>< i n n a SS A f aa w a l m a r w a t a m i n $ a E a a < i r i l l a h i</code></li>
|
84 |
+
<li><strong>Model Phoneme Prediction (What is predicted):</strong> <code>< i n n a SS A f aa w a l m a r w a t a m i n s a E a a < i r u l l a h i</code></li>
|
85 |
<li>
|
86 |
+
<strong>Annotated Phoneme Sequence (What is said):</strong>
|
87 |
<code>< i n n a SS A f aa w a l m a r w a m i n <span class="highlight">s</span> a E a a < i <span class="highlight">r u</span> l l a h i</code>
|
88 |
</li>
|
89 |
</ul>
|
|
|
189 |
For detailed instructions on data access, phonetizer installation, and baseline usage, please refer to the GitHub README.
|
190 |
</em>
|
191 |
</p>
|
192 |
+
|
193 |
+
<h2>Evaluation Criteria</h2>
|
194 |
+
<p>
|
195 |
+
The primary evaluation metric for the IqraEval system is the <strong>F1-score</strong> at the phoneme level. In addition, we adopt a hierarchical evaluation structure that breaks down performance into detection and diagnostic phases.
|
196 |
+
</p>
|
197 |
<p>
|
198 |
+
<strong>Hierarchical Evaluation Structure:</strong>
|
199 |
+
The hierarchical mispronunciation detection process relies on three sequences:
|
200 |
+
<ul>
|
201 |
+
<li><em>What is said</em> (the <strong>annotated phoneme sequence</strong> from human annotation),</li>
|
202 |
+
<li><em>What is predicted</em> (the <strong>model’s phoneme output</strong>),</li>
|
203 |
+
<li><em>What should have been said</em> (the <strong>reference phoneme sequence</strong>).</li>
|
204 |
+
</ul>
|
205 |
+
By comparing these three sequences, we compute the following counts:
|
206 |
</p>
|
207 |
<ul>
|
208 |
+
<li><strong>True Acceptance (TA):</strong>
|
209 |
+
Number of phonemes that are annotated as correct and also recognized as correct by the model.
|
210 |
+
</li>
|
211 |
+
<li><strong>True Rejection (TR):</strong>
|
212 |
+
Number of phonemes that are annotated as mispronunciations and correctly predicted as mispronunciations.
|
213 |
+
(True rejections are further split into correct diagnoses and diagnostic errors by checking whether the predicted phoneme matches the annotated phoneme.)
|
214 |
+
</li>
|
215 |
+
<li><strong>False Rejection (FR):</strong>
|
216 |
+
Number of phonemes that are annotated as correct but wrongly predicted as mispronunciations.
|
217 |
+
</li>
|
218 |
+
<li><strong>False Acceptance (FA):</strong>
|
219 |
+
Number of phonemes that are annotated as mispronunciations but misclassified as correct pronunciations.
|
220 |
+
</li>
|
221 |
</ul>
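The four counts above can be illustrated with a minimal Python sketch. It assumes the three sequences have already been aligned position by position and have equal length; real data would first need an alignment step to handle insertions and deletions. The function name `count_outcomes` and the toy sequences are hypothetical. Splitting true rejections into correct diagnoses (CD) and diagnostic errors (DE) by matching the prediction against the annotation is also an assumption, and this is not the official scoring script.

```python
def count_outcomes(reference, annotated, predicted):
    """Count TA/TR/FR/FA (plus CD/DE) over pre-aligned phoneme lists.

    reference: what should have been said
    annotated: what was said (human annotation)
    predicted: what the model output
    """
    if not (len(reference) == len(annotated) == len(predicted)):
        raise ValueError("sequences must be aligned to the same length")

    counts = {"TA": 0, "TR": 0, "FR": 0, "FA": 0, "CD": 0, "DE": 0}
    for ref, ann, pred in zip(reference, annotated, predicted):
        said_correctly = ann == ref   # annotation agrees with the reference
        accepted = pred == ref        # model output agrees with the reference
        if said_correctly and accepted:
            counts["TA"] += 1
        elif said_correctly and not accepted:
            counts["FR"] += 1
        elif not said_correctly and accepted:
            counts["FA"] += 1
        else:
            counts["TR"] += 1
            # Assumed diagnosis rule: the model names the phoneme actually said.
            if pred == ann:
                counts["CD"] += 1
            else:
                counts["DE"] += 1
    return counts


# Toy sequences (hypothetical symbols, one of each outcome)
reference = "a b c d e".split()
annotated = "a x c y z".split()
predicted = "a x q d w".split()
counts = count_outcomes(reference, annotated, predicted)
```

In this toy input, position 0 is a true acceptance, position 2 a false rejection, position 3 a false acceptance, and positions 1 and 4 are true rejections with a correct and an incorrect diagnosis respectively.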
|
222 |
<p>
|
223 |
+
From these counts, we derive three rates:
|
224 |
+
<ul>
|
225 |
+
<li><strong>False Rejection Rate (FRR):</strong>
|
226 |
+
\( \displaystyle \text{FRR} = \frac{\text{FR}}{\text{TA} + \text{FR}} \)
|
227 |
+
(Proportion of correctly pronounced phonemes that were mistakenly flagged as errors.)
|
228 |
+
</li>
|
229 |
+
<li><strong>False Acceptance Rate (FAR):</strong>
|
230 |
+
\( \displaystyle \text{FAR} = \frac{\text{FA}}{\text{FA} + \text{TR}} \)
|
231 |
+
(Proportion of mispronounced phonemes that were mistakenly classified as correct.)
|
232 |
+
</li>
|
233 |
+
<li><strong>Diagnostic Error Rate (DER):</strong>
|
234 |
+
\( \displaystyle \text{DER} = \frac{\text{DE}}{\text{CD} + \text{DE}} \)
|
235 |
+
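where CD (correct diagnosis) is the number of true rejections whose predicted phoneme matches the annotated phoneme, and DE (diagnostic error) is the number of true rejections whose predicted phoneme does not.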
|
236 |
+
</li>
|
237 |
+
</ul>
|
238 |
</p>
|
239 |
+
<p>
|
240 |
+
In addition to these hierarchical measures, we compute the standard <strong>Precision</strong>, <strong>Recall</strong>, and <strong>F-measure</strong> for mispronunciation detection:
|
241 |
+
<ul>
|
242 |
+
<li><strong>Precision:</strong>
|
243 |
+
\( \displaystyle \text{Precision} = \frac{\text{TR}}{\text{TR} + \text{FR}} \)
|
244 |
+
(Of all phonemes predicted as mispronounced, how many were actually mispronounced?)
|
245 |
+
</li>
|
246 |
+
<li><strong>Recall:</strong>
|
247 |
+
\( \displaystyle \text{Recall} = \frac{\text{TR}}{\text{TR} + \text{FA}} \;=\; 1 - \text{FAR} \)
|
248 |
+
(Of all truly mispronounced phonemes, how many did we correctly detect?)
|
249 |
+
</li>
|
250 |
+
<li><strong>F-measure (F1):</strong>
|
251 |
+
\( \displaystyle F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \)
|
252 |
+
(Harmonic mean of Precision and Recall.)
|
253 |
+
</li>
|
254 |
+
</ul>
|
255 |
+
</p>
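As a rough illustration of how the rates and the Precision, Recall, and F1 formulas fit together, the following Python sketch computes them from raw counts. The helper names `safe_div` and `detection_metrics`, the example counts, and the choice to return 0.0 when a denominator is zero are all assumptions made for this sketch; the official evaluation script may handle these cases differently.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 for an empty denominator (an assumed convention)."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(ta, tr, fr, fa, cd=0, de=0):
    """Compute FRR, FAR, DER, Precision, Recall and F1 from the counts."""
    frr = safe_div(fr, ta + fr)          # correct phonemes wrongly flagged
    far = safe_div(fa, fa + tr)          # mispronunciations wrongly accepted
    der = safe_div(de, cd + de)          # misdiagnosed true rejections
    precision = safe_div(tr, tr + fr)
    recall = safe_div(tr, tr + fa)       # equals 1 - FAR when FA + TR > 0
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "FRR": frr, "FAR": far, "DER": der,
        "Precision": precision, "Recall": recall, "F1": f1,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow
metrics = detection_metrics(ta=80, tr=12, fr=4, fa=4, cd=9, de=3)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts, Precision and Recall are both 12/16, so F1 is also 0.75, and Recall equals 1 - FAR as the formula above states.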
|
256 |
+
<p>
|
257 |
+
<em>(Detailed evaluation weights and scripts will be made available on June 5, 2025.)</em>
|
258 |
+
</p>
|
259 |
+
|
260 |
|
261 |
<!-- Submission Details -->
|
262 |
<h2>Submission Details (Draft)</h2>
|