|
<!doctype html> |
|
<html lang="en"> |
|
<head> |
|
<meta charset="utf-8" /> |
|
<meta name="viewport" content="width=device-width" /> |
|
<title>Iqra’Eval Shared Task</title> |
|
<style> |
|
|
|
:root { |
|
--navy-blue: #001f4d; |
|
--coral: #ff6f61; |
|
--light-gray: #f5f7fa; |
|
--text-dark: #222; |
|
} |
|
|
|
body { |
|
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; |
|
background-color: var(--light-gray); |
|
color: var(--text-dark); |
|
margin: 20px; |
|
line-height: 1.6; |
|
} |
|
|
|
h1, h2, h3 { |
|
color: var(--navy-blue); |
|
font-weight: 700; |
|
margin-top: 1.2em; |
|
} |
|
|
|
h1 { |
|
text-align: center; |
|
font-size: 2.8rem; |
|
margin-bottom: 0.3em; |
|
} |
|
|
|
h2 { |
|
border-bottom: 3px solid var(--coral); |
|
padding-bottom: 0.3em; |
|
} |
|
|
|
h3 { |
|
color: var(--coral); |
|
margin-top: 1em; |
|
} |
|
|
|
p { |
|
max-width: 900px; |
|
margin: 0.8em auto; |
|
} |
|
|
|
strong { |
|
color: var(--navy-blue); |
|
} |
|
|
|
ul { |
|
max-width: 900px; |
|
margin: 0.5em auto 1.5em auto; |
|
padding-left: 1.2em; |
|
} |
|
|
|
ul li { |
|
margin: 0.4em 0; |
|
} |
|
|
|
code { |
|
background-color: #eef4f8; |
|
color: var(--navy-blue); |
|
padding: 2px 6px; |
|
border-radius: 4px; |
|
font-family: Consolas, monospace; |
|
font-size: 0.9em; |
|
} |
|
|
|
pre { |
|
max-width: 900px; |
|
background-color: #eef4f8; |
|
color: var(--navy-blue); |
|
padding: 1em; |
|
border-radius: 8px; |
|
overflow-x: auto; |
|
font-family: Consolas, monospace; |
|
font-size: 0.95em; |
|
margin: 0.8em auto; |
|
} |
|
|
|
a { |
|
color: var(--coral); |
|
text-decoration: none; |
|
} |
|
|
|
a:hover { |
|
text-decoration: underline; |
|
} |
|
|
|
.card { |
|
max-width: 1200px; |
|
background: white; |
|
margin: 0 auto 40px auto; |
|
padding: 2em 2.5em; |
|
box-shadow: 0 4px 14px rgba(0,0,0,0.1); |
|
border-radius: 12px; |
|
} |
|
|
|
|
|
div img { |
|
display: block; |
|
margin: 20px auto; |
|
max-width: 100%; |
|
height: auto; |
|
border-radius: 8px; |
|
box-shadow: 0 4px 8px rgba(0,31,77,0.15); |
|
} |
|
|
|
.centered p { |
|
text-align: center; |
|
font-style: italic; |
|
color: var(--navy-blue); |
|
margin-top: 0.4em; |
|
} |
|
|
|
.highlight { |
|
color: var(--coral); |
|
font-weight: 700; |
|
} |
|
|
|
|
|
p > ul { |
|
margin-top: 0.3em; |
|
} |
|
|
|
</style> |
|
</head> |
|
<body> |
|
<div class="card"> |
|
<h1>Iqra’Eval Shared Task</h1> |
|
|
|
<div> |
|
<img src="IqraEval.png" alt="IqraEval Logo" /> |
|
</div> |
|
|
|
|
|
<h2>Overview</h2> |
|
<p> |
|
<strong>Iqra’Eval</strong> is a shared task aimed at advancing <strong>automatic assessment of Qur’anic recitation pronunciation</strong> by leveraging computational methods to detect and diagnose pronunciation errors. The focus on Qur’anic recitation provides a standardized and well-defined context for evaluating Modern Standard Arabic (MSA) pronunciation.
|
</p> |
|
<p> |
|
Participants will develop systems capable of detecting mispronunciations (e.g., substitution, deletion, or insertion of phonemes). |
|
</p> |
|
|
|
|
|
<h2>Timeline</h2> |
|
<ul> |
|
<li><strong>June 1, 2025</strong>: Official announcement of the shared task</li> |
|
<li><strong>June 10, 2025</strong>: Release of training data, development set (QuranMB), phonetizer script, and baseline systems</li> |
|
<li><strong>July 24, 2025</strong>: Registration deadline and release of test data</li> |
|
<li><strong>July 27, 2025</strong>: End of evaluation cycle (test set submission closes)</li> |
|
<li><strong>July 30, 2025</strong>: Final results released</li> |
|
<li><strong>August 15, 2025</strong>: System description paper submissions due</li> |
|
<li><strong>August 22, 2025</strong>: Notification of acceptance</li> |
|
<li><strong>September 5, 2025</strong>: Camera-ready versions due</li> |
|
</ul> |
|
|
|
|
|
<h2>Task Description: Qur’anic Mispronunciation Detection System</h2>
|
<p> |
|
The aim is to design a model that detects and provides detailed feedback on mispronunciations in Qur’anic recitation.
Users read vowelized Qur’anic verses aloud; the model predicts the phoneme sequence the speaker actually uttered, which may contain mispronunciations.
Models are evaluated on the <strong>QuranMB.v2</strong> dataset, which contains human-annotated mispronunciations.
|
</p> |
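<p>
To make the interface concrete, the sketch below runs a CTC-based phoneme recognizer over a single recording. This is a minimal sketch only: the checkpoint name is a placeholder rather than an official baseline, and any model that maps 16 kHz audio to a phoneme string fills the same role.
</p>
<pre>
# Minimal sketch: audio in, predicted phoneme sequence out.
# Assumes a CTC acoustic model fine-tuned for phoneme recognition;
# "your-org/phoneme-ctc" is a placeholder, not an official baseline.
import torch
import soundfile as sf
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("your-org/phoneme-ctc")
model = Wav2Vec2ForCTC.from_pretrained("your-org/phoneme-ctc")

speech, sr = sf.read("recitation.wav")  # expected: 16 kHz mono
inputs = processor(speech, sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(ids)[0])   # e.g. "&lt; i n n a SS A f aa ..."
</pre>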
|
|
|
<div class="centered"> |
|
<img src="task.png" alt="System Overview" /> |
|
<p>Figure: Overview of the Mispronunciation Detection Workflow</p> |
|
</div> |
|
|
|
<h3>1. Read the Verse</h3> |
|
<p> |
|
The user is shown a <strong>Reference Verse</strong> (What should have been said) in Arabic script along with its corresponding <strong>Reference Phoneme Sequence</strong>. |
|
</p> |
|
<p><strong>Example:</strong></p> |
|
<ul> |
|
<li><strong>Arabic:</strong> إِنَّ الصَّفَا وَالْمَرْوَةَ مِنْ شَعَائِرِ اللَّهِ</li> |
|
<li> |
|
<strong>Phoneme:</strong> |
|
<code>&lt; i n n a SS A f aa w a l m a r w a t a m i n $ a E a a &lt; i r i l l a h i</code>
|
</li> |
|
</ul> |
|
|
|
<h3>2. Save Recording</h3> |
|
<p> |
|
The user recites the verse aloud; the system captures and stores the audio waveform for subsequent analysis. |
|
</p> |
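<p>
One way to capture and store the waveform, sketched with the <code>sounddevice</code> and <code>soundfile</code> packages; both the libraries and the 16 kHz mono format are assumptions, not task requirements:
</p>
<pre>
# Record a fixed-length utterance and store it for later analysis.
import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 16_000   # common rate for speech models (assumption)
DURATION = 10          # seconds of audio to capture

audio = sd.rec(int(DURATION * SAMPLE_RATE),
               samplerate=SAMPLE_RATE, channels=1)
sd.wait()                                  # block until recording ends
sf.write("recitation.wav", audio, SAMPLE_RATE)
</pre>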
|
|
|
<h3>3. Mispronunciation Detection</h3> |
|
<p> |
|
The stored audio is fed into a <strong>Mispronunciation Detection Model</strong>. |
|
This model predicts the phoneme sequence uttered by the speaker, which may contain mispronunciations. |
|
</p> |
|
<p><strong>Example of Mispronunciation:</strong></p> |
|
<ul> |
|
<li><strong>Reference Phoneme Sequence (What should have been said):</strong> <code>&lt; i n n a SS A f aa w a l m a r w a t a m i n $ a E a a &lt; i r i l l a h i</code></li>
|
<li><strong>Model Phoneme Prediction (What is predicted):</strong> <code>&lt; i n n a SS A f aa w a l m a r w a t a m i n s a E a a &lt; i r u l l a h i</code></li>
|
<li> |
|
<strong>Annotated Phoneme Sequence (What is said):</strong> |
|
<code>&lt; i n n a SS A f aa w a l m a r w a m i n <span class="highlight">s</span> a E a a &lt; i r <span class="highlight">u</span> l l a h i</code>
|
</li> |
|
</ul> |
|
<p> |
|
In this case, the phoneme <code>$</code> was mispronounced as <code>s</code>, and <code>i</code> was mispronounced as <code>u</code>. |
|
</p> |
|
<p> |
|
The annotated phoneme sequence also shows that the phonemes <code>t a</code> were omitted (a deletion error); the model failed to detect this omission.
|
</p> |
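<p>
The substitution and deletion labels above can be recovered mechanically by aligning the reference sequence against the annotated one. A minimal sketch using Python's standard library; this is plain edit-distance alignment for illustration, not the official scoring script:
</p>
<pre>
from difflib import SequenceMatcher

ref = "&lt; i n n a SS A f aa w a l m a r w a t a m i n $ a E a a &lt; i r i l l a h i".split()
ann = "&lt; i n n a SS A f aa w a l m a r w a m i n s a E a a &lt; i r u l l a h i".split()

# Walk the alignment and report each error type.
for op, i1, i2, j1, j2 in SequenceMatcher(None, ref, ann).get_opcodes():
    if op == "replace":
        print("substitution:", ref[i1:i2], "->", ann[j1:j2])
    elif op == "delete":
        print("deletion:", ref[i1:i2], "omitted")
    elif op == "insert":
        print("insertion:", ann[j1:j2], "added")

# Output:
# deletion: ['t', 'a'] omitted
# substitution: ['$'] -> ['s']
# substitution: ['i'] -> ['u']
</pre>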
|
|
|
<h2>Training Dataset: Description</h2> |
|
<p> |
|
All data are hosted on Hugging Face. Two main splits are provided: |
|
</p> |
|
<ul> |
|
<li> |
|
<strong>Training set:</strong> 79 hours of Modern Standard Arabic (MSA) Qur’anic recitations (5,167 audio files)
|
</li> |
|
<li> |
|
<strong>Evaluation set:</strong> QuranMB.v2 dataset with phoneme-level mispronunciation annotations, which includes: |
|
<ul> |
|
<li>QuranMB-Train: 9 hours (1,218 files) for development</li> |
|
<li>QuranMB-Test: 8 hours (1,018 files) for evaluation</li> |
|
</ul> |
|
</li> |
|
</ul> |
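<p>
The splits above can be pulled directly with the Hugging Face <code>datasets</code> library. A minimal sketch; the repository IDs and field names below are placeholders, so substitute the identifiers published on the official task page:
</p>
<pre>
from datasets import load_dataset

# Repository IDs are placeholders, not official identifiers.
train = load_dataset("IqraEval/train", split="train")
dev = load_dataset("IqraEval/QuranMB_v2", split="train")

sample = train[0]
print(sample["audio"]["array"].shape)  # waveform samples (field name assumed)
print(sample["phonemes"])              # reference phonemes (field name assumed)
</pre>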
|
|
|
<h2>Submission Guidelines</h2> |
|
<p> |
|
Participants should submit their predicted phoneme sequences on the test set by the deadline (July 27, 2025). Submissions will be automatically evaluated using the official scoring scripts. |
|
</p> |
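<p>
A submission is, in essence, one predicted phoneme sequence per test utterance. The sketch below writes such a file as CSV; the file name and column layout are illustrative assumptions, so follow the official instructions released with the test data:
</p>
<pre>
import csv

# Map each test utterance ID to its predicted phoneme string.
# IDs and column names here are assumptions for illustration only.
predictions = {
    "utt_0001": "&lt; i n n a SS A f aa ...",
    "utt_0002": "...",
}

with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "Labels"])
    for utt_id, phonemes in predictions.items():
        writer.writerow([utt_id, phonemes])
</pre>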
|
|
|
<h2>Evaluation Metrics</h2> |
|
<p> |
|
Systems will be evaluated by phoneme error rate (PER), computed over the test set, which measures how accurately mispronunciations are detected and localized.
|
</p> |
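<p>
PER is standardly defined as the Levenshtein (edit) distance between the predicted and annotated phoneme sequences, normalized by the length of the annotated (ground-truth) sequence. A minimal sketch of the computation; the official scoring scripts may differ in detail:
</p>
<pre>
# Phoneme error rate: edit distance over reference length.
def per(reference, hypothesis):
    m, n = len(reference), len(hypothesis)
    # d[i][j] = edit distance between reference[:i] and hypothesis[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n] / m

ref = "&lt; i n n a SS A f aa".split()
hyp = "&lt; i n n a s A f aa".split()
print(per(ref, hyp))  # one substitution over 9 phonemes = 0.111...
</pre>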
|
|
|
<h2>Contact and Support</h2> |
|
<p> |
|
For inquiries and support, reach out to the task coordinators at |
|
<a href="mailto:support@iqraeval.org">support@iqraeval.org</a>. |
|
</p> |
|
|
|
</div> |
|
</body> |
|
</html> |
|
|
|
|