Update index.html

index.html (+12 −12)

@@ -56,18 +56,18 @@
 
   <section class="section">
     <div class="container content">
-      <h2 class="title is-3"
-      <
-      <
-      <
-
-
-
-
-
-      </li>
-      <li><strong
-      </
+      <h2 class="title is-3">🧩 Main Pipeline Steps</h2>
+      <figure>
+        <img src="https://cdn-uploads.huggingface.co/production/uploads/64bfc4d55ce3d382c05c0f9a/1zPQcwqt9Li_gCvd04_2_.png" alt="JQL Pipeline Overview">
+        <figcaption><em>Figure 1: Overview of the JQL pipeline</em></figcaption>
+      </figure>
+
+      <ol>
+        <li><strong>📋 Ground Truth Creation:</strong> Human annotators label monolingual documents following a structured instruction prompt; these documents are then translated into all target languages to form a multilingual gold-standard dataset (see Figure 1).</li>
+        <li><strong>🤖 LLM-as-a-Judge Selection &amp; Data Annotation:</strong> Strong multilingual LLMs (e.g., Gemma, Mistral, LLaMA) are evaluated against the ground truth, and the top-performing models are used to produce synthetic annotations.</li>
+        <li><strong>🪶 Lightweight Annotator Training:</strong> Compact regression heads are trained on frozen multilingual embeddings, yielding efficient, high-throughput annotators.</li>
+        <li><strong>🚀 Scalable Data Filtering:</strong> The trained annotators score large-scale pretraining corpora, which are then filtered with quantile thresholds.</li>
+      </ol>
     </div>
   </section>
 
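The last two pipeline steps added in this commit can be sketched in a few lines. This is a minimal illustrative sketch only: the array shapes, the closed-form ridge head, the synthetic scores, and the 0.75 quantile cutoff are all assumptions for demonstration, not the actual JQL implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 3 (Lightweight Annotator Training): stand-ins for frozen
# multilingual document embeddings (n_docs x dim) and LLM-judge
# quality scores. The embedding model itself is never updated.
X = rng.normal(size=(500, 64))                     # frozen embeddings
w_true = rng.normal(size=64)
y = X @ w_true + rng.normal(scale=0.1, size=500)   # synthetic judge scores

# Fit a compact linear regression head in closed form (ridge regression).
lam = 1e-2
w = np.linalg.solve(X.T @ X + lam * np.eye(64), X.T @ y)

# Step 4 (Scalable Data Filtering): score a large corpus with the cheap
# head and keep only documents above a quantile threshold (top 25% here).
corpus = rng.normal(size=(10_000, 64))
scores = corpus @ w
threshold = np.quantile(scores, 0.75)
kept = corpus[scores >= threshold]

print(kept.shape[0])
```

Because scoring is just a matrix product against a small weight vector, this kind of annotator can be applied to web-scale corpora far more cheaply than querying an LLM judge per document.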