Update index.html
index.html CHANGED (+12 -0)
@@ -24,6 +24,18 @@
     </div>
   </section>

+  <section class="section">
+    <div class="container content">
+      <p>
+        High-quality multilingual data is crucial for training effective large language models (LLMs).<br>
+        <strong>JQL (Judging Quality across Languages)</strong> is a scalable and lightweight data filtering approach that distills the judgment capabilities of strong multilingual LLMs into efficient cross-lingual annotators. These annotators enable robust filtering of web-scale data.
+      </p>
+      <p>
+        JQL improves data quality, retains more tokens, and generalizes beyond high-resource European languages—achieving strong performance on Arabic, Thai, and Mandarin. It outperforms heuristic baselines and enables efficient multilingual pretraining data curation at scale.
+      </p>
+    </div>
+  </section>
+
   <section class="section">
     <div class="container content">
       <h2 class="title is-3">🧩 Main Pipeline Steps</h2>