mali90 committed on
Commit 8c12e10 · verified · 1 Parent(s): dba1c4d

Update index.html

Files changed (1)
  1. index.html +12 -0
index.html CHANGED
@@ -24,6 +24,18 @@
  </div>
  </section>
 
+ <section class="section">
+ <div class="container content">
+ <p>
+ High-quality multilingual data is crucial for training effective large language models (LLMs).<br>
+ <strong>JQL (Judging Quality across Languages)</strong> is a scalable and lightweight data filtering approach that distills the judgment capabilities of strong multilingual LLMs into efficient cross-lingual annotators. These annotators enable robust filtering of web-scale data.
+ </p>
+ <p>
+ JQL improves data quality, retains more tokens, and generalizes beyond high-resource European languages, achieving strong performance on Arabic, Thai, and Mandarin. It outperforms heuristic baselines and enables efficient multilingual pretraining data curation at scale.
+ </p>
+ </div>
+ </section>
+
  <section class="section">
  <div class="container content">
  <h2 class="title is-3">🧩 Main Pipeline Steps</h2>
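The added paragraph describes JQL only at a high level: a strong LLM's quality judgments are distilled into a lightweight annotator, which then filters a large corpus. The Python sketch below illustrates that general idea under loudly stated assumptions and is not JQL's actual implementation. The 2-d vectors are hypothetical stand-ins for document embeddings from some multilingual encoder, the linear `teacher` formula stands in for LLM-assigned quality scores, and `train_annotator`, `filter_corpus`, the learning rate, and the keep fraction are all illustrative choices.

```python
import random

# Hypothetical stand-in for a document embedding from a multilingual encoder.
Vector = list[float]


def predict(x: Vector, w: Vector, b: float) -> float:
    """Linear annotator score: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_annotator(embeddings: list[Vector], teacher_scores: list[float],
                    lr: float = 0.5, epochs: int = 2000) -> tuple[Vector, float]:
    """Fit a linear regression head to (hypothetical) teacher LLM scores
    using batch gradient descent on squared error."""
    n, dim = len(embeddings), len(embeddings[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, teacher_scores):
            err = predict(x, w, b) - y  # residual against the teacher score
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # Average the gradient over the batch and take one step.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def filter_corpus(docs, embeddings, w, b, keep_fraction=0.5):
    """Keep documents whose annotator score is at or above the
    (1 - keep_fraction) quantile of all scores (an assumed threshold rule)."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    scores = [predict(x, w, b) for x in embeddings]
    threshold = sorted(scores)[int(len(scores) * (1 - keep_fraction))]
    return [d for d, s in zip(docs, scores) if s >= threshold]


if __name__ == "__main__":
    random.seed(0)
    # Toy data: random 2-d "embeddings" and a made-up linear teacher score.
    embeddings = [[random.random(), random.random()] for _ in range(40)]
    teacher = [2 * x0 - x1 + 0.5 for x0, x1 in embeddings]
    docs = [f"doc{i}" for i in range(40)]
    w, b = train_annotator(embeddings, teacher)
    kept = filter_corpus(docs, embeddings, w, b, keep_fraction=0.25)
    print(len(kept), "of", len(docs), "documents kept")
```

The point of the design is cost: the expensive teacher only labels a small sample, while the cheap annotator (here a single linear layer over precomputed vectors) scores every document in the corpus.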