mali90 committed on
Commit edb7a23 · verified · 1 parent: 8c12e10

Update index.html

Files changed (1)
  1. index.html +4 -3
index.html CHANGED
@@ -27,11 +27,12 @@
  <section class="section">
  <div class="container content">
  <p>
- High-quality multilingual data is crucial for training effective large language models (LLMs).<br>
- <strong>JQL (Judging Quality across Languages)</strong> is a scalable and lightweight data filtering approach that distills the judgment capabilities of strong multilingual LLMs into efficient cross-lingual annotators. These annotators enable robust filtering of web-scale data.
+ High-quality multilingual data is crucial for training effective large language models (LLMs).
+ <strong>JQL (Judging Quality across Languages)</strong> is a scalable and lightweight multilingual data filtering approach that distills the judgment capabilities of strong
+ multilingual LLMs into efficient cross-lingual annotators.
  </p>
  <p>
- JQL improves data quality, retains more tokens, and generalizes beyond high-resource European languages—achieving strong performance on Arabic, Thai, and Mandarin. It outperforms heuristic baselines and enables efficient multilingual pretraining data curation at scale.
+ Overall, JQL improves data quality, retains more tokens, and generalizes to unseen languages. It outperforms heuristic baselines and enables cost-efficient multilingual pretraining data curation at scale.
  </p>
  </div>
  </section>