mfromm committed on
Commit 4c3ce7a · verified · 1 Parent(s): 205b70d

Update index.html

Files changed (1)
  1. index.html +13 -12
index.html CHANGED
@@ -56,18 +56,19 @@
 
  <section class="section">
  <div class="container content">
- <h2 class="title is-3">🧩 Main Pipeline Steps</h2>
- <figure>
- <img src="https://cdn-uploads.huggingface.co/production/uploads/64bfc4d55ce3d382c05c0f9a/1zPQcwqt9Li_gCvd04_2_.png" alt="JQL Pipeline Overview">
- <figcaption><em>Figure 1: Overview of the JQL pipeline</em></figcaption>
- </figure>
-
- <ol>
- <li><strong>📋 Ground Truth Creation:</strong> Human annotators label monolingual documents based on a structured instruction prompt. These documents are translated into all target languages to create a multilingual gold-standard dataset. (See Figure 1)</li>
- <li><strong>🤖 LLM-as-a-Judge Selection & Data Annotation:</strong> Strong multilingual LLMs (e.g., Gemma, Mistral, LLaMA) are evaluated against the ground truth, and top-performing models are used to produce synthetic annotations. (See Figure 1)</li>
- <li><strong>🪶 Lightweight Annotator Training:</strong> Train compact regression heads on frozen multilingual embeddings to create efficient, high-throughput annotators. (See Figure 1)</li>
- <li><strong>🚀 Scalable Data Filtering:</strong> Use trained annotators to filter large-scale pretraining corpora using quantile thresholds. (See Figure 1)</li>
- </ol>
+ <h2 class="title is-3">📁 Available Artifacts</h2>
+ <ul>
+ <li><a href="https://huggingface.co/datasets/Jackal-AI/JQL-Human-Edu-Annotations" target="_blank">📄 Ground truth annotations in 35 languages</a></li>
+ <li><a href="https://huggingface.co/datasets/Jackal-AI/JQL-LLM-Edu-Annotations" target="_blank">🧠 Synthetic LLM-annotated dataset (14M+ documents)</a></li>
+ <li><a href="https://huggingface.co/Jackal-AI/JQL-Edu-Heads" target="_blank">🪶 Lightweight annotation models</a>:
+ <ul>
+ <li>JQL-Gemma</li>
+ <li>JQL-Mistral</li>
+ <li>JQL-Llama</li>
+ </ul>
+ </li>
+ <li>🛠️ Training & inference scripts: Coming soon</li>
+ </ul>
  </div>
  </section>
 
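
The "Scalable Data Filtering" item removed in this hunk mentions filtering corpora with quantile thresholds. As a rough illustration of that idea only, and not the JQL implementation, the sketch below keeps documents whose score is at or above a chosen quantile of all scores. The helper names `quantile_threshold` and `filter_by_quantile`, the linear-interpolation rule, and the example scores are assumptions made for illustration.

```python
import math


def quantile_threshold(scores, q):
    """Return the q-quantile (0 <= q <= 1) of scores via linear interpolation.

    Hypothetical helper: the interpolation rule is an assumption, not taken
    from the JQL code.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(scores)
    pos = q * (len(ordered) - 1)  # fractional index into the sorted scores
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def filter_by_quantile(docs, scores, q):
    """Keep documents whose score is at or above the q-quantile of all scores."""
    if len(docs) != len(scores):
        raise ValueError("docs and scores must have the same length")
    threshold = quantile_threshold(scores, q)
    return [doc for doc, score in zip(docs, scores) if score >= threshold]


# Made-up documents and scores, standing in for annotator outputs.
docs = ["a", "b", "c", "d", "e"]
scores = [0.1, 0.9, 0.5, 0.7, 0.3]
kept = filter_by_quantile(docs, scores, 0.5)  # keeps docs scoring at or above the median
```

For a large corpus the threshold would more plausibly be estimated once from a sample of scores and then applied in a streaming pass, rather than sorting every score in memory as this sketch does.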