---
license: apache-2.0
language:
- en
base_model:
- microsoft/codebert-base
---

# **RocqStar Ranker Embedder**

A self‑attentive embedding model for premise/proof selection in the Rocq ITP. RocqStar is fine‑tuned from CodeBERT so that distances in the embedding space correlate with proof similarity rather than with surface‑level statement overlap. It replaces BM25/Jaccard similarity in retrieval‑augmented generation (RAG) pipelines such as **CoqPilot**, leading to higher proof success rates on the IMM‑300 benchmark.

## Model Details

### Model Description

RocqStar is a 125M‑parameter Transformer encoder (768‑dim hidden size) with multi‑head self‑attention and a learned self‑attentive pooling head. It is trained with an InfoNCE contrastive objective so that the cosine similarity of two statement embeddings approximates the similarity of their proofs, measured by a hybrid Levenshtein + Jaccard distance. The model takes tokenised Rocq (Gallina) theorem statements as input and outputs a 768‑d embedding.

* **Model type:** Transformer encoder with self‑attentive pooling
* **Language(s):** Rocq / Coq (Gallina) syntax (tokens)
* **License:** Apache‑2.0
* **Fine‑tuned from:** `microsoft/codebert-base`

### Model Sources

* **Repository:** [https://github.com/JetBrains-Research/big-rocq](https://github.com/JetBrains-Research/big-rocq)
* **Paper:** (coming soon)

## How to Get Started

Go to [https://github.com/JetBrains-Research/big-rocq](https://github.com/JetBrains-Research/big-rocq). The `ranker-server` subdirectory contains the server you need to run in order to use the model from the CoqPilot plugin. It also serves as a reference example of how to run inference with the model; an illustrative embedding sketch is included at the end of this card.

## Training Details

### Training Data

* **Corpus:** 76,524 mined statements + proofs from 4 large Rocq projects.
* **Positive/negative mining:** proof distance < τ₊ (0.35) → positive; > τ₋ (0.6) → negative; 4 hard negatives per anchor.

### Training Procedure

* **Objective:** InfoNCE (an illustrative sketch of the mining and loss appears at the end of this card)
* **Batch size:** 32
* **Optimizer / LR:** AdamW, lr = 4e‑6, linear warm‑up 10 %, 22k steps
* **Hardware:** 1× NVIDIA H100 GPU, 160 GB RAM, 14 h wall‑clock

## Evaluation

### Testing Data, Factors & Metrics

* **Dataset:** IMM‑300 (300 Rocq theorems) from the IMM project
* **Metrics:** downstream proof success rate of CoqPilot when given the top‑7 retrieved premises, averaged over 12 generations.

### Results

| Model (back‑end) | Bucket | Baseline Jaccard | **RocqStar** |
| ---------------- | ------ | ---------------- | ------------ |
| GPT‑4o           | ≤ 4    | 48 ± 5 %         | **51 ± 5 %** |
| GPT‑4o           | 5–8    | 18 ± 4 %         | **25 ± 3 %** |
| GPT‑4o           | 9–20   | 11 ± 4 %         | 11 ± 5 %     |
| Claude 3.5       | ≤ 4    | 58 ± 5 %         | **61 ± 4 %** |
| Claude 3.5       | 5–8    | 28 ± 5 %         | **36 ± 5 %** |
| Claude 3.5       | 9–20   | 16 ± 5 %         | **21 ± 5 %** |

#### Summary

RocqStar delivers consistent gains, up to a 28 % relative improvement over Jaccard‑index‑based retrieval, especially for medium‑length theorems, where proof similarity diverges most from statement similarity.
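
Below is a minimal, illustrative sketch of how one might embed Rocq statements and rank candidate premises by cosine similarity. It assumes the checkpoint loads through the standard `transformers` API (as for the CodeBERT base it is fine‑tuned from) and approximates the learned self‑attentive pooling head with mean pooling; the `ranker-server` code in the repository is the authoritative reference, and the placeholder model ID must be replaced with this card's Hub ID.

```python
# Illustrative sketch only: the learned self-attentive pooling head lives in the
# ranker-server code; mean pooling over the last hidden state is a stand-in here.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "<this-model's-hub-id>"  # replace with the Hub ID of this model card

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID).eval()

def embed(statements: list[str]) -> torch.Tensor:
    """Encode Rocq theorem statements into L2-normalised 768-d vectors."""
    batch = tokenizer(statements, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state           # (B, T, 768)
    mask = batch["attention_mask"].unsqueeze(-1)             # (B, T, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # mean over real tokens
    return F.normalize(pooled, dim=-1)

# Rank candidate premises for a goal statement by cosine similarity.
goal = "Lemma add_comm : forall n m : nat, n + m = m + n."
premises = [
    "Lemma add_succ_r : forall n m : nat, n + S m = S (n + m).",
    "Lemma mul_comm : forall n m : nat, n * m = m * n.",
]
goal_emb, prem_embs = embed([goal]), embed(premises)
scores = (prem_embs @ goal_emb.T).squeeze(-1)
for premise, score in sorted(zip(premises, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {premise}")
```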
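
For intuition on the training signal, here is a hedged sketch of the proof‑distance mining and InfoNCE objective described above. The exact proof tokenisation, the weighting of the Levenshtein and Jaccard components, and the InfoNCE temperature are not specified on this card, so the helper `proof_distance` and the values below are illustrative assumptions; see the repository for the actual training code.

```python
# Illustrative sketch of the training signal; the component weighting, temperature,
# and tokenisation are assumptions, not the exact values used to train RocqStar.
import torch
import torch.nn.functional as F
import Levenshtein  # pip install python-Levenshtein (assumed helper, not from the repo)

TAU_POS, TAU_NEG = 0.35, 0.6   # mining thresholds quoted above

def proof_distance(proof_a: str, proof_b: str) -> float:
    """Hybrid Levenshtein + Jaccard distance between two proof scripts
    (an unweighted average of the two components is assumed here)."""
    lev = Levenshtein.distance(proof_a, proof_b) / max(len(proof_a), len(proof_b), 1)
    tok_a, tok_b = set(proof_a.split()), set(proof_b.split())
    jac = 1.0 - len(tok_a & tok_b) / max(len(tok_a | tok_b), 1)
    return 0.5 * lev + 0.5 * jac

def is_positive(distance: float) -> bool:
    # pairs with distance > TAU_NEG are mined as (hard) negatives instead
    return distance < TAU_POS

def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             negatives: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE over one anchor embedding, one positive, and K hard negatives."""
    anchor = F.normalize(anchor, dim=-1)
    candidates = F.normalize(torch.cat([positive.unsqueeze(0), negatives]), dim=-1)
    logits = candidates @ anchor / temperature       # cosine similarities / T
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))
```

In training, each anchor statement is paired with one positive and four hard negatives mined with these thresholds; the loss pulls the positive's embedding toward the anchor while pushing the negatives away.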