---
license: apache-2.0
language:
- en
base_model:
- microsoft/codebert-base
---
# **RocqStar Ranker Embedder**
A self‑attentive embedding model for premise/proof selection in the Rocq interactive theorem prover (ITP). RocqStar is fine‑tuned from CodeBERT so that distances in the embedding space correlate with proof similarity rather than with surface‑level statement overlap. It replaces BM25/Jaccard similarity in retrieval‑augmented generation (RAG) pipelines such as **CoqPilot**, leading to higher proof success rates on the IMM‑300 benchmark.
## Model Details
### Model Description
RocqStar is a 125 M‑parameter Transformer encoder (768‑dim hidden size) with multi‑head self‑attention and a learned self‑attentive pooling head. It is trained with an InfoNCE contrastive objective so that the cosine similarity of two statement embeddings approximates the similarity of their proofs, measured by a hybrid Levenshtein + Jaccard distance. The model takes tokenised Rocq (Gallina) theorem statements as input and outputs a 768‑d embedding.
* **Model type:** Transformer encoder with self‑attentive pooling
* **Language(s):** Rocq / Coq (Gallina) syntax (tokens)
* **License:** Apache‑2.0
* **Fine‑tuned from:** `microsoft/codebert-base`
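
The self‑attentive pooling head can be pictured as a small attention layer that collapses the encoder's token states into a single statement vector. Below is a minimal PyTorch sketch of such a head; the layer sizes, number of attention hops, and parameter names are illustrative assumptions and may differ from the released checkpoint.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentivePooling(nn.Module):
    """Collapse (batch, seq_len, hidden) token states into (batch, hidden)
    statement embeddings via learned attention weights (illustrative sketch)."""

    def __init__(self, hidden_size: int = 768, attn_size: int = 256):
        super().__init__()
        self.proj = nn.Linear(hidden_size, attn_size)
        self.query = nn.Linear(attn_size, 1, bias=False)

    def forward(self, token_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Score each token, masking out padding positions.
        scores = self.query(torch.tanh(self.proj(token_states))).squeeze(-1)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1).unsqueeze(-1)
        # Weighted sum of token states -> one 768-d embedding per statement.
        return (weights * token_states).sum(dim=1)
```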
### Model Sources
* **Repository:** [https://github.com/JetBrains-Research/big-rocq](https://github.com/JetBrains-Research/big-rocq)
* **Paper:** (coming soon)
## How to Get Started
Go to [https://github.com/JetBrains-Research/big-rocq](https://github.com/JetBrains-Research/big-rocq). The `ranker-server` subdirectory contains the server you need to run in order to use the model from the CoqPilot plugin; it also serves as a reference example of how to run inference with the model.
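
If you only need embeddings, the model can also be loaded directly with `transformers`. The snippet below is a minimal sketch, assuming the checkpoint loads as a standard RoBERTa/CodeBERT‑style encoder and using simple masked mean pooling; the reference inference code in `ranker-server` (which applies the trained pooling head) should be preferred. The model ID shown is a placeholder; substitute this model card's repository name.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "JetBrains-Research/rocqstar-ranker-embedder"  # placeholder: substitute this card's repo ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

def embed(statements: list[str]) -> torch.Tensor:
    """Embed Rocq statements; masked mean pooling approximates the trained pooling head."""
    batch = tokenizer(statements, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state              # (batch, seq_len, 768)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)        # (batch, 768)

# Rank candidate premises against a goal statement by cosine similarity.
goal = embed(["Lemma app_nil_r : forall (l : list nat), l ++ nil = l."])
premises = embed([
    "Lemma app_assoc : forall l m n : list nat, l ++ m ++ n = (l ++ m) ++ n.",
    "Lemma rev_involutive : forall l : list nat, rev (rev l) = l.",
])
print(F.cosine_similarity(goal, premises))
```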
## Training Details
### Training Data
* **Corpus:** 76,524 statement–proof pairs mined from four large Rocq projects.
* **Positive/negative mining:** pairs with proof distance < τ₊ (0.35) are treated as positives and pairs with distance > τ₋ (0.6) as negatives; 4 hard negatives are mined per anchor (see the sketch below).
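
To make the mining rule concrete, the sketch below computes a simple hybrid proof distance (an equal mix of normalised token‑level Levenshtein and Jaccard distance, which is an assumed weighting) and applies the two thresholds; it is illustrative only, not the exact implementation used for training.

```python
def levenshtein(a: list[str], b: list[str]) -> int:
    """Edit distance over proof tokens (standard dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]

def proof_distance(proof_a: str, proof_b: str) -> float:
    """Hybrid distance in [0, 1]: 50/50 mix of normalised Levenshtein and Jaccard (assumed weights)."""
    ta, tb = proof_a.split(), proof_b.split()
    lev = levenshtein(ta, tb) / max(len(ta), len(tb), 1)
    jac = 1.0 - len(set(ta) & set(tb)) / max(len(set(ta) | set(tb)), 1)
    return 0.5 * lev + 0.5 * jac

TAU_POS, TAU_NEG = 0.35, 0.6

def label_pair(proof_a: str, proof_b: str) -> str | None:
    d = proof_distance(proof_a, proof_b)
    if d < TAU_POS:
        return "positive"
    if d > TAU_NEG:
        return "negative"
    return None  # in-between pairs are left unlabelled
```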
### Training Procedure
* **Objective:** InfoNCE contrastive loss (a minimal sketch follows this list)
* **Batch size:** 32
* **Optimizer / LR:** AdamW, lr = 4e‑6, linear warm‑up 10 %, 22k steps
* **Hardware:** 1× NVIDIA H100 GPU, 160 GB RAM, 14 h wall‑clock
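
For reference, here is a minimal PyTorch sketch of an InfoNCE loss over a batch of anchor/positive statement pairs; the temperature value and the use of in‑batch negatives in addition to the mined hard negatives are assumptions, not details taken from the training setup.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor_emb: torch.Tensor, positive_emb: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE: each anchor is pulled toward its matching positive and
    pushed away from the other rows, which act as in-batch negatives."""
    anchor = F.normalize(anchor_emb, dim=-1)
    positive = F.normalize(positive_emb, dim=-1)
    logits = anchor @ positive.T / temperature                     # (batch, batch) cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)   # diagonal entries are the positives
    return F.cross_entropy(logits, targets)
```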
## Evaluation
### Testing Data, Factors & Metrics
* **Dataset:** IMM‑300 (300 Rocq theorems) from the IMM project
* **Metrics:** Downstream proof success rate of CoqPilot when given top‑7 retrieved premises; averaged over 12 generations.
### Results
| Model (back‑end) | Bucket | Baseline Jaccard | **RocqStar** |
| ---------------- | ------ | ---------------- | ------------ |
| GPT‑4o | ≤ 4 | 48 ± 5 % | **51 ± 5 %** |
| GPT‑4o | 5–8 | 18 ± 4 % | **25 ± 3 %** |
| GPT‑4o | 9–20 | 11 ± 4 % | 11 ± 5 % |
| Claude 3.5 | ≤ 4 | 58 ± 5 % | **61 ± 4 %** |
| Claude 3.5 | 5–8 | 28 ± 5 % | **36 ± 5 %** |
| Claude 3.5 | 9–20 | 16 ± 5 % | **21 ± 5 %** |
#### Summary
RocqStar delivers consistent gains, up to a 28% relative improvement over Jaccard-index-based retrieval, with the largest gains on medium‑length theorems (the 5–8 bucket), where proof similarity diverges most from statement similarity.