---
license: apache-2.0
language:
- en
base_model:
- microsoft/codebert-base
---

# **RocqStar Ranker Embedder**

A self‑attentive embedding model for premise/proof selection in the Rocq interactive theorem prover (ITP). RocqStar is fine‑tuned from CodeBERT so that distances in the embedding space correlate with proof similarity rather than with surface‑level statement overlap. It replaces BM25/Jaccard similarity in retrieval‑augmented generation (RAG) pipelines such as **CoqPilot**, leading to higher proof success rates on the IMM‑300 benchmark.

## Model Details

### Model Description

RocqStar is a 125 M‑parameter Transformer encoder (768‑dim hidden size) with multi‑head self‑attention and a learned self‑attentive pooling head. It is trained with an InfoNCE contrastive objective so that the cosine similarity of two statement embeddings approximates the similarity of their proofs, measured by a hybrid Levenshtein + Jaccard distance. The model takes tokenised Rocq (Gallina) theorem statements as input and outputs a 768‑d embedding.
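
The pooling head can be pictured as a single learned scoring layer over token states. The snippet below is an illustrative sketch only, assuming a one‑query attention head over 768‑dimensional token states; module names and exact shapes are assumptions and do not reproduce the released checkpoint's architecture.

```python
import torch
import torch.nn as nn

class SelfAttentivePooling(nn.Module):
    """Illustrative self-attentive pooling head (hypothetical module, 768-dim hidden)."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # One learned query scores every token; a softmax over the scores weights the states.
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, token_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
        scores = self.scorer(token_states).squeeze(-1)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)   # (batch, seq_len, 1)
        return (weights * token_states).sum(dim=1)              # (batch, hidden) statement embedding
```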

* **Model type:** Transformer encoder with self‑attentive pooling
* **Language(s):** Rocq / Coq (Gallina) syntax (tokens)
* **License:** Apache‑2.0
* **Fine‑tuned from:** `microsoft/codebert-base`

### Model Sources

* **Repository:** [https://github.com/JetBrains-Research/big-rocq](https://github.com/JetBrains-Research/big-rocq)
* **Paper:** (coming soon)

## How to Get Started

The repository [https://github.com/JetBrains-Research/big-rocq](https://github.com/JetBrains-Research/big-rocq) contains, in the `ranker-server` subdirectory, the server you need to run to use the model from the CoqPilot plugin; it also serves as a reference example of how to run inference with the model.
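
For a quick impression of what inference involves, here is a minimal sketch assuming the checkpoint loads through the standard `transformers` Auto classes. The repository id is a placeholder, and the mean pooling is a stand‑in for the model's own pooling head; the `ranker-server` code remains the reference implementation.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "<this model's Hugging Face repo id>"  # placeholder: fill in the actual id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

statements = [
    "Lemma plus_comm : forall n m : nat, n + m = m + n.",
    "Lemma add_comm : forall a b : nat, a + b = b + a.",
]

with torch.no_grad():
    batch = tokenizer(statements, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state             # (batch, seq_len, 768)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # mean pooling as a stand-in
    embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    embeddings = F.normalize(embeddings, dim=-1)

# Cosine similarity between statement embeddings approximates proof similarity.
print(float(embeddings[0] @ embeddings[1]))
```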

## Training Details

### Training Data

* **Corpus:** 76,524 statement–proof pairs mined from 4 large Rocq projects.
* **Positive/negative mining:** Pairs with proof distance below τ₊ = 0.35 are treated as positives and pairs above τ₋ = 0.6 as negatives; 4 hard negatives are mined per anchor (see the sketch below).
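
As an illustration of the mining rule, here is a minimal sketch assuming whitespace tokenisation of proof scripts and an even 0.5/0.5 mix of normalised edit distance and Jaccard distance; only the thresholds (0.35, 0.6) and the count of 4 hard negatives come from this card, the rest is hypothetical.

```python
from typing import List, Tuple

TAU_POS = 0.35   # proof distance below this -> positive pair
TAU_NEG = 0.60   # proof distance above this -> candidate negative
HARD_NEGATIVES = 4

def levenshtein(a: List[str], b: List[str]) -> int:
    """Classic dynamic-programming edit distance over token lists."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        curr = [i]
        for j, tb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ta != tb)))  # substitution
        prev = curr
    return prev[-1]

def proof_distance(proof_a: str, proof_b: str, alpha: float = 0.5) -> float:
    """Hybrid distance: alpha * normalised edit distance + (1 - alpha) * Jaccard distance."""
    ta, tb = proof_a.split(), proof_b.split()
    lev = levenshtein(ta, tb) / max(len(ta), len(tb), 1)
    jac = 1.0 - len(set(ta) & set(tb)) / max(len(set(ta) | set(tb)), 1)
    return alpha * lev + (1.0 - alpha) * jac

def mine_pairs(anchor_proof: str, candidate_proofs: List[str]) -> Tuple[List[str], List[str]]:
    """Split candidate proofs into positives and hard negatives for one anchor."""
    scored = [(proof_distance(anchor_proof, c), c) for c in candidate_proofs]
    positives = [c for d, c in scored if d < TAU_POS]
    # Hard negatives: among the negatives, keep those closest to the anchor.
    negatives = sorted((d, c) for d, c in scored if d > TAU_NEG)
    return positives, [c for _, c in negatives[:HARD_NEGATIVES]]
```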

### Training Procedure

* **Objective:** InfoNCE contrastive loss (a minimal sketch follows this list)
* **Batch size:** 32
* **Optimizer / LR:** AdamW, lr = 4e‑6, linear warm‑up 10 %, 22k steps
* **Hardware:** 1× NVIDIA H100 GPU, 160 GB RAM, 14 h wall‑clock
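
The objective can be pictured as in‑batch contrastive learning: each anchor statement should be closest to its own positive and far from the other positives in the batch. Below is a minimal sketch assuming in‑batch negatives and a fixed temperature of 0.05; the temperature is an assumption, not a reported hyperparameter.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchor_emb: torch.Tensor, positive_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE with in-batch negatives; row i of each tensor forms a positive pair."""
    anchor = F.normalize(anchor_emb, dim=-1)
    positive = F.normalize(positive_emb, dim=-1)
    logits = anchor @ positive.T / temperature                   # (batch, batch) cosine logits
    labels = torch.arange(logits.size(0), device=logits.device)  # diagonal entries are positives
    return F.cross_entropy(logits, labels)
```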

## Evaluation

### Testing Data, Factors & Metrics

* **Dataset:** IMM‑300 (300 Rocq theorems) from the IMM project
* **Metrics:** Downstream proof success rate of CoqPilot when given top‑7 retrieved premises; averaged over 12 generations.

### Results

| Model (back‑end) | Proof‑length bucket | Baseline (Jaccard) | **RocqStar** |
| ---------------- | ------ | ---------------- | ------------ |
| GPT‑4o           | ≤ 4    | 48 ± 5 %         | **51 ± 5 %** |
| GPT‑4o           | 5–8    | 18 ± 4 %         | **25 ± 3 %** |
| GPT‑4o           | 9–20   | 11 ± 4 %         | 11 ± 5 %     |
| Claude 3.5       | ≤ 4    | 58 ± 5 %         | **61 ± 4 %** |
| Claude 3.5       | 5–8    | 28 ± 5 %         | **36 ± 5 %** |
| Claude 3.5       | 9–20   | 16 ± 5 %         | **21 ± 5 %** |

#### Summary

RocqStar delivers consistent gains, with up to a 28% relative improvement over Jaccard‑index‑based retrieval, especially for medium‑length theorems, where proof similarity diverges most from statement similarity.