Legal-Embed-bge-base-en-v1.5

This repository hosts a fine-tuned version of BAAI/bge-base-en-v1.5, optimized for legal document retrieval and Retrieval-Augmented Generation (RAG).

Model Details

Evaluation (NDCG@10)

Dimension   Baseline   Fine-tuned   Improvement
768         0.6105     0.6412       +5.03%
512         0.6037     0.6379       +5.67%
256         0.5853     0.6268       +7.08%
128         0.5276     0.5652       +7.13%
64          0.4469     0.5187       +16.07%

Reported metrics include cosine accuracy, MRR, MAP, and NDCG; the table above shows NDCG@10.
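The single score per dimension suggests embeddings are truncated to the shorter width and re-normalized before computing cosine similarity, as in Matryoshka-style evaluation (an assumption based on the linked fine-tuning guide; the `truncate_and_normalize` helper below is illustrative, not part of the model API):

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components of a full embedding and L2-normalize,
    so cosine similarity on the shorter vector remains well-defined."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy 8-d vector standing in for a real 768-d embedding.
vec = [0.4, -0.1, 0.3, 0.2, -0.5, 0.1, 0.0, 0.2]
short = truncate_and_normalize(vec, 4)
print(len(short))                            # 4
print(round(sum(x * x for x in short), 6))   # 1.0 (unit length)
```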

Training Configuration

  • Epochs: 4
  • Batch size: 32
  • Learning rate: 2e-5
  • Data: 1,456 train / 162 test samples
  • Hardware: CUDA GPU with FlashAttention
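The linked guide (see Credits) trains with a MatryoshkaLoss wrapped around MultipleNegativesRankingLoss; a configuration sketch under that assumption, with dataset loading and trainer wiring omitted (the dimension list mirrors the evaluation table; this is not a confirmed training script for this card):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# Assumption: training followed the linked guide.
model = SentenceTransformer("BAAI/bge-base-en-v1.5")
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
# Trainer settings from this card: 4 epochs, batch size 32, lr 2e-5.
```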

Findings

  • Maximum improvement: 16.07% (at 64 dimensions)
  • Fine-tuned 64D vs. baseline 768D: -15.03%
  • Fine-tuned 128D vs. baseline 768D: -7.41%
  • Storage reduction with 128D: 6× smaller
  • Storage reduction with 64D: 12× smaller
  • Best baseline score: 0.6105 (768D)
  • Best fine-tuned score: 0.6412 (768D)
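These findings follow directly from the evaluation table; recomputing them makes the arithmetic explicit (values here are the rounded table scores, so the last digit can differ slightly from the card's figures):

```python
# NDCG@10 scores from the table above, keyed by embedding dimension.
baseline = {768: 0.6105, 512: 0.6037, 256: 0.5853, 128: 0.5276, 64: 0.4469}
finetuned = {768: 0.6412, 512: 0.6379, 256: 0.6268, 128: 0.5652, 64: 0.5187}

# Per-dimension relative improvement; the maximum is the 64D row.
improvement = {d: (finetuned[d] - baseline[d]) / baseline[d] * 100 for d in baseline}
print(round(max(improvement.values()), 2))   # 16.07

# Fine-tuned low-dimension scores relative to the full 768D baseline.
vs_base_64 = (finetuned[64] - baseline[768]) / baseline[768] * 100    # ~ -15.03
vs_base_128 = (finetuned[128] - baseline[768]) / baseline[768] * 100  # ~ -7.4

# Storage scales linearly with dimension: 768/128 = 6x, 768/64 = 12x.
print(768 // 128, 768 // 64)                 # 6 12
```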

Usage

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("axondendriteplus/Legal-Embed-bge-base-en-v1.5")
embeddings = model.encode(["your legal text"])
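For RAG retrieval, documents are typically ranked by cosine similarity between the query embedding and precomputed document embeddings. A minimal sketch with toy vectors standing in for `model.encode(...)` output (the `rank_documents` helper is illustrative, not part of the model API):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank_documents(query_emb, doc_embs):
    """Return document indices sorted by descending similarity to the query."""
    scores = [cosine(query_emb, d) for d in doc_embs]
    return sorted(range(len(doc_embs)), key=lambda i: -scores[i])

# Toy 3-d embeddings; real usage would encode legal passages with the model.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
print(rank_documents(query, docs))  # [1, 2, 0]
```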

Credits

Fine-tuning guide: https://www.philschmid.de/fine-tune-embedding-model-for-rag

Model size: 109M parameters (F32, Safetensors)
