# Legal-Embed-bge-base-en-v1.5
This repository hosts a fine-tuned version of BAAI/bge-base-en-v1.5, optimized for legal document retrieval and Retrieval-Augmented Generation (RAG).
## Model Details
- Base model: BAAI/bge-base-en-v1.5
- Dataset: axondendriteplus/legal-rag-embedding-dataset
- Task: Dense embedding learning for legal Q&A retrieval
- Framework: SentenceTransformers + HuggingFace Trainer
- Loss: MatryoshkaLoss (multi-resolution contrastive objective; see the sketch below)
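
The card does not include the training code, but a minimal sketch of how this loss is typically configured in SentenceTransformers follows. The inner `MultipleNegativesRankingLoss` is an assumption (the card only names the Matryoshka loss), and the dimension list matches the dimensions evaluated below.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# MatryoshkaLoss wraps an inner contrastive objective and applies it at
# every listed dimension, so truncated embeddings remain useful.
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
```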
## Evaluation (NDCG@10)
| Dimension | Baseline | Fine-tuned | Improvement (%) |
|---|---|---|---|
| 768 | 0.6105 | 0.6412 | 5.03 |
| 512 | 0.6037 | 0.6379 | 5.67 |
| 256 | 0.5853 | 0.6268 | 7.08 |
| 128 | 0.5276 | 0.5652 | 7.13 |
| 64 | 0.4469 | 0.5187 | 16.07 |
The full evaluation also reports cosine accuracy@k, precision@k, recall@k, MRR, and MAP (see the detailed results at the end of this card).
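
The per-dimension scores above can be measured with SentenceTransformers' `InformationRetrievalEvaluator`. A sketch with placeholder data follows; the real evaluation uses the dataset's test split, and the `truncate_dim` argument requires a recent sentence-transformers release.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder retrieval data; the actual evaluation uses the test split of
# axondendriteplus/legal-rag-embedding-dataset.
queries = {"q1": "What is consideration in contract law?"}
corpus = {"d1": "Consideration is the value exchanged between the parties..."}
relevant_docs = {"q1": {"d1"}}

model = SentenceTransformer("axondendriteplus/Legal-Embed-bge-base-en-v1.5")

for dim in [768, 512, 256, 128, 64]:
    evaluator = InformationRetrievalEvaluator(
        queries,
        corpus,
        relevant_docs,
        truncate_dim=dim,  # score embeddings truncated to this dimension
        name=f"dim-{dim}",
    )
    results = evaluator(model)  # dict of IR metrics, including NDCG@10
    print(results)
```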
## Training Configuration
- Epochs: 4
- Batch size: 32
- Learning rate: 2e-5
- Data: 1,456 train / 162 test samples
- Hardware: CUDA GPU with FlashAttention
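
Putting this configuration together, here is a sketch of the training run using the SentenceTransformers v3 trainer. The `output_dir`, the dataset split name, and the inner loss are assumptions; the hyperparameters come from the card.

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# Split name is an assumption about the dataset layout.
dataset = load_dataset("axondendriteplus/legal-rag-embedding-dataset", split="train")

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[768, 512, 256, 128, 64],
)

args = SentenceTransformerTrainingArguments(
    output_dir="legal-embed-bge-base",   # hypothetical output path
    num_train_epochs=4,                  # from the card
    per_device_train_batch_size=32,      # from the card
    learning_rate=2e-5,                  # from the card
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    loss=loss,
)
trainer.train()
```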
## Findings
- Maximum improvement: 16.07% (NDCG@10 at 64 dimensions)
- Fine-tuned 64D vs. baseline 768D: -15.03%
- Fine-tuned 128D vs. baseline 768D: -7.41%
- Storage reduction at 128D: 6× smaller (see the truncation sketch below)
- Storage reduction at 64D: 12× smaller
- Best baseline score: 0.6105 (NDCG@10 at 768 dimensions)
- Best fine-tuned score: 0.6412 (NDCG@10 at 768 dimensions)
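
The storage savings come from plain truncation: Matryoshka-trained embeddings can be cut to their first k dimensions and re-normalized. A minimal sketch (recent SentenceTransformers versions can also do this via the `truncate_dim` constructor argument):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("axondendriteplus/Legal-Embed-bge-base-en-v1.5")

# Encode at full resolution (768 dims)...
full = model.encode(["your legal text"])

# ...then keep only the first 128 dims and re-normalize so cosine
# similarity stays meaningful. 768 -> 128 floats is the 6x saving
# quoted above; 768 -> 64 gives the 12x saving.
truncated = full[:, :128]
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)
print(truncated.shape)  # (1, 128)
```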
## Usage
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("axondendriteplus/Legal-Embed-bge-base-en-v1.5")
embeddings = model.encode(["your legal text"])
```
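
For retrieval, the same model scores a query against candidate passages. The texts below are illustrative, and `model.similarity` requires sentence-transformers >= 3.0.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("axondendriteplus/Legal-Embed-bge-base-en-v1.5")

# Illustrative query and candidate passages.
query_emb = model.encode(["What constitutes a breach of contract?"])
doc_embs = model.encode([
    "A breach of contract occurs when a party fails to perform its obligations.",
    "The statute of limitations for tort claims varies by jurisdiction.",
])

# Cosine-similarity matrix between the query and each passage.
scores = model.similarity(query_emb, doc_embs)
print(scores)
```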
## Credits
- Fine-tuning guide: https://www.philschmid.de/fine-tune-embedding-model-for-rag
## Evaluation Results (self-reported, dim 768)

| Metric | Value |
|---|---|
| Cosine Accuracy@1 | 0.438 |
| Cosine Accuracy@3 | 0.673 |
| Cosine Accuracy@5 | 0.778 |
| Cosine Accuracy@10 | 0.858 |
| Cosine Precision@1 | 0.438 |
| Cosine Precision@3 | 0.224 |
| Cosine Precision@5 | 0.156 |
| Cosine Precision@10 | 0.086 |
| Cosine Recall@1 | 0.438 |
| Cosine Recall@3 | 0.673 |