Multilingual & Legal Embeddings API
A high-performance FastAPI application providing access to 5 specialized embedding models for Spanish, Catalan, English, and multilingual text. Each model has its own dedicated endpoint for optimal performance and clarity.
🌐 Live API: https://aurasystems-spanish-embeddings-api.hf.space
📖 Interactive Docs: https://aurasystems-spanish-embeddings-api.hf.space/docs
🚀 Quick Start
Basic Usage
# Test jina-v3 endpoint (multilingual, loads at startup)
curl -X POST "https://aurasystems-spanish-embeddings-api.hf.space/embed/jina-v3" \
-H "Content-Type: application/json" \
-d '{"texts": ["Hello world", "Hola mundo"], "normalize": true}'
# Test Catalan RoBERTa endpoint
curl -X POST "https://aurasystems-spanish-embeddings-api.hf.space/embed/roberta-ca" \
-H "Content-Type: application/json" \
-d '{"texts": ["Bon dia", "Com estàs?"], "normalize": true}'
📚 Available Models & Endpoints
Endpoint | Model | Languages | Dimensions | Max Tokens | Loading Strategy |
---|---|---|---|---|---|
/embed/jina-v3 | jinaai/jina-embeddings-v3 | Multilingual (30+) | 1024 | 8192 | Startup |
/embed/roberta-ca | projecte-aina/roberta-large-ca-v2 | Catalan | 1024 | 512 | On-demand |
/embed/jina | jinaai/jina-embeddings-v2-base-es | Spanish, English | 768 | 8192 | On-demand |
/embed/robertalex | PlanTL-GOB-ES/RoBERTalex | Spanish Legal | 768 | 512 | On-demand |
/embed/legal-bert | nlpaueb/legal-bert-base-uncased | English Legal | 768 | 512 | On-demand |
Model Recommendations
- 🌍 General multilingual: Use /embed/jina-v3 (best overall performance)
- 🇪🇸 Spanish general: Use /embed/jina (excellent for Spanish/English)
- 🇪🇸 Spanish legal: Use /embed/robertalex (specialized for legal texts)
- 🏴 Catalan: Use /embed/roberta-ca (best for Catalan text)
- 🇬🇧 English legal: Use /embed/legal-bert (specialized for legal documents)
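The recommendations above can be captured in a small client-side lookup. This is a minimal sketch: `ENDPOINTS`, `DEFAULT_ENDPOINT`, and `pick_endpoint` are hypothetical names that are not part of the API, and the two-letter language codes are an assumption made for illustration.

```python
from typing import Optional

# Hypothetical lookup built from the recommendations above.
# Keys are (language code, is_legal); values are endpoint paths.
ENDPOINTS = {
    ("es", False): "/embed/jina",
    ("es", True): "/embed/robertalex",
    ("ca", False): "/embed/roberta-ca",
    ("en", True): "/embed/legal-bert",
}

# General-purpose fallback: the multilingual startup model.
DEFAULT_ENDPOINT = "/embed/jina-v3"


def pick_endpoint(language: Optional[str] = None, legal: bool = False) -> str:
    """Return the recommended endpoint path, falling back to jina-v3."""
    key = ((language or "").lower(), legal)
    return ENDPOINTS.get(key, DEFAULT_ENDPOINT)


print(pick_endpoint("es"))              # /embed/jina
print(pick_endpoint("es", legal=True))  # /embed/robertalex
print(pick_endpoint("fr"))              # /embed/jina-v3
```

Any combination without a specialized model, such as French or general English text, falls through to the multilingual endpoint.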
🔗 API Endpoints
Model-Specific Embedding Endpoints
Each model has its own dedicated endpoint:
POST /embed/jina-v3 # Multilingual (startup model)
POST /embed/roberta-ca # Catalan
POST /embed/jina # Spanish/English
POST /embed/robertalex # Spanish Legal
POST /embed/legal-bert # English Legal
Utility Endpoints
GET / # API information
GET /health # Health check and model status
GET /models # List all models with specifications
📖 Usage Examples
Python
import requests

API_URL = "https://aurasystems-spanish-embeddings-api.hf.space"

# Example 1: Multilingual with Jina v3 (startup model - fastest)
response = requests.post(
    f"{API_URL}/embed/jina-v3",
    json={
        "texts": [
            "Hello world",      # English
            "Hola mundo",       # Spanish
            "Bonjour monde",    # French
            "こんにちは世界"     # Japanese
        ],
        "normalize": True
    }
)
result = response.json()
print(f"Jina v3: {result['dimensions']} dimensions")  # 1024

# Example 2: Catalan text with RoBERTa-ca
response = requests.post(
    f"{API_URL}/embed/roberta-ca",
    json={
        "texts": [
            "Bon dia, com estàs?",
            "Barcelona és una ciutat meravellosa",
            "M'agrada la cultura catalana"
        ],
        "normalize": True
    }
)
catalan_result = response.json()
print(f"Catalan: {catalan_result['dimensions']} dimensions")  # 1024

# Example 3: Spanish legal text with RoBERTalex
response = requests.post(
    f"{API_URL}/embed/robertalex",
    json={
        "texts": [
            "Artículo primero de la constitución",
            "El contrato será válido desde la fecha de firma",
            "La jurisprudencia establece que..."
        ],
        "normalize": True
    }
)
legal_result = response.json()
print(f"Spanish Legal: {legal_result['dimensions']} dimensions")  # 768

# Example 4: English legal text with Legal-BERT
response = requests.post(
    f"{API_URL}/embed/legal-bert",
    json={
        "texts": [
            "This agreement is legally binding",
            "The contract shall be governed by English law",
            "The party hereby agrees and covenants"
        ],
        "normalize": True
    }
)
english_legal_result = response.json()
print(f"English Legal: {english_legal_result['dimensions']} dimensions")  # 768

# Example 5: Spanish/English bilingual with Jina v2
response = requests.post(
    f"{API_URL}/embed/jina",
    json={
        "texts": [
            "Inteligencia artificial y machine learning",
            "Artificial intelligence and machine learning",
            "Procesamiento de lenguaje natural"
        ],
        "normalize": True
    }
)
bilingual_result = response.json()
print(f"Bilingual: {bilingual_result['dimensions']} dimensions")  # 768
JavaScript/Node.js
const API_URL = 'https://aurasystems-spanish-embeddings-api.hf.space';

// Function to get embeddings from specific endpoint
async function getEmbeddings(endpoint, texts) {
  const response = await fetch(`${API_URL}/embed/${endpoint}`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      texts: texts,
      normalize: true
    })
  });
  if (!response.ok) {
    throw new Error(`Error: ${response.status}`);
  }
  return await response.json();
}

// Usage examples (top-level await requires an ES module, e.g. a .mjs file)
try {
  // Multilingual embeddings
  const multilingualResult = await getEmbeddings('jina-v3', [
    'Hello world',
    'Hola mundo',
    'Ciao mondo'
  ]);
  console.log('Multilingual dimensions:', multilingualResult.dimensions);

  // Catalan embeddings
  const catalanResult = await getEmbeddings('roberta-ca', [
    'Bon dia',
    'Com estàs?'
  ]);
  console.log('Catalan dimensions:', catalanResult.dimensions);
} catch (error) {
  console.error('Error:', error);
}
cURL Examples
# Multilingual with Jina v3 (startup model)
curl -X POST "https://aurasystems-spanish-embeddings-api.hf.space/embed/jina-v3" \
-H "Content-Type: application/json" \
-d '{
"texts": ["Hello", "Hola", "Bonjour"],
"normalize": true
}'
# Catalan with RoBERTa-ca
curl -X POST "https://aurasystems-spanish-embeddings-api.hf.space/embed/roberta-ca" \
-H "Content-Type: application/json" \
-d '{
"texts": ["Bon dia", "Com estàs?"],
"normalize": true
}'
# Spanish legal with RoBERTalex
curl -X POST "https://aurasystems-spanish-embeddings-api.hf.space/embed/robertalex" \
-H "Content-Type: application/json" \
-d '{
"texts": ["Artículo primero"],
"normalize": true
}'
# English legal with Legal-BERT
curl -X POST "https://aurasystems-spanish-embeddings-api.hf.space/embed/legal-bert" \
-H "Content-Type: application/json" \
-d '{
"texts": ["This agreement is binding"],
"normalize": true
}'
# Spanish/English bilingual with Jina v2
curl -X POST "https://aurasystems-spanish-embeddings-api.hf.space/embed/jina" \
-H "Content-Type: application/json" \
-d '{
"texts": ["Texto en español", "Text in English"],
"normalize": true
}'
📋 Request/Response Schema
Request Body
{
"texts": ["text1", "text2", "..."],
"normalize": true,
"max_length": null
}
Field | Type | Required | Default | Description |
---|---|---|---|---|
texts | array[string] | ✅ Yes | - | 1-50 texts to embed |
normalize | boolean | No | true | L2-normalize embeddings |
max_length | integer/null | No | null | Max tokens (model-specific limits) |
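Checking a request against this schema before sending it avoids a round trip that would be rejected. The following is a minimal sketch: `build_payload` is a hypothetical helper, and the rule that `max_length` must be positive is an assumption, since the server's exact validation is not documented here.

```python
from typing import Any, Dict, List, Optional

MAX_TEXTS_PER_REQUEST = 50  # documented limit: 1-50 texts


def build_payload(
    texts: List[str],
    normalize: bool = True,
    max_length: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate inputs client-side and return the JSON request body."""
    if not 1 <= len(texts) <= MAX_TEXTS_PER_REQUEST:
        raise ValueError(f"expected 1-{MAX_TEXTS_PER_REQUEST} texts, got {len(texts)}")
    if not all(isinstance(t, str) for t in texts):
        raise TypeError("every item in texts must be a string")
    # Assumption: a non-positive token limit is never meaningful.
    if max_length is not None and max_length <= 0:
        raise ValueError("max_length must be a positive integer or None")
    return {"texts": list(texts), "normalize": normalize, "max_length": max_length}


print(build_payload(["Hola mundo"]))
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.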
Response Body
{
"embeddings": [[0.123, -0.456, ...], [0.789, -0.012, ...]],
"model_used": "jina-v3",
"dimensions": 1024,
"num_texts": 2
}
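A response can be sanity-checked against its own metadata: one vector per text, each of length `dimensions`, and unit length when `normalize` was true. This is a sketch only; `check_response` and the tolerance `tol` are hypothetical, and the sample body uses toy 2-dimensional vectors rather than real model output.

```python
import math
from typing import Any, Dict, List


def check_response(
    body: Dict[str, Any], normalized: bool = True, tol: float = 1e-3
) -> List[List[float]]:
    """Check that a response body is internally consistent and return its vectors."""
    embeddings = body["embeddings"]
    if len(embeddings) != body["num_texts"]:
        raise ValueError("num_texts does not match the number of embeddings")
    for vector in embeddings:
        if len(vector) != body["dimensions"]:
            raise ValueError("vector length does not match dimensions")
        if normalized:
            norm = math.sqrt(sum(x * x for x in vector))
            if abs(norm - 1.0) > tol:
                raise ValueError(f"expected unit-length vector, got norm {norm:.4f}")
    return embeddings


# Toy body with 2-dimensional unit vectors (real responses use 768 or 1024).
sample = {
    "embeddings": [[0.6, 0.8], [1.0, 0.0]],
    "model_used": "jina-v3",
    "dimensions": 2,
    "num_texts": 2,
}
print(len(check_response(sample)))  # 2
```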
⚡ Performance & Limits
- Maximum texts per request: 50
- Startup model: jina-v3 loads at startup (fastest response)
- On-demand models: Load on first request (~30-60s first time)
- Typical response time: 100-300ms after models are loaded
- Memory optimization: Automatic cleanup for large batches
- CORS enabled: Works from any domain
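Because a single request accepts at most 50 texts, larger collections have to be split client-side. A minimal sketch of that batching follows; `batched` is a hypothetical helper name, not something the API provides.

```python
from typing import Iterator, List

MAX_TEXTS_PER_REQUEST = 50  # documented per-request limit


def batched(texts: List[str], size: int = MAX_TEXTS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` texts."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


texts = [f"documento {i}" for i in range(120)]
print([len(batch) for batch in batched(texts)])  # [50, 50, 20]
```

Each batch would then be sent as its own POST request. For the first call to an on-demand model, a generous client timeout (for example 120 seconds, an assumed value) leaves room for the ~30-60s load.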
🔧 Advanced Usage
LangChain Integration
from typing import List

import requests
# Newer LangChain releases expose this class as langchain_core.embeddings.Embeddings
from langchain.embeddings.base import Embeddings


class MultilingualEmbeddings(Embeddings):
    """LangChain integration for multilingual embeddings"""

    MAX_BATCH = 50  # API limit: 1-50 texts per request

    def __init__(self, endpoint: str = "jina-v3"):
        """
        Initialize with specific endpoint

        Args:
            endpoint: One of "jina-v3", "roberta-ca", "jina", "robertalex", "legal-bert"
        """
        self.api_url = f"https://aurasystems-spanish-embeddings-api.hf.space/embed/{endpoint}"
        self.endpoint = endpoint

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        embeddings: List[List[float]] = []
        # Send the texts in slices so no request exceeds the 50-text limit
        for start in range(0, len(texts), self.MAX_BATCH):
            response = requests.post(
                self.api_url,
                json={"texts": texts[start:start + self.MAX_BATCH], "normalize": True}
            )
            response.raise_for_status()
            embeddings.extend(response.json()["embeddings"])
        return embeddings

    def embed_query(self, text: str) -> List[float]:
        return self.embed_documents([text])[0]


# Usage examples
multilingual_embeddings = MultilingualEmbeddings("jina-v3")
catalan_embeddings = MultilingualEmbeddings("roberta-ca")
spanish_legal_embeddings = MultilingualEmbeddings("robertalex")
Semantic Search
import numpy as np
import requests
from typing import List, Tuple


def semantic_search(
    query: str, documents: List[str], endpoint: str = "jina-v3", top_k: int = 5
) -> List[Tuple[int, float]]:
    """Semantic search using specific model endpoint"""
    # The query and documents share one request, so at most 49 documents fit
    response = requests.post(
        f"https://aurasystems-spanish-embeddings-api.hf.space/embed/{endpoint}",
        json={"texts": [query] + documents, "normalize": True}
    )
    response.raise_for_status()
    embeddings = np.array(response.json()["embeddings"])
    query_embedding = embeddings[0]
    doc_embeddings = embeddings[1:]
    # Dot product equals cosine similarity because the vectors are L2-normalized
    similarities = np.dot(doc_embeddings, query_embedding)
    # Indices of the top_k highest scores, best first
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return [(int(idx), float(similarities[idx])) for idx in top_indices]


# Example: Multilingual search
documents = [
    "Python programming language",
    "Lenguaje de programación Python",
    "Llenguatge de programació Python",
    "Langage de programmation Python"
]
results = semantic_search("código en Python", documents, "jina-v3")
for idx, score in results:
    print(f"{score:.4f}: {documents[idx]}")
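The search above can use a plain dot product because the API returns L2-normalized vectors, and for unit vectors the dot product is the cosine similarity. The worked example below checks this on toy 2-dimensional vectors; the helper names are illustrative and the numbers are not model output.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0 else [x / norm for x in vector]


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
print(round(cosine(a, b), 4))                           # 0.96
print(round(dot(l2_normalize(a), l2_normalize(b)), 4))  # 0.96
```

If a request is sent with `"normalize": false`, the dot product is no longer a cosine and the vectors need to be normalized client-side first.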
🚨 Error Handling
HTTP Status Codes
Code | Description |
---|---|
200 | Success |
400 | Bad Request (validation error) |
422 | Unprocessable Entity (schema error) |
500 | Internal Server Error (model loading failed) |
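One way to act on these codes is to treat 400 and 422 as caller errors and retry only on 500. The sketch below assumes that a 500 caused by model loading can succeed on a later attempt, which the API does not guarantee; `call_with_retry` and its parameters are hypothetical, and `send` stands in for the real HTTP call so the example runs offline.

```python
import time
from typing import Callable, Tuple

RETRYABLE = {500}           # "model loading failed"; retrying is an assumption
CLIENT_ERRORS = {400, 422}  # fix the request instead of retrying


def call_with_retry(
    send: Callable[[], Tuple[int, dict]],
    attempts: int = 3,
    delay_seconds: float = 2.0,
) -> dict:
    """Call `send` until it returns 200, retrying only on retryable codes."""
    for attempt in range(1, attempts + 1):
        status, body = send()
        if status == 200:
            return body
        if status in CLIENT_ERRORS:
            raise ValueError(f"request rejected with {status}: {body}")
        if status not in RETRYABLE or attempt == attempts:
            raise RuntimeError(f"request failed with {status} after {attempt} attempt(s)")
        time.sleep(delay_seconds * attempt)  # linear backoff between attempts
    raise RuntimeError("attempts must be at least 1")


# Stand-in for an HTTP call: fails once with 500, then succeeds.
responses = iter([(500, {"detail": "loading"}), (200, {"num_texts": 1})])
print(call_with_retry(lambda: next(responses), delay_seconds=0))  # {'num_texts': 1}
```

In practice `send` would wrap `requests.post` and return `(response.status_code, response.json())`.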
Common Errors
import requests

# Handle errors properly
try:
    response = requests.post(
        "https://aurasystems-spanish-embeddings-api.hf.space/embed/jina-v3",
        json={"texts": ["text"], "normalize": True}
    )
    response.raise_for_status()
    result = response.json()
except requests.exceptions.HTTPError as e:
    # Raised by raise_for_status() for 4xx/5xx responses
    print(f"HTTP error: {e}")
    print(f"Response: {e.response.text}")
except requests.exceptions.RequestException as e:
    # Connection errors, timeouts, and other transport failures
    print(f"Request error: {e}")
📊 Model Status Check
import requests

# Check which models are loaded
health = requests.get("https://aurasystems-spanish-embeddings-api.hf.space/health")
status = health.json()
print(f"API Status: {status['status']}")
print(f"Startup model loaded: {status['startup_model_loaded']}")
print(f"Available models: {status['available_models']}")
print(f"Models loaded: {status['models_count']}/5")

# Check endpoint status
for model, endpoint_status in status['endpoints'].items():
    print(f"{model}: {endpoint_status}")
🔒 Authentication & Rate Limits
- Authentication: None required (open API)
- Rate limits: Generous limits on Hugging Face Spaces
- CORS: Enabled for all origins
- Usage: Free to call; permitted use of the resulting embeddings depends on each model's license (see Model Licenses)
🏗️ Architecture
Endpoint-Per-Model Design
- Startup model: jina-v3 loads at application startup for fastest response
- On-demand loading: Other models load when first requested
- Memory optimization: Progressive loading reduces startup time
- Model caching: Once loaded, models remain in memory for fast inference
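The loading strategy described above (one model loaded eagerly, the rest on first use, all cached afterwards) can be illustrated with a small registry. This is a sketch of the general pattern, not the service's actual code: `ModelRegistry` is a hypothetical class and the loaders are placeholders for real model loading.

```python
from typing import Any, Callable, Dict, List


class ModelRegistry:
    """Load one model eagerly and the rest on first use, caching every instance."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]], startup: str):
        self._loaders = loaders
        self._cache: Dict[str, Any] = {}
        self.get(startup)  # eager load at construction time

    def get(self, name: str) -> Any:
        """Return the cached model, loading it first if necessary."""
        if name not in self._loaders:
            raise KeyError(f"unknown model: {name}")
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()  # slow path, runs once
        return self._cache[name]

    def loaded(self) -> List[str]:
        """Names of the models currently held in memory."""
        return sorted(self._cache)


# Placeholder loaders; a real service would load Transformers models here.
registry = ModelRegistry(
    loaders={"jina-v3": lambda: "jina-v3 model", "roberta-ca": lambda: "roberta-ca model"},
    startup="jina-v3",
)
print(registry.loaded())  # ['jina-v3']
registry.get("roberta-ca")
print(registry.loaded())  # ['jina-v3', 'roberta-ca']
```

The sketch is not thread-safe; a server handling concurrent requests would typically guard the slow path with a lock so that a model is not loaded twice.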
Technical Stack
- FastAPI: Modern async web framework
- Transformers: Hugging Face model library
- PyTorch: Deep learning backend
- Docker: Containerized deployment
- Hugging Face Spaces: Cloud hosting platform
📄 Model Licenses
- Jina models: jina-embeddings-v2-base-es is Apache 2.0; jina-embeddings-v3 is CC BY-NC 4.0 (non-commercial)
- RoBERTa models: MIT/Apache 2.0
- Legal-BERT: CC BY-SA 4.0
🤝 Support & Contributing
- Issues: GitHub Issues
- Interactive Docs: FastAPI Swagger UI
- Model Papers: Check individual model pages on Hugging Face
Built with ❤️ using FastAPI and Hugging Face Transformers