Description

ConLID is a language identification (LID) model that supports more than 2,000 languages. Each label combines a three-letter ISO language code with a script code, for example eng_Latn. For the full list of supported languages, see labels.json.
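The label format described above can be split into its two parts with a few lines of Python. This is a minimal sketch, not part of the ConLID code base: `split_label` is a hypothetical helper, and it assumes every label follows the `<three-letter code>_<script>` pattern seen in examples such as `eng_Latn`.

```python
def split_label(label: str) -> tuple[str, str]:
    """Split a label such as 'eng_Latn' into (language_code, script).

    Hypothetical helper; assumes the '<three-letter code>_<script>' pattern.
    """
    # Split on the last underscore so the script part is always the suffix.
    lang, sep, script = label.rpartition("_")
    if not sep or len(lang) != 3 or not script:
        raise ValueError(f"unexpected label format: {label!r}")
    return lang, script


print(split_label("eng_Latn"))  # ('eng', 'Latn')
```

Separating the two parts can be handy when grouping predictions by script or when mapping the language code to another naming scheme.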

Repository: https://github.com/epfl-nlp/ConLID

Usage

Setup

git clone https://github.com/epfl-nlp/ConLID.git
cd ConLID
pip install -r requirements.txt

Download the model

from huggingface_hub import snapshot_download

snapshot_download(repo_id="Jakh0103/lid", local_dir="checkpoint")

Use the model

from model import LID
model = LID.from_pretrained(dir='checkpoint')

# print the supported labels
print(model.get_labels())
# ['aai_Latn', 'aak_Latn', 'aau_Latn', 'aaz_Latn', 'aba_Latn', ...]

# prediction
model.predict("The cat climbed onto the roof to enjoy the warm sunlight peacefully!")
# (['eng_Latn'], [0.970989465713501])

model.predict("The cat climbed onto the roof to enjoy the warm sunlight peacefully!", k=3)
# (['eng_Latn', 'sco_Latn', 'jam_Latn'], [0.970989465713501, 0.006496887654066086, 0.00487488554790616])
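
As the outputs above show, `predict` returns a pair of lists: labels and their probabilities. The sketch below shows one way to turn that pair into a single decision by accepting the top label only when its probability clears a threshold. `top_label_if_confident` is a hypothetical helper, not part of ConLID, and the default threshold of 0.5 is an arbitrary assumption to tune for your data. It runs on the example output shown above, so it does not need the model to be loaded.

```python
from typing import List, Optional, Tuple


def top_label_if_confident(
    prediction: Tuple[List[str], List[float]],
    threshold: float = 0.5,  # assumed cutoff, not a value recommended by ConLID
) -> Optional[str]:
    """Return the most probable label, or None if it falls below the threshold."""
    labels, probs = prediction
    if not labels:
        return None
    # Do not rely on the lists being sorted; find the highest probability explicitly.
    best_idx = max(range(len(probs)), key=probs.__getitem__)
    return labels[best_idx] if probs[best_idx] >= threshold else None


# Example output copied from the k=3 prediction above.
prediction = (
    ["eng_Latn", "sco_Latn", "jam_Latn"],
    [0.970989465713501, 0.006496887654066086, 0.00487488554790616],
)
print(top_label_if_confident(prediction))        # eng_Latn
print(top_label_if_confident(prediction, 0.99))  # None
```

Returning `None` for low-confidence inputs lets downstream code route short or mixed-language texts to a fallback instead of trusting a weak guess.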

Model size: 287M params (Safetensors, tensor type F32)