---
datasets:
- cis-lmu/glotlid-corpus
pipeline_tag: text-classification
metrics:
- f1
---

## Description

**ConLID** is a language identification model that supports more than 2,000 languages. Labels are three-letter ISO language codes with a script suffix (for example, `eng_Latn`). For the full list of supported languages, see [labels.json](https://huggingface.co/Jakh0103/lid/blob/main/labels.json).

Repository: [GitHub](https://github.com/epfl-nlp/language-identification)

## Usage

**Setup**

```bash
git clone https://github.com/epfl-nlp/ConLID.git
cd ConLID
pip install -r requirements.txt
```

**Download the model**

```python
from huggingface_hub import snapshot_download

# Download the model files into a local "checkpoint" directory
snapshot_download(repo_id="Jakh0103/lid", local_dir="checkpoint")
```

**Use the model**

```python
from model import LID

# Load the model from the directory created in the download step
model = LID.from_pretrained(dir='checkpoint')

# Print the supported labels
print(model.get_labels())
# ['aai_Latn', 'aak_Latn', 'aau_Latn', 'aaz_Latn', 'aba_Latn', ...]

# Predict the most likely language
model.predict("The cat climbed onto the roof to enjoy the warm sunlight peacefully!")
# (['eng_Latn'], [0.970989465713501])

# Predict the top-k most likely languages
model.predict("The cat climbed onto the roof to enjoy the warm sunlight peacefully!", k=3)
# (['eng_Latn', 'sco_Latn', 'jam_Latn'], [0.970989465713501, 0.006496887654066086, 0.00487488554790616])
```
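
**Post-process the predictions**

The outputs above suggest that `predict` returns a pair of parallel lists, labels and scores. Assuming that shape holds, the following is a minimal sketch of how the result could be split into language and script parts and filtered by a confidence threshold. The `Prediction` type, the `parse_predictions` helper, and the `min_score` parameter are hypothetical names introduced for illustration and are not part of the ConLID API.

```python
from typing import List, NamedTuple, Tuple


class Prediction(NamedTuple):
    language: str  # three-letter language code, e.g. "eng"
    script: str    # script code, e.g. "Latn"
    score: float


def parse_predictions(
    output: Tuple[List[str], List[float]],
    min_score: float = 0.0,
) -> List[Prediction]:
    """Split each label on the first underscore and keep scores >= min_score.

    Assumes `output` has the (labels, scores) shape shown in the usage example.
    """
    labels, scores = output
    results = []
    for label, score in zip(labels, scores):
        language, _, script = label.partition("_")
        if score >= min_score:
            results.append(Prediction(language, script, score))
    return results


# Sample output copied from the k=3 example, so no model is needed to run this
sample = (
    ['eng_Latn', 'sco_Latn', 'jam_Latn'],
    [0.970989465713501, 0.006496887654066086, 0.00487488554790616],
)

confident = parse_predictions(sample, min_score=0.5)
print(confident)
```

With a loaded model, the same helper could be applied as `parse_predictions(model.predict(text, k=3), min_score=0.5)`; the threshold of 0.5 is an arbitrary illustrative value and should be tuned for the data at hand.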