MMS-1B-All Fine-tuned on Darija Bible Dataset
This model is a fine-tuned version of facebook/mms-1b-all on the atlasia/darija_bible_aligned dataset for Moroccan Arabic (Darija) speech recognition.
Model Description
- Model type: Speech Recognition (CTC)
- Language: Moroccan Arabic (Darija)
- Base model: facebook/mms-1b-all
- Dataset: Darija Bible Aligned Dataset
- License: Apache 2.0
Usage
from transformers import AutoProcessor, AutoModelForCTC
import torch
import librosa
# Load model and processor
processor = AutoProcessor.from_pretrained("HAMMALE/mms-darija-finetuned")
model = AutoModelForCTC.from_pretrained("HAMMALE/mms-darija-finetuned")
# Load and preprocess audio
audio, sr = librosa.load("path/to/darija/audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
# Inference
with torch.no_grad():
    logits = model(**inputs).logits
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(f"Transcription: {transcription}")
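The argmax step above yields one token id per audio frame, and CTC decoding then merges consecutive repeats and drops the blank token to produce the final sequence. The following is a minimal sketch of that collapse rule in plain Python, not the processor's actual implementation. The function name `ctc_greedy_collapse` and the default `blank_id=0` are assumptions made for illustration; the real blank id depends on the model's vocabulary.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse per-frame CTC predictions into a token sequence.

    blank_id=0 is an assumed value for illustration; check the
    tokenizer's vocabulary for the id actually used as the CTC blank.
    """
    # Step 1: merge runs of identical consecutive ids into one id.
    merged = [token for token, _ in groupby(frame_ids)]
    # Step 2: drop the blank token, which only separates repeated symbols.
    return [token for token in merged if token != blank_id]


# Hypothetical frame-level predictions (ids are made up for the example).
frames = [0, 7, 7, 0, 7, 3, 3, 0, 0]
print(ctc_greedy_collapse(frames))  # [7, 7, 3]
```

The blank between the two runs of `7` is what lets the same symbol appear twice in a row in the output; without it, the repeats would merge into a single token.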
Training Details
The model was fine-tuned on the Darija Bible Aligned Dataset, which contains audio segments from the Moroccan Standard Translation (MSTD) of the Bible with aligned text transcriptions.
Limitations
- Trained specifically on religious text (Bible translations)
- May not perform well on colloquial/everyday Darija speech
- Limited vocabulary outside religious domain
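One way to gauge how far these limitations affect a given use case is to transcribe a few recordings from the target domain and compute the word error rate (WER) against reference transcripts. The sketch below is a plain-Python WER based on word-level edit distance. The function name `word_error_rate` and the sample strings are hypothetical, and no text normalization is applied beyond whitespace splitting, which a real evaluation would probably need.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder strings, not real model output.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Here one substitution (`b` to `x`) and one deletion (`d`) give 2 errors over 4 reference words. Comparing the WER on in-domain Bible audio with the WER on everyday speech would show how much the domain shift matters in practice.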
Citation
@misc{darija-mms-finetuned,
  title={MMS-1B-All Fine-tuned on Darija Bible Dataset},
  author={HAMMALE},
  year={2025},
  publisher={Hugging Face},
  journal={Hugging Face Model Hub},
  howpublished={\url{https://huggingface.co/HAMMALE/mms-darija-finetuned}}
}
Acknowledgments
- Original MMS model by Meta AI
- Darija Bible dataset by Morocco Bible Society
- Audio alignment using Facebook's MMS toolkit