Hailay/MachineT_TigEng

Hailay Kidu Teklehaymanot committed on May 28

Commit 0cbc581 (1 parent: 3af0779)

Add YAML metadata to README for Huggingface model card

Files changed (1)

README.md +59 -31

@@ -1,37 +1,65 @@
-# Machine Translation Model: English ↔ Tigrinya
 This model is a fine-tuned machine translation model trained to translate between English and Tigrinya. It was trained on the parallel corpus of English and Tigrinya sentences.
-## Model Overview
-- **Model Type**: MarianMT (Multilingual Transformer Model)
-- **Languages**: English ↔ Tigrinya
-- **Model Architecture**: MarianMT, fine-tuned for English ↔ Tigrinya translation
-- **Training Framework**: Hugging Face Transformers, PyTorch
-## Training Details
-- **Training Dataset**: NLLB Parallel Corpus (English ↔ Tigrinya)
-- **Training Epochs**: 3
-- **Batch Size**: 8
-- **Max Length**: 128 tokens
-- **Learning Rate**: Starts from `1.44e-07` and decays during training
-- **Training Loss**:
-    - Final training loss: 0.4756
-    - Per-epoch loss progress:
-      - Epoch 1: 0.443
-      - Epoch 2: 0.4077
-      - Epoch 3: 0.4379
-- **Gradient Norms**:
-    - Epoch 1: 1.14
-    - Epoch 2: 1.11
-    - Epoch 3: 1.06
-- **Training Time**: 43376.7 seconds (~12 hours)
-- **Training Speed**:
-    - Training samples per second: 96.7
-    - Training steps per second: 12.08
 ## Model Usage

+---
+language:
+  - eng     # English
+  - tig     # Tigrinya
+tags:
+  - tokenizer
+  - machine-translation
+license: mit
+datasets:
+  - nllb  # NLLB training dataset
+  - opus  # OPUS parallel data for testing
+metrics:
+  - bleu
+---
+# English-Tigrinya Tokenizer
+This tokenizer is trained for English to Tigrinya machine translation tasks using the NLLB dataset for training and OPUS parallel data for testing.
+## Model Details
+- **Languages:** English, Tigrinya
+- **Model type:** Tokenizer using SentencePiece
+- **License:** MIT License
+- **Training dataset:** NLLB
+- **Testing dataset:** OPUS parallel data
+- **Evaluation metric:** BLEU score
+## Machine Translation Model: English ↔ Tigrinya
 This model is a fine-tuned machine translation model trained to translate between English and Tigrinya. It was trained on the parallel corpus of English and Tigrinya sentences.
+### Model Overview
+- **Model Type**: MarianMT (Multilingual Transformer Model)
+- **Languages**: English ↔ Tigrinya
+- **Model Architecture**: MarianMT, fine-tuned for English ↔ Tigrinya translation
+- **Training Framework**: Hugging Face Transformers, PyTorch
+### Training Details
+- **Training Dataset**: NLLB Parallel Corpus (English ↔ Tigrinya)
+- **Training Epochs**: 3
+- **Batch Size**: 8
+- **Max Length**: 128 tokens
+- **Learning Rate**: Starts from `1.44e-07` and decays during training
+- **Training Loss**:
+    - Final training loss: 0.4756
+    - Per-epoch loss progress:
+      - Epoch 1: 0.443
+      - Epoch 2: 0.4077
+      - Epoch 3: 0.4379
+- **Gradient Norms**:
+    - Epoch 1: 1.14
+    - Epoch 2: 1.11
+    - Epoch 3: 1.06
+- **Training Time**: 43376.7 seconds (~12 hours)
+- **Training Speed**:
+    - Training samples per second: 96.7
+    - Training steps per second: 12.08
 ## Model Usage
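
The figures in the "Training Details" list can be cross-checked against each other with a little arithmetic. The sketch below is only a sanity check on the numbers quoted in the README, not part of the commit. The variable names are illustrative, and it assumes the reported throughput values are averages over the whole run.

```python
# Figures copied from the "Training Details" list in the README diff.
training_time_s = 43376.7   # reported training time in seconds
samples_per_s = 96.7        # reported training samples per second
steps_per_s = 12.08         # reported training steps per second
batch_size = 8              # reported batch size

# Convert the reported training time to hours (README says "~12 hours").
hours = training_time_s / 3600

# Samples per optimizer step implied by the two throughput figures.
# Assumption: both figures are averages over the same run, so their
# ratio should be close to the reported batch size.
samples_per_step = samples_per_s / steps_per_s

print(f"training time: {hours:.2f} h")
print(f"samples per step: {samples_per_step:.2f} (batch size {batch_size})")
```

The conversion gives about 12.05 hours, and the throughput ratio comes out at roughly 8 samples per step, so the reported time and batch size agree with the speed figures.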