hriteshMaikap/languageClassifier

@@ -1,68 +1,95 @@
 ---
 language:
-  - en
 tags:
-  - audio
-  - language-identification
-  - speech
-  - indian-languages
-datasets:
-  - hmsolanki/indian-languages-audio-dataset
-metrics:
-  - accuracy
-  - f1
 ---
-# Indian Language Identification Model
-This model identifies the language spoken in an audio clip from a set of 10 Indian languages.
-## Model Details
-- **Model Type:** Audio Language Classifier
-- **Languages Supported:** Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Punjabi, Tamil, Telugu, Urdu
-- **Framework:** PyTorch
-- **Training Dataset:** [Indian Languages Audio Dataset](https://www.kaggle.com/datasets/hmsolanki/indian-languages-audio-dataset/)
-- **Audio Sampling Rate:** 16kHz
-## Performance
-- **Accuracy:** 0.8465
-- **Precision:** 0.8457
-- **Recall:** 0.8465
-- **F1 Score:** 0.8452
 ## Usage
 ```python
-import torch
-import torchaudio
 import json
-from transformers import pipeline
-# Load the model
-pipe = pipeline("audio-classification", model="hriteshMaikap/languageClassifier")
-# Or use it directly
-waveform, sample_rate = torchaudio.load("path/to/audio.wav")
-if sample_rate != 16000:
-    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
-    waveform = resampler(waveform)
-# Get prediction
-prediction = pipe(waveform)
-print(f"Detected language: {prediction[0]['label']}")
-```
-## Limitations
-- Works best with clear audio without background noise
-- Audio should be sampled at 16kHz for optimal performance
-## Training Details
-This model was trained on a dataset of Indian language audio samples. The model architecture combines CNN layers for feature extraction with transformer layers for classification.
-## Confusion Matrix
-![Confusion Matrix](/confusion_matrix.png)

 ---
 language:
+- mr
+- te
+- ml
 tags:
+- audio-classification
+- speech-recognition
+- indian-languages
+- tensorflow
+license: apache-2.0
 ---
+# Language Classifier - Indian Languages (Marathi, Telugu, Malayalam)
+This model classifies audio samples into three Indian languages: Marathi, Telugu, and Malayalam.
+## Model Description
+### Model Architecture
+- 1D convolutional neural network (CNN) with the following key components:
+  - Three convolutional blocks with an increasing number of filters (64, 128, 256)
+  - Batch normalization and ReLU activation after each convolution
+  - Max pooling for downsampling and dropout for regularization
+  - Dense layers with 256 units, followed by a softmax output layer
+- Input: audio features (MFCCs with delta and delta-delta features)
+- Output: probabilities for the three language classes
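A minimal Keras sketch of the architecture described above follows. The card does not give kernel sizes, pooling sizes, dropout rates, the activation of the 256-unit dense layer, or how the convolutional output is flattened. So `kernel_size=3`, `pool_size=2`, `dropout_rate=0.3`, ReLU, and `Flatten` are assumptions, and `build_model` is a hypothetical helper. The input shape `(174, 39)` follows from 174 padded frames and 13 MFCCs plus their deltas. Treat this as an illustration, not the trained model's exact definition.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(max_pad_len=174, n_features=39, n_classes=3,
                kernel_size=3, pool_size=2, dropout_rate=0.3):
    """Hypothetical 1D CNN following the card's description.

    kernel_size, pool_size, dropout_rate, the ReLU on the dense layer and
    the Flatten step are assumptions; the card does not state them.
    """
    model = tf.keras.Sequential([layers.Input(shape=(max_pad_len, n_features))])
    for filters in (64, 128, 256):
        # Convolution, then batch normalization and ReLU, as listed in the card
        model.add(layers.Conv1D(filters, kernel_size, padding="same"))
        model.add(layers.BatchNormalization())
        model.add(layers.Activation("relu"))
        # Downsample along time, then apply dropout
        model.add(layers.MaxPooling1D(pool_size=pool_size))
        model.add(layers.Dropout(dropout_rate))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    # Softmax gives one probability per language
    model.add(layers.Dense(n_classes, activation="softmax"))
    return model

model = build_model()
```

With these assumptions the time axis shrinks 174 → 87 → 43 → 21 through the three pooling steps, and `model.summary()` prints the per-layer shapes.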
+### Training Data
+The dataset contains 1,000 samples per language, split as follows:
+- Training: 700 samples
+- Validation: 150 samples
+- Test: 150 samples
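The 700/150/150 split can be reproduced per language with a sketch like the one below. The card states only the counts. Shuffling with NumPy's `default_rng` and a fixed seed is an assumption, and `split_per_language` is a hypothetical helper. Splitting each language separately keeps all three partitions balanced across the classes.

```python
import numpy as np

def split_per_language(n_samples=1000, n_train=700, n_val=150, seed=42):
    """Shuffle one language's sample indices and cut them into train/val/test.

    Random shuffling with a fixed seed is an assumption for this sketch.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]  # the remaining 150 samples
    return train, val, test

# One independent split per language keeps the classes balanced
splits = {lang: split_per_language(seed=i)
          for i, lang in enumerate(["marathi", "telugu", "malayalam"])}
for lang, (tr, va, te) in splits.items():
    print(lang, len(tr), len(va), len(te))  # 700 150 150 for each language
```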
+### Features
+- Feature type: MFCCs (Mel-frequency cepstral coefficients) with delta and delta-delta features
+- Number of MFCC coefficients: 13 (39 feature rows per frame after adding the deltas)
+- Maximum padding length: 174 frames (longer inputs are truncated, shorter ones are zero-padded)
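The padding length is measured in feature frames, not seconds. Assuming librosa's defaults for `librosa.feature.mfcc` (`hop_length=512`, centered frames), a clip of `n` samples produces `1 + n // 512` frames. So the 174-frame limit corresponds to roughly 4.0 s of audio at 22,050 Hz or 5.5 s at 16 kHz. Because the usage code loads audio with `sr=None`, the real duration depends on each file's native sampling rate. The helpers below are hypothetical and only do this arithmetic.

```python
# Assumes librosa defaults: hop_length=512 and center=True framing
def frames_for_samples(n_samples, hop_length=512):
    """Frame count librosa produces for a clip of n_samples with center=True."""
    return 1 + n_samples // hop_length

def min_samples_for_frames(n_frames, hop_length=512):
    """Shortest clip, in samples, that yields at least n_frames frames."""
    return (n_frames - 1) * hop_length

def frames_to_seconds(n_frames, sr, hop_length=512):
    """Approximate audio duration spanned by n_frames frames at rate sr."""
    return min_samples_for_frames(n_frames, hop_length) / sr

n_mfcc, max_pad_len = 13, 174
print("feature rows per frame:", 3 * n_mfcc)  # MFCC + delta + delta-delta = 39
for sr in (16000, 22050):
    seconds = frames_to_seconds(max_pad_len, sr)
    print(f"{sr} Hz: about {seconds:.2f} s fit in {max_pad_len} frames")
```

Anything beyond that duration is cut off by the truncation step, so only the start of a longer recording reaches the model.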
+### Training Hyperparameters
+- Optimizer: AdamW
+- Initial learning rate: 0.001
+- Batch size: 64
+- Early stopping with a patience of 10 epochs
+- Learning-rate reduction on plateau
+- Loss function: Categorical Cross-entropy
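The hyperparameters above map onto standard Keras objects roughly as follows. The card names AdamW, early stopping with patience 10, and learning-rate reduction on plateau, but not the monitored metric or the plateau settings. So `monitor="val_loss"`, `restore_best_weights=True`, `factor=0.5`, `patience=5`, and `min_lr=1e-6` are assumptions, and `make_training_setup` is a hypothetical helper.

```python
import tensorflow as tf

def make_training_setup(model, initial_lr=1e-3):
    """Compile model with AdamW + categorical cross-entropy; return callbacks.

    monitor, restore_best_weights and the plateau factor/patience/min_lr
    are assumptions; the card does not specify them.
    """
    model.compile(
        optimizer=tf.keras.optimizers.AdamW(learning_rate=initial_lr),
        loss="categorical_crossentropy",  # expects one-hot encoded labels
        metrics=["accuracy"],
    )
    callbacks = [
        # Stop after 10 epochs without validation-loss improvement
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=10, restore_best_weights=True),
        # Halve the learning rate when validation loss stalls
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=5, min_lr=1e-6),
    ]
    return callbacks

# Hypothetical usage (X_train, y_train, X_val, y_val are placeholders;
# epochs=100 is an arbitrary upper bound that early stopping cuts short):
# callbacks = make_training_setup(model)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           batch_size=64, epochs=100, callbacks=callbacks)
```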
+## Performance
+The model achieves strong performance in distinguishing between Marathi, Telugu, and Malayalam speech samples.
+## Intended Use
+This model is designed for:
+- Language identification in audio samples
+- Speech processing applications focusing on Indian languages
+- Research and development in multilingual speech systems
+## Limitations
+- Limited to three languages: Marathi, Telugu, Malayalam
+- Requires fixed-length input (174 feature frames); longer clips are truncated, so only their beginning is classified
+- May not perform well on very noisy audio
+- Not suitable for real-time processing without additional preprocessing
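One possible workaround for the fixed-length limitation on longer recordings is to classify several 174-frame windows and average their probabilities. This is not part of the released model. It assumes a variant of `extract_features` that skips truncation and returns the full `(n_frames, 39)` matrix, and `split_into_windows` and `average_prediction` are hypothetical helpers. Each window would still need the same feature scaling as a single-clip input before it is passed to `model.predict`.

```python
import numpy as np

def split_into_windows(features, max_pad_len=174):
    """Cut an (n_frames, n_features) matrix into consecutive fixed-length windows.

    The final window is zero-padded, mirroring the card's padding step.
    """
    n_frames, n_features = features.shape
    n_windows = max(1, -(-n_frames // max_pad_len))  # ceiling division
    padded = np.zeros((n_windows * max_pad_len, n_features), dtype=features.dtype)
    padded[:n_frames] = features
    return padded.reshape(n_windows, max_pad_len, n_features)

def average_prediction(window_probs):
    """Average per-window class probabilities into one clip-level distribution."""
    return np.mean(window_probs, axis=0)
```

A mostly padded final window can skew the average; dropping windows that are largely padding is one simple mitigation.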
 ## Usage
 ```python
+import tensorflow as tf
+import numpy as np
+import joblib
 import json
+import librosa
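+# Load the trained model, the fitted feature scaler, and the feature config
+model = tf.keras.models.load_model('indic_language_classifier_mtm.keras')
+scaler = joblib.load('audio_feature_scaler_mtm.pkl')
+with open('config_mtm.json', 'r') as f:
+    config = json.load(f)
+def extract_features(audio_path, config):
+    """Return a (max_pad_len, 3 * n_mfcc) array of MFCC, delta, and delta-delta features."""
+    # sr=None keeps the file's native sampling rate (no resampling)
+    audio, sr = librosa.load(audio_path, sr=None)
+    mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=config['n_mfcc'])
+    delta_mfccs = librosa.feature.delta(mfccs)            # first-order deltas
+    delta2_mfccs = librosa.feature.delta(mfccs, order=2)  # second-order (delta-delta)
+    # Stack along the coefficient axis: shape (3 * n_mfcc, n_frames)
+    features = np.concatenate((mfccs, delta_mfccs, delta2_mfccs), axis=0)
+    # Truncate or zero-pad along the time axis to exactly max_pad_len frames
+    max_len = config['max_pad_len']
+    if features.shape[1] > max_len:
+        features = features[:, :max_len]
+    else:
+        pad_width = max_len - features.shape[1]
+        features = np.pad(features, pad_width=((0, 0), (0, pad_width)))
+    # Transpose to (time, features), the channels-last layout Conv1D expects
+    return features.T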