Spaces:

amirjamali
/

accent-detector

Sleeping

App Files Files Community

amirjamali commited on May 23

Commit

635694f

unverified ·

1 Parent(s): a0fcd83

Refactor README and enhance Streamlit app with accent detection features, audio processing, and improved deployment instructions

Browse files

Files changed (3) hide show

README.md +48 -10
requirements.txt +7 -7
src/streamlit_app.py +368 -25

README.md CHANGED Viewed

@@ -1,26 +1,64 @@
 ---
-title: Accent Detector
-emoji: 🚀
-colorFrom: red
-colorTo: red
 sdk: docker
 app_port: 8501
 tags:
 - streamlit
 pinned: false
-short_description: Streamlit template space
 license: mit
 ---
-# 🎤 Accent Detection Tool
-This app accepts a public video URL, extracts the audio, detects the English-speaking accent or language, and provides a confidence score.
 ## Usage
-- Enter a Loom or MP4 link.
-- Click "Analyze"
-- Get accent classification and score.
 ## Powered By
 - [SpeechBrain](https://huggingface.co/speechbrain/lang-id-commonlanguage_ecapa)
 - [Streamlit](https://streamlit.io)

 ---
+title: English Accent Detector
+emoji: 🎤
+colorFrom: blue
+colorTo: green
 sdk: docker
 app_port: 8501
 tags:
 - streamlit
+- audio
+- accent-detection
+- hiring
 pinned: false
+short_description: Detect and analyze English accents from videos
 license: mit
 ---
+# 🎤 English Accent Detection Tool
+This app analyzes a speaker's English accent from video URLs or audio uploads, providing detailed insights for hiring evaluation purposes.
+## Features
+- **Video URL Processing**: Accept and analyze videos from Loom, YouTube, or direct MP4 links
+- **Audio Upload Support**: Directly upload audio files for analysis
+- **English Accent Classification**: Identify specific English accents (American, British, Australian, etc.)
+- **Confidence Scoring**: Get detailed confidence scores for English proficiency
+- **Detailed Analysis**: Receive expert-like explanations about accent characteristics
+- **Visual Feedback**: View audio waveforms and listen to the processed audio
 ## Usage
+1. **Via Video URL**:
+   - Enter a public video URL (Loom, YouTube, direct MP4, etc.)
+   - Click "Analyze Video"
+   - View the accent classification, confidence scores, and analysis
+2. **Via Audio Upload**:
+   - Upload an audio file (WAV, MP3, M4A, OGG)
+   - Click "Analyze Audio"
+   - View the results
+## Technology Stack
+- **Audio Processing**: FFmpeg, Librosa
+- **ML Models**: SpeechBrain, Transformers
+- **UI**: Streamlit
+- **Deployment**: Docker
+## Requirements
+- Python 3.9+
+- FFmpeg
+- See requirements.txt for Python dependencies
+## Deployment
+The app is containerized with Docker for easy deployment. Use the included Dockerfile to build and run:
+```bash
+docker build -t accent-detector .
+docker run -p 8501:8501 accent-detector
+```
 ## Powered By
 - [SpeechBrain](https://huggingface.co/speechbrain/lang-id-commonlanguage_ecapa)
+- [Hugging Face Transformers](https://huggingface.co/speechbrain/lang-id-voxlingua107-ecapa)
 - [Streamlit](https://streamlit.io)
+- [FFmpeg](https://ffmpeg.org/)

requirements.txt CHANGED Viewed

@@ -1,11 +1,11 @@
 streamlit>=1.25.0
 yt_dlp>=2023.7.6
-moviepy>=1.0.3
-numpy>=1.22.0
-decorator>=4.4.2
-imageio>=2.9.0
-imageio-ffmpeg>=0.4.5
-proglog>=0.1.10
 speechbrain>=0.5.15
 torch>=2.0.0
-torchaudio>=2.0.0

 streamlit>=1.25.0
 yt_dlp>=2023.7.6
 speechbrain>=0.5.15
 torch>=2.0.0
+torchaudio>=2.0.0
+transformers>=4.30.0
+librosa>=0.10.0
+matplotlib>=3.7.0
+scikit-learn>=1.3.0
+openai>=1.0.0
+python-dotenv>=1.0.0

src/streamlit_app.py CHANGED Viewed

@@ -1,44 +1,387 @@
 import streamlit as st
 import os
 import yt_dlp
-from moviepy.editor import VideoFileClip
 from speechbrain.pretrained import LanguageIdentification
 def download_video(url, video_path="video.mp4"):
     ydl_opts = {"outtmpl": video_path}
     with yt_dlp.YoutubeDL(ydl_opts) as ydl:
         ydl.download([url])
 def extract_audio(video_path="video.mp4", audio_path="audio.wav"):
-    video = VideoFileClip(video_path)
-    video.audio.write_audiofile(audio_path)
-def detect_accent(audio_path="audio.wav"):
-    classifier = LanguageIdentification.from_hparams(source="speechbrain/lang-id-commonlanguage_ecapa", savedir="tmp_model")
-    prediction = classifier.classify_file(audio_path)
-    lang = prediction[1]
-    score = float(prediction[0][0])
-    return lang, score
 # --- Streamlit App ---
-st.title("🎤 Accent Detection Tool")
-url = st.text_input("Enter a public video URL (e.g. Loom or direct MP4)")
-if st.button("Analyze"):
-    try:
-        with st.spinner("Downloading video..."):
-            download_video(url)
-        with st.spinner("Extracting audio..."):
-            extract_audio()
-        with st.spinner("Detecting accent..."):
-            lang, confidence = detect_accent()
-        st.success("Analysis Complete!")
-        st.markdown(f"**Predicted Accent/Language:** `{lang}`")
-        st.markdown(f"**Confidence Score:** `{round(confidence * 100, 2)}%`")
-        st.caption("Note: This is based on SpeechBrain's language model and may group accents by broader language class.")
-    except Exception as e:
-        st.error(f"Error: {str(e)}")

 import streamlit as st
 import os
 import yt_dlp
+import subprocess
+import librosa
+import numpy as np
+import torch
 from speechbrain.pretrained import LanguageIdentification
+from transformers import AutoProcessor, AutoModelForAudioClassification
+from dotenv import load_dotenv
+import matplotlib.pyplot as plt
+import tempfile
+import time
+# Comment for deployment instructions:
+# To deploy this app:
+# 1. Make sure Docker is installed
+# 2. Build the Docker image: docker build -t accent-detector .
+# 3. Run the container: docker run -p 8501:8501 accent-detector
+# 4. Access the app at http://localhost:8501
+#
+# For cloud deployment:
+# - Streamlit Cloud: Connect your GitHub repository to Streamlit Cloud
+# - Hugging Face Spaces: Use the Docker deployment option
+# - Azure/AWS/GCP: Deploy the container using their container services
+# Load environment variables (if .env file exists)
+try:
+    load_dotenv()
+except:
+    pass
+# Check for OpenAI API access - optional for enhanced explanations
+try:
+    import openai
+    openai.api_key = os.getenv("OPENAI_API_KEY")
+    have_openai = openai.api_key is not None
+except (ImportError, AttributeError):
+    have_openai = False
+# English accent categories
+ENGLISH_ACCENTS = {
+    "en-us": "American English",
+    "en-gb": "British English",
+    "en-au": "Australian English",
+    "en-ca": "Canadian English",
+    "en-ie": "Irish English",
+    "en-scotland": "Scottish English",
+    "en-in": "Indian English",
+    "en-za": "South African English",
+    "en-ng": "Nigerian English",
+    "en-caribbean": "Caribbean English",
+}
 def download_video(url, video_path="video.mp4"):
+    """Download a video from a URL"""
     ydl_opts = {"outtmpl": video_path}
     with yt_dlp.YoutubeDL(ydl_opts) as ydl:
         ydl.download([url])
+    return os.path.exists(video_path)
 def extract_audio(video_path="video.mp4", audio_path="audio.wav"):
+    """Extract audio from video file using ffmpeg"""
+    try:
+        subprocess.run(
+            ['ffmpeg', '-i', video_path, '-vn', '-acodec', 'pcm_s16le', '-ar', '16000', '-ac', '1', audio_path],
+            check=True,
+            capture_output=True
+        )
+        return os.path.exists(audio_path)
+    except subprocess.CalledProcessError as e:
+        st.error(f"Error extracting audio: {e}")
+        st.error(f"ffmpeg output: {e.stderr.decode('utf-8')}")
+        raise
+class AccentDetector:
+    def __init__(self):
+        # Initialize the language identification model
+        self.lang_id = LanguageIdentification.from_hparams(
+            source="speechbrain/lang-id-commonlanguage_ecapa",
+            savedir="tmp_model"
+        )
+        # Initialize the English accent classifier - using VoxLingua107 for now
+        # In production, you'd use a more specialized accent model
+        try:
+            self.model_name = "speechbrain/lang-id-voxlingua107-ecapa"
+            self.processor = AutoProcessor.from_pretrained(self.model_name)
+            self.model = AutoModelForAudioClassification.from_pretrained(self.model_name)
+            self.have_accent_model = True
+        except Exception as e:
+            st.warning(f"Could not load accent model: {str(e)}")
+            self.have_accent_model = False
+    def is_english(self, audio_path, threshold=0.7):
+        """
+        Determine if the speech is English and return confidence score
+        """
+        prediction = self.lang_id.classify_file(audio_path)
+        lang = prediction[1]
+        score = float(prediction[0][0])
+        # Check if language is English (slightly fuzzy match)
+        is_english = "eng" in lang.lower() or "en-" in lang.lower() or lang.lower() == "en"
+        return is_english, lang, score
+    def classify_accent(self, audio_path):
+        """
+        Classify the specific English accent
+        """
+        if not self.have_accent_model:
+            return "Unknown English Accent", 0.0
+        try:
+            # Load and preprocess audio
+            audio, sr = librosa.load(audio_path, sr=16000)
+            inputs = self.processor(audio, sampling_rate=sr, return_tensors="pt")
+            # Get predictions
+            with torch.no_grad():
+                outputs = self.model(**inputs)
+            # Get probabilities
+            probs = outputs.logits.softmax(dim=-1)[0]
+            prediction_id = probs.argmax().item()
+            confidence = probs[prediction_id].item()
+            # Get predicted label
+            id2label = self.model.config.id2label
+            accent_code = id2label[prediction_id]
+            # Map to English accent if possible
+            if accent_code.startswith('en-'):
+                accent = ENGLISH_ACCENTS.get(accent_code, f"English ({accent_code})")
+                confidence = confidence  # Keep confidence as-is for English accents
+            else:
+                # If it's not an English accent code, use our pre-classification
+                is_english, _, _ = self.is_english(audio_path)
+                if is_english:
+                    accent = "General English"
+                else:
+                    accent = f"Non-English ({accent_code})"
+                confidence *= 0.7  # Reduce confidence for non-specific matches
+            return accent, confidence
+        except Exception as e:
+            st.error(f"Error in accent classification: {str(e)}")
+            return "Unknown English Accent", 0.0
+    def generate_explanation(self, audio_path, accent, confidence, is_english, language):
+        """
+        Generate an explanation of the accent detection results using OpenAI API (if available)
+        """
+        if not have_openai:
+            if is_english:
+                return f"The speaker has a {accent} accent with {confidence*100:.1f}% confidence. The speech was identified as English."
+            else:
+                return f"The speech was identified as {language}, not English. English confidence is low."
+        try:
+            import openai
+            is_english, lang, lang_score = self.is_english(audio_path)
+            prompt = f"""
+            Audio analysis detected a speaker with the following characteristics:
+            - Primary accent/language: {accent}
+            - Confidence score: {confidence*100:.1f}%
+            - Detected language category: {lang}
+            - Is English: {is_english}
+            Based on this information, provide a 2-3 sentence summary about the speaker's accent.
+            Focus on how clear their English is and any notable accent characteristics.
+            This is for hiring purposes to evaluate English speaking abilities.
+            """
+            response = openai.chat.completions.create(
+                model="gpt-3.5-turbo",
+                messages=[
+                    {"role": "system", "content": "You are an accent analysis specialist providing factual assessments."},
+                    {"role": "user", "content": prompt}
+                ],
+                max_tokens=150
+            )
+            return response.choices[0].message.content.strip()
+        except Exception as e:
+            st.error(f"Error generating explanation: {str(e)}")
+            if is_english:
+                return f"The speaker has a {accent} accent with {confidence*100:.1f}% confidence. The speech was identified as English."
+            else:
+                return f"The speech was identified as {language}, not English. English confidence is low."
+    def analyze_audio(self, audio_path):
+        """
+        Complete analysis pipeline returning all needed results
+        """
+        # Check if it's English
+        is_english, lang, lang_score = self.is_english(audio_path)
+        # Classify accent if it's English
+        if is_english:
+            accent, accent_confidence = self.classify_accent(audio_path)
+            english_confidence = lang_score * 100  # Scale to percentage
+        else:
+            accent = f"Non-English ({lang})"
+            accent_confidence = lang_score
+            english_confidence = max(0, min(30, lang_score * 50))  # Cap at 30% if non-English
+        # Generate explanation
+        explanation = self.generate_explanation(audio_path, accent, accent_confidence, is_english, lang)
+        # Create visualization of the audio waveform
+        try:
+            y, sr = librosa.load(audio_path, sr=None)
+            fig, ax = plt.subplots(figsize=(10, 2))
+            ax.plot(y)
+            ax.set_xlabel('Sample')
+            ax.set_ylabel('Amplitude')
+            ax.set_title('Audio Waveform')
+            plt.tight_layout()
+            audio_viz = fig
+        except Exception as e:
+            st.warning(f"Could not generate audio visualization: {str(e)}")
+            audio_viz = None
+        return {
+            "is_english": is_english,
+            "accent": accent,
+            "accent_confidence": accent_confidence * 100,  # Scale to percentage
+            "english_confidence": english_confidence,
+            "language_detected": lang,
+            "explanation": explanation,
+            "audio_viz": audio_viz
+        }
+def process_uploaded_audio(uploaded_file):
+    """Process uploaded audio file"""
+    with tempfile.NamedTemporaryFile(delete=False, suffix='.wav') as temp_file:
+        temp_file.write(uploaded_file.getvalue())
+        audio_path = temp_file.name
+    detector = AccentDetector()
+    results = detector.analyze_audio(audio_path)
+    # Clean up
+    os.unlink(audio_path)
+    return results
 # --- Streamlit App ---
+st.set_page_config(
+    page_title="🎤 English Accent Detector",
+    page_icon="🎤",
+    layout="wide"
+)
+st.title("🎤 English Accent Detection Tool")
+st.markdown("""
+This app analyzes a speaker's English accent from a video or audio source.
+It provides:
+- Classification of the accent (British, American, etc.)
+- Confidence score for English proficiency
+- Explanation of accent characteristics
+""")
+# Create tabs for different input methods
+tab1, tab2 = st.tabs(["Video URL", "Upload Audio"])
+with tab1:
+    url = st.text_input("Enter a public video URL (e.g. Loom, YouTube, or direct MP4 link)")
+    if st.button("Analyze Video"):
+        if not url:
+            st.warning("Please enter a valid URL")
+        else:
+            try:
+                # Create a placeholder for status updates
+                status = st.empty()
+                # Generate unique filenames using timestamp to avoid conflicts
+                timestamp = str(int(time.time()))
+                video_path = f"video_{timestamp}.mp4"
+                audio_path = f"audio_{timestamp}.wav"
+                # Download and process the video
+                status.text("Downloading video...")
+                download_success = download_video(url, video_path)
+                if not download_success:
+                    st.error("Failed to download video")
+                else:
+                    status.text("Extracting audio...")
+                    extract_success = extract_audio(video_path, audio_path)
+                    if not extract_success:
+                        st.error("Failed to extract audio")
+                    else:
+                        status.text("Analyzing accent... (this may take a moment)")
+                        detector = AccentDetector()
+                        results = detector.analyze_audio(audio_path)
+                        # Display results
+                        st.success("✅ Analysis Complete!")
+                        # Create columns for results
+                        col1, col2 = st.columns([2, 1])
+                        with col1:
+                            st.subheader("Accent Analysis Results")
+                            st.markdown(f"**Detected Accent:** {results['accent']}")
+                            st.markdown(f"**English Proficiency:** {results['english_confidence']:.1f}%")
+                            st.markdown(f"**Accent Confidence:** {results['accent_confidence']:.1f}%")
+                            # Show explanation in a box
+                            st.markdown("### Expert Analysis")
+                            st.info(results['explanation'])
+                        with col2:
+                            if results['audio_viz']:
+                                st.pyplot(results['audio_viz'])
+                            # Show audio playback
+                            st.audio(audio_path)
+                # Clean up files
+                try:
+                    if os.path.exists(video_path):
+                        os.remove(video_path)
+                    if os.path.exists(audio_path):
+                        os.remove(audio_path)
+                except Exception as e:
+                    st.warning(f"Couldn't clean up temporary files: {str(e)}")
+            except Exception as e:
+                st.error(f"Error during analysis: {str(e)}")
+with tab2:
+    uploaded_file = st.file_uploader("Upload an audio file (WAV, MP3, etc.)", type=["wav", "mp3", "m4a", "ogg"])
+    if uploaded_file is not None:
+        st.audio(uploaded_file)
+        if st.button("Analyze Audio"):
+            with st.spinner("Analyzing audio... (this may take a moment)"):
+                try:
+                    results = process_uploaded_audio(uploaded_file)
+                    # Display results
+                    st.success("✅ Analysis Complete!")
+                    # Create columns for results
+                    col1, col2 = st.columns([2, 1])
+                    with col1:
+                        st.subheader("Accent Analysis Results")
+                        st.markdown(f"**Detected Accent:** {results['accent']}")
+                        st.markdown(f"**English Proficiency:** {results['english_confidence']:.1f}%")
+                        st.markdown(f"**Accent Confidence:** {results['accent_confidence']:.1f}%")
+                        # Show explanation in a box
+                        st.markdown("### Expert Analysis")
+                        st.info(results['explanation'])
+                    with col2:
+                        if results['audio_viz']:
+                            st.pyplot(results['audio_viz'])
+                except Exception as e:
+                    st.error(f"Error during analysis: {str(e)}")
+# Add footer with deployment info
+st.markdown("---")
+st.markdown("Deployed using Streamlit • Built with SpeechBrain and Transformers")
+# Add a section for how it works
+with st.expander("ℹ️ How It Works"):
+    st.markdown("""
+    This app uses a multi-stage process to analyze a speaker's accent:
+    1. **Audio Extraction**: The audio track is extracted from the input video or directly processed from uploaded audio.
+    2. **Language Identification**: First, we determine if the speech is English using SpeechBrain's language identification model.
+    3. **Accent Classification**: For English speech, we analyze the specific accent using a transformer-based model trained on diverse accent data.
+    4. **English Proficiency Score**: A confidence score is calculated based on both language identification and accent clarity.
+    5. **Analysis Summary**: An explanation is generated describing accent characteristics relevant for hiring evaluations.
+    """)