---
title: Accent Analyzer Agent
emoji: 🏒
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: English accent detection from video
license: mit
---
# Accent Analyzer
This is a Streamlit-based web application that analyzes the English accent of speakers in videos. Users provide a public MP4 video URL, receive a transcript produced by Whisper Base together with an accent classification, and can then ask follow-up questions about the transcript, which are answered by Gemma3:1b.
## What It Does
- Accepts a public **MP4 video URL**
- Extracts audio and transcribes it using **OpenAI Whisper Base**
- Detects the accent using the **Jzuluaga/accent-id-commonaccent_xlsr-en-english** model
- Lets users ask **follow-up questions** about the transcript using **Gemma3**
- Deploys easily on **Hugging Face Spaces** with CPU
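
The audio extraction and transcription steps above can be sketched as follows. This is a minimal illustration, not the app's actual code: the function names `build_ffmpeg_command`, `extract_audio`, and `transcribe` are hypothetical, and the sketch assumes the `ffmpeg` binary is on the `PATH` and the `openai-whisper` package is installed. The 16 kHz mono output format is also an assumption, chosen because it is a common input format for speech models.

```python
import subprocess
from typing import List


def build_ffmpeg_command(video_path: str, wav_path: str) -> List[str]:
    """Build an ffmpeg command that extracts 16 kHz mono WAV audio."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", video_path,
        "-vn",           # drop the video stream
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> str:
    """Run ffmpeg and return the path of the extracted WAV file."""
    subprocess.run(build_ffmpeg_command(video_path, wav_path), check=True)
    return wav_path


def transcribe(wav_path: str) -> str:
    """Transcribe speech with the Whisper "base" checkpoint."""
    import whisper  # imported lazily because loading the package is slow

    model = whisper.load_model("base")
    return model.transcribe(wav_path)["text"].strip()
```

With a downloaded file, `transcribe(extract_audio("video.mp4", "audio.wav"))` would return the transcript as a plain string.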
---
## Tech Stack
- **Streamlit**: For the web UI.
- **OpenAI Whisper (base)**: For speech-to-text transcription.
- **Jzuluaga/accent-id-commonaccent_xlsr-en-english**: For English accent classification.
- **Gemma3:1b via Ollama**: For generating answers to follow-up questions using context from the transcript.
- **Docker**: For containerized deployment.
- **Hugging Face Spaces**: For hosting on CPU.
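
As an illustration of how a follow-up question can be answered from the transcript, the sketch below sends a prompt to a local Ollama server over its REST API. It is a simplified stand-in, not the app's agent code (the requirements list suggests the app uses LangChain and LangGraph instead). The helper names and the prompt wording are hypothetical, and the URL assumes Ollama is listening on its default port, 11434.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default Ollama address


def build_prompt(transcript: str, question: str) -> str:
    """Combine the transcript and the user's question into one prompt."""
    return (
        "Answer the question using only the transcript below.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Question: {question}\nAnswer:"
    )


def build_payload(transcript: str, question: str, model: str = "gemma3:1b") -> dict:
    """Build the JSON body for a non-streaming generate request."""
    return {
        "model": model,
        "prompt": build_prompt(transcript, question),
        "stream": False,  # ask for a single JSON object instead of a stream
    }


def ask_followup(transcript: str, question: str, timeout: float = 120.0) -> str:
    """Send the question to Ollama and return the generated answer text."""
    data = json.dumps(build_payload(transcript, question)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Putting the whole transcript into the prompt keeps the sketch simple; for long videos, a small model such as Gemma3:1b may need the transcript shortened first.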
---
## Project Structure
```
accent-analyzer/
├── Dockerfile                 # Container setup
├── start.sh                   # Starts Ollama and the app
├── README.md                  # Instructions for the app
├── requirements.txt           # Python dependencies
├── streamlit_app.py           # Main UI app
└── src/
    ├── custome_interface.py   # SpeechBrain custom interface
    ├── tools/
    │   └── accent_tool.py     # Audio analysis tool
    └── app/
        └── main_agent.py      # Analysis + follow-up LLM agents
```
---
## Running Locally (GPU Required)
1. Clone the repo:
```bash
git clone https://huggingface.co/spaces/ash-171/accent-detection
cd accent-detection
```
2. Build the Docker image:
```bash
docker build -t accent-analyzer .
```
3. Run the container:
```bash
docker run --gpus all -p 8501:8501 accent-analyzer
```
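4. Alternatively, start the app without Docker by running `streamlit run streamlit_app.py`. This assumes the Python dependencies are installed and Ollama is serving Gemma3:1b locally.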
5. Visit: [http://localhost:8501](http://localhost:8501)
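
Because the models take a while to load, the page may not respond immediately after the container starts. The following sketch polls the app until it answers; the function name `wait_for_app` and the retry settings are hypothetical choices for illustration.

```python
import time
import urllib.request


def wait_for_app(url: str = "http://localhost:8501",
                 attempts: int = 30,
                 delay: float = 2.0) -> bool:
    """Return True once the URL answers with HTTP 200, else False."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, timeouts, and refused connections are all
            # OSError subclasses; treat them as "not ready yet".
            pass
        time.sleep(delay)
    return False
```

Calling `wait_for_app()` after `docker run` returns `True` once the Streamlit page is reachable, or `False` after roughly one minute with the default settings.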
---
## Requirements
`requirements.txt` should include at least:
```
streamlit>=1.25.0
requests==2.31.0
pydub==0.25.1
torch==1.11.0
torchaudio==0.11.0
speechbrain==0.5.12
transformers==4.29.2
asyncio==3.4.3
ffmpeg-python==0.2.0
openai-whisper==20230314
numpy==1.22.4
langchain>=0.1.0
langchain-community>=0.0.30
torchvision==0.12.0
langgraph>=0.0.20
```
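
Since several of these packages are pinned to exact versions, it can help to check the active environment against the pins before running the app. The sketch below is one possible way to do that with the standard library; the helper names are hypothetical, and it only compares exact (`==`) pins, skipping `>=` constraints.

```python
import re
from importlib import metadata
from typing import List, Optional, Tuple

# Matches lines such as "torch==1.11.0" or "streamlit>=1.25.0".
PIN_PATTERN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(==|>=)\s*([^\s#]+)")


def parse_requirement(line: str) -> Optional[Tuple[str, str, str]]:
    """Split a requirement line into (name, operator, version)."""
    match = PIN_PATTERN.match(line)
    return match.groups() if match else None


def find_mismatches(lines: List[str]) -> List[str]:
    """Report exact pins that are missing or installed at another version."""
    problems = []
    for line in lines:
        parsed = parse_requirement(line)
        if parsed is None or parsed[1] != "==":
            continue  # only exact pins are compared in this sketch
        name, _, wanted = parsed
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (wanted {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}: installed {installed}, wanted {wanted}")
    return problems
```

For example, `find_mismatches(open("requirements.txt").read().splitlines())` returns an empty list when every exact pin matches the installed version.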
---
## Notes
- Gemma3:1b is accessed via **Ollama** inside Docker, so make sure the model is pulled when the image is built.
- `custome_interface.py` is required by the accent model. It is downloaded automatically by the Dockerfile.
- Video URLs must be **direct links** to `.mp4` files.
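
The direct-link requirement can be checked before any download is attempted. The helper below is a hypothetical sketch, not part of the app: it only inspects the URL itself, so it cannot confirm that the server actually returns a video.

```python
from urllib.parse import urlparse


def is_direct_mp4_url(url: str) -> bool:
    """Return True if the URL is http(s) and its path ends in .mp4."""
    parsed = urlparse(url.strip())
    return (
        parsed.scheme in ("http", "https")
        and bool(parsed.netloc)
        and parsed.path.lower().endswith(".mp4")
    )
```

Checking the parsed path rather than the raw string means a link with a query string, such as `clip.mp4?token=abc`, is still accepted, while a page URL such as a YouTube watch link is rejected.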
---
## Example Prompt
```
Analyze this video: https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4
```
Then follow up with:
```
Where is the speaker probably from?
What is the tone or emotion?
Summarize the video.
```
---
## Acknowledgments
This project uses the following models, frameworks, and tools:
- [OpenAI Whisper](https://github.com/openai/whisper): Automatic speech recognition model.
- [SpeechBrain](https://speechbrain.readthedocs.io/): Toolkit used for building and fine-tuning speech processing models.
- [Accent-ID CommonAccent](https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-en-english): Fine-tuned wav2vec2 model hosted on Hugging Face for English accent classification.
- [CustomEncoderWav2vec2Classifier](https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-en-english/blob/main/custom_interface.py): Custom interface used to load and run the accent model.
- [Gemma3:1b](https://ollama.com/library/gemma3:1b) via [Ollama](https://ollama.com): Large language model used for natural language follow-up based on transcripts.
- [Streamlit](https://streamlit.io): Python framework for building web applications.
- [Hugging Face Spaces](https://huggingface.co/spaces): Platform used for deploying this application on CPU infrastructure.
---
## Note
Because the hosted Space runs without a GPU, the app is extremely slow there. The output has been tested and verified on a local system.
---
## Author
- Developed by [Aswathi T S](https://github.com/ash-171)
---
## License
This project is licensed under the MIT License.