---
title: Speaker Diarization
emoji: 🔥
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
license: mit
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Real-Time Speaker Diarization
This project implements real-time speaker diarization using WebRTC, FastAPI, and Gradio. It transcribes speech and identifies the different speakers as they talk.
## Architecture
The system is split into two components:
- Model Server (Hugging Face Space): runs the speech recognition and speaker diarization models (a sketch of its inference endpoint is shown below)
- Signaling Server (Render): handles WebRTC signaling for direct audio streaming from the browser
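The two services talk to each other over a WebSocket. As a rough illustration, here is a minimal sketch of what the model server's inference endpoint could look like; the `/ws_inference` path matches the one referenced in the configuration steps below, but `run_diarization` is a hypothetical stand-in for the real ASR + speaker-embedding pipeline.

```python
# Minimal sketch of the model server's inference endpoint (not the
# actual implementation). Assumes raw audio bytes arrive over the
# WebSocket and JSON results are sent back.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

def run_diarization(chunk: bytes) -> dict:
    """Hypothetical stand-in for the real ASR + diarization pipeline."""
    return {"speaker": 0, "text": "..."}

@app.websocket("/ws_inference")
async def ws_inference(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            audio_chunk = await websocket.receive_bytes()
            result = run_diarization(audio_chunk)
            await websocket.send_json(result)
    except WebSocketDisconnect:
        pass  # client hung up
```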
## Deployment Instructions

### Deploy the Model Server on Hugging Face Spaces
- Create a new Space on Hugging Face (Docker SDK)
- Upload all files from the `Speaker-Diarization` directory
- In the Space settings:
  - Set Hardware to CPU (or GPU if available)
  - Set visibility to Public
  - Make sure the Docker SDK is selected
### Deploy the Signaling Server on Render
- Create a new Render Web Service
- Connect it to your GitHub repo containing the `render-signal` directory
- Configure the Render service:
  - Build Command: `cd render-signal && pip install -r requirements.txt`
  - Start Command: `cd render-signal && python backend.py`
  - Environment: Python 3
  - Environment Variables:
    - `HF_SPACE_URL`: your Hugging Face Space URL (e.g., `your-username-speaker-diarization.hf.space`), read at startup as shown below
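With that variable set, `backend.py` can derive the model server's WebSocket URL at startup. The variable names `HF_SPACE_URL` and `API_WS` come from this project; the derivation itself is a sketch of one reasonable approach:

```python
import os

# Hostname of the Hugging Face Space, injected via Render's
# environment-variable settings.
HF_SPACE_URL = os.environ["HF_SPACE_URL"]  # e.g. your-username-speaker-diarization.hf.space

# WebSocket endpoint exposed by the model server
# (see "Update Configuration" below).
API_WS = f"wss://{HF_SPACE_URL}/ws_inference"
```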
### Update Configuration
After both services are deployed:
- Update `ui.py` on your Hugging Face Space:
  - Change `RENDER_SIGNALING_URL` to your Render app URL (`wss://your-app.onrender.com/stream`)
  - Make sure `HF_SPACE_URL` matches your actual Hugging Face Space URL
- Update `backend.py` on your Render service:
  - Set `API_WS` to your Hugging Face Space WebSocket URL (`wss://your-username-speaker-diarization.hf.space/ws_inference`)
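For reference, the updated values might look like this. The variable names are the project's own; the URLs are placeholders to replace with your actual deployments:

```python
# ui.py (Hugging Face Space)
RENDER_SIGNALING_URL = "wss://your-app.onrender.com/stream"  # your Render app
HF_SPACE_URL = "your-username-speaker-diarization.hf.space"  # your Space

# backend.py (Render)
API_WS = "wss://your-username-speaker-diarization.hf.space/ws_inference"
```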
## Usage
- Open your Hugging Face Space URL in a web browser
- Click "Start Listening" to begin
- Speak into your microphone
- The system will transcribe your speech and identify different speakers in real-time
## Technology Stack
- Frontend: Gradio UI with WebRTC for audio streaming
- Signaling: FastRTC on Render for WebRTC signaling
- Backend: FastAPI + WebSockets
- Models:
  - SpeechBrain ECAPA-TDNN for speaker embeddings (see the sketch below)
  - Automatic speech recognition (ASR) for transcription
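As an illustration of the embedding step, here is a minimal sketch of extracting ECAPA-TDNN speaker embeddings with SpeechBrain. The `speechbrain/spkrec-ecapa-voxceleb` checkpoint is SpeechBrain's public pretrained model; the comparison logic is illustrative, not this project's exact code.

```python
# Sketch: extract speaker embeddings from audio chunks with
# SpeechBrain's pretrained ECAPA-TDNN model (illustrative only).
import torch
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

# One second of 16 kHz audio each; replace with real waveform tensors.
wav_a = torch.randn(1, 16000)
wav_b = torch.randn(1, 16000)

emb_a = classifier.encode_batch(wav_a).squeeze()  # shape: (192,)
emb_b = classifier.encode_batch(wav_b).squeeze()

# High cosine similarity suggests the two chunks come from the same
# speaker; clustering such similarities yields the speaker labels.
similarity = torch.nn.functional.cosine_similarity(emb_a, emb_b, dim=0)
print(float(similarity))
```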
## License
MIT