---
title: Speaker Diarization
emoji: 🔥
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
license: mit
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Real-Time Speaker Diarization

This project implements real-time speaker diarization using WebRTC, FastAPI, and Gradio. It transcribes speech automatically and identifies which speaker is talking as the audio streams in.

## Architecture

The system is split into two components:

1. **Model Server** (Hugging Face Space): runs the speech recognition and speaker diarization models
2. **Signaling Server** (Render): handles WebRTC signaling so the browser can stream audio directly
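
The split above can be summarized as a small configuration sketch. The URLs are the placeholder patterns used later in this README, not real deployments:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Endpoints:
    # Hugging Face Space: hosts the ASR + diarization models
    hf_space_url: str = "https://your-username-speaker-diarization.hf.space"
    # Render: WebRTC signaling, relays browser audio to the Space
    signaling_url: str = "wss://your-app.onrender.com/stream"
    # Inference WebSocket on the Space that the signaling server forwards to
    inference_ws: str = "wss://your-username-speaker-diarization.hf.space/ws_inference"
```

Audio flows browser → signaling server → model server, and transcripts with speaker labels flow back the same way.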

## Deployment Instructions

### Deploy the Model Server on a Hugging Face Space

1. Create a new Space on Hugging Face (Docker SDK)
2. Upload all files from the `Speaker-Diarization` directory
3. In the Space settings:
   - Set Hardware to CPU (or GPU if available)
   - Set visibility to Public
   - Confirm that the Docker SDK is selected
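
The Space's actual Dockerfile isn't reproduced in this README. As a rough sketch, a Docker-SDK Space for this stack might look like the following (the file names, entry point, and port are assumptions; Spaces route traffic to port 7860 by default):

```dockerfile
# Hypothetical Dockerfile for the model-server Space
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Hugging Face Spaces expect the app on port 7860 unless app_port is set
EXPOSE 7860
CMD ["python", "ui.py"]
```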

### Deploy the Signaling Server on Render

1. Create a new Render Web Service
2. Connect it to the GitHub repo containing the `render-signal` directory
3. Configure the service:
   - Build Command: `cd render-signal && pip install -r requirements.txt`
   - Start Command: `cd render-signal && python backend.py`
   - Environment: Python 3
   - Environment variables:
     - `HF_SPACE_URL`: your Hugging Face Space URL (e.g., `your-username-speaker-diarization.hf.space`)
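
If you prefer configuration as code, Render can read the same settings from a blueprint file. A hypothetical `render.yaml` mirroring the steps above (the service name is an assumption):

```yaml
services:
  - type: web
    name: speaker-diarization-signaling
    env: python
    buildCommand: cd render-signal && pip install -r requirements.txt
    startCommand: cd render-signal && python backend.py
    envVars:
      - key: HF_SPACE_URL
        value: your-username-speaker-diarization.hf.space
```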

## Update Configuration

After both services are deployed:

1. Update `ui.py` on your Hugging Face Space:
   - Change `RENDER_SIGNALING_URL` to your Render app URL (`wss://your-app.onrender.com/stream`)
   - Make sure `HF_SPACE_URL` matches your actual Hugging Face Space URL
2. Update `backend.py` on your Render service:
   - Set `API_WS` to your Hugging Face Space WebSocket URL (`wss://your-username-speaker-diarization.hf.space/ws_inference`)
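
Concretely, the edits amount to changing three module-level constants. The values below are the placeholder URLs from this README; substitute your own deployment URLs:

```python
# In ui.py (Hugging Face Space)
RENDER_SIGNALING_URL = "wss://your-app.onrender.com/stream"
HF_SPACE_URL = "your-username-speaker-diarization.hf.space"

# In backend.py (Render signaling server)
API_WS = "wss://your-username-speaker-diarization.hf.space/ws_inference"
```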

## Usage

1. Open your Hugging Face Space URL in a web browser
2. Click "Start Listening" to begin
3. Speak into your microphone
4. The system transcribes your speech and identifies the different speakers in real time

## Technology Stack

- **Frontend**: Gradio UI with WebRTC for audio streaming
- **Signaling**: FastRTC on Render for WebRTC signaling
- **Backend**: FastAPI + WebSockets
- **Models**:
  - SpeechBrain ECAPA-TDNN for speaker embeddings
  - An automatic speech recognition (ASR) model for transcription
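
This README doesn't include the diarization code itself, but embedding-based speaker identification typically works by comparing each new ECAPA-TDNN embedding against running speaker centroids with cosine similarity. A minimal, self-contained sketch of that idea (the `0.7` threshold and the new-speaker enrollment logic are illustrative assumptions, not this project's actual parameters):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def assign_speaker(embedding, centroids, threshold=0.7):
    """Return the index of the closest known speaker, or enroll the
    embedding as a new speaker when no centroid is similar enough."""
    best_idx, best_sim = -1, -1.0
    for i, centroid in enumerate(centroids):
        sim = cosine_similarity(embedding, centroid)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    if best_sim < threshold:
        centroids.append(list(embedding))  # enroll a new speaker
        return len(centroids) - 1
    return best_idx
```

In practice the embeddings come from the SpeechBrain model and are hundreds of dimensions; the two-dimensional vectors here are only for illustration.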

## License

MIT