---
title: Transcript Generator
author: Enrique Cardoza
emoji: 💻
sdk: gradio
sdk_version: 5.33.0
app_file: app.py
pinned: false
tags:
  - mcp-server-track
  - speech-to-text
  - whisper
  - groq
  - transcript
  - api
  - stt
demo_video: https://youtu.be/0wBCbXzK8TE
---

# Transcript Generator

An MCP (Model Context Protocol) server that transcribes audio and video files into text using Groq's Whisper model. It lets AI assistants process audio content, making multimedia data accessible for analysis and understanding.

## 📹 Demo Video

[![Transcript Generator Demo](https://img.youtube.com/vi/0wBCbXzK8TE/0.jpg)](https://youtu.be/0wBCbXzK8TE)

There are three ways to use this project:
1. **Directly on the Hugging Face space** - Upload your audio/video files and hit the transcript button
2. **Using your favorite client** like Cursor, Windsurf or any other IDE that supports MCP
3. **Using a custom agent** - Set up the MCP server with its available tools in your own application

## 🔍 Project Description

Transcript Generator is an AI-powered transcription service built for the Gradio Agents & MCP Hackathon 2025. It leverages Groq's implementation of the Whisper Large V3 Turbo model to accurately convert spoken content from audio and video files into written text.

The service supports:
- File uploads (up to 25MB)
- Direct URL transcription
- Various audio/video formats
- Integration with MCP clients

## 🛠️ Available MCP Tools

### 1. `transcript_generator_transcribe_audio`

Transcribes an audio or video file uploaded directly to the service (runs locally).

**Parameters:**
- `audio_file` (string): Path to an audio or video file to transcribe
- `api_key` (string): Your Groq API key, required for authentication

**Returns:** A text transcript of the spoken content in the audio file
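
For clients that talk to the server directly, a tool invocation travels as a JSON-RPC 2.0 `tools/call` request, the generic request shape defined by the MCP specification. The sketch below only builds that request body for the tool above. The helper name `build_tool_call`, the request id, and the sample file name are illustrative assumptions, and session handling over SSE is left out entirely.

```python
import json


def build_tool_call(audio_file: str, api_key: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for the upload tool.

    Hypothetical helper: only the tool name and its two parameters come
    from this README; the rest follows the generic MCP request shape.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "transcript_generator_transcribe_audio",
            "arguments": {"audio_file": audio_file, "api_key": api_key},
        },
    }


if __name__ == "__main__":
    # Placeholder values; never hard-code a real API key.
    request = build_tool_call("meeting.mp3", "YOUR_GROQ_API_KEY")
    print(json.dumps(request, indent=2))
```

Most MCP clients build this request for you, so the sketch is mainly useful for debugging or for writing a custom agent.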

### 2. `transcript_generator_transcribe_audio_from_url`

Transcribes audio/video files from a URL.

**Parameters:**
- `audio_url` (string): URL to an audio or video file to transcribe (http or https)
- `api_key` (string): Your Groq API key, required for authentication

**Returns:** A text transcript of the spoken content in the audio file
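
Because `audio_url` must be an http or https URL, a client can reject obviously unusable values before calling the tool. This is a minimal sketch using only the standard library. The function name `is_supported_audio_url` is an assumption, and the server may apply additional checks of its own.

```python
from urllib.parse import urlparse


def is_supported_audio_url(audio_url: str) -> bool:
    """Return True when the URL uses http or https and names a host."""
    parsed = urlparse(audio_url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    for candidate in ("https://example.com/talk.mp3", "ftp://example.com/talk.mp3"):
        print(candidate, is_supported_audio_url(candidate))
```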

## 📋 Supported File Formats

- **Audio formats:** MP3, MPGA, M4A, WAV, FLAC, OGG, AAC
- **Video formats:** MP4, MPEG, WebM
- **Maximum file size:** 25MB
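
The limits above can be checked on the client side before uploading. The sketch below is an assumption-laden pre-flight check, not the service's own validation: it judges the format by file extension only, reads "25MB" as 25 × 1024 × 1024 bytes, and the name `check_upload` is hypothetical.

```python
from pathlib import Path

# Extensions taken from the format list above.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".mpga", ".m4a", ".wav", ".flac", ".ogg", ".aac",  # audio
    ".mp4", ".mpeg", ".webm",  # video
}
# Assumption: "25MB" means 25 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_FILE_SIZE_BYTES}")
    return problems


if __name__ == "__main__":
    print(check_upload("interview.m4a", 3_000_000))
    print(check_upload("notes.txt", 1_000))
```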

## 🔌 MCP Integration

### SSE Configuration (Cursor, Windsurf, Cline)

To add this MCP server to clients that support SSE, add the following to your MCP config:

```json
{
  "mcpServers": {
    "gradio": {
      "url": "https://agents-mcp-hackathon-transcript-generator.hf.space/gradio_api/mcp/sse"
    }
  }
}
```
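
If the client already has an MCP config with other servers in it, the entry can be merged in rather than pasted over the file. The sketch below assumes the config is a JSON file with a top-level `mcpServers` object, as in the snippet above. The helper name `add_server` is hypothetical, and the config file's location depends on the client, so it is passed in as a parameter.

```python
import json
from pathlib import Path

SSE_URL = "https://agents-mcp-hackathon-transcript-generator.hf.space/gradio_api/mcp/sse"


def add_server(config_path: Path, name: str = "gradio", url: str = SSE_URL) -> dict:
    """Insert or update one entry under `mcpServers`, leaving other entries untouched."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config.setdefault("mcpServers", {})[name] = {"url": url}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Example path only; check your client's documentation for the real one.
    print(json.dumps(add_server(Path("mcp.json")), indent=2))
```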

### Stdio Configuration (Node.js required)

For clients that only support stdio:

```json
{
  "mcpServers": {
    "gradio": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://agents-mcp-hackathon-transcript-generator.hf.space/gradio_api/mcp/sse",
        "--transport",
        "sse-only"
      ]
    }
  }
}
```

### YAML Configuration (ContinueDev extension)

```yaml
name: Transcript MCP Server
description: A new MCP server for handling transcripts.
version: 0.0.1
schema: v1
mcpServers:
  - name: Transcript MCP server
    command: npx
    args:
      - mcp-remote
      - https://agents-mcp-hackathon-transcript-generator.hf.space/gradio_api/mcp/sse
      - --transport
      - sse-only
```

## 🔑 Authentication

You'll need a Groq API key to use this service. You can obtain one from the [Groq Console](https://console.groq.com/).

The API key can be provided in several ways:
1. As a parameter in the tool call
2. Set as an environment variable (`GROQ_API_KEY`)
3. In the request headers (for certain clients)
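
One way to combine these three sources is a simple fallback chain. The sketch below is not the server's actual logic. The precedence (tool parameter, then request header, then `GROQ_API_KEY`) is an assumption, and the header name `x-groq-api-key` is hypothetical because this README does not say which header is read.

```python
import os
from typing import Mapping, Optional

# Hypothetical header name; the README does not specify one.
API_KEY_HEADER = "x-groq-api-key"


def resolve_api_key(
    param_key: Optional[str] = None,
    headers: Optional[Mapping[str, str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the first non-empty key: parameter, request header, then GROQ_API_KEY."""
    env = os.environ if env is None else env
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in (headers or {}).items()}
    candidates = (param_key, lowered.get(API_KEY_HEADER), env.get("GROQ_API_KEY"))
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    raise ValueError("No Groq API key found in parameter, headers, or GROQ_API_KEY")
```

Raising an error when no source yields a key keeps a missing credential from surfacing later as an opaque authentication failure.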

## 💡 Usage Example

When using this server with an AI assistant that supports MCP, you can request transcriptions with prompts like:

> "Please generate the transcript for this audio file: https://huggingface.co/spaces/anewryzm/transcript-generator-client/resolve/main/test_files/this%20people%203.m4a"

The assistant will use the appropriate MCP tool to fetch and return the transcript.

## 🔗 Useful Links

- [Get your Groq API key](https://console.groq.com/)
- [Groq Documentation](https://console.groq.com/docs)
- [Supported audio formats](https://console.groq.com/docs/speech-to-text)
- [Hugging Face Spaces Configuration](https://huggingface.co/docs/hub/spaces-config-reference)