Speed benchmark

#3 opened by pritam

I would like to know whether this model is actually faster than the original. Could you add some relevant benchmarks to the README?

Large-v3 models on GPU

| Implementation | Precision | Beam size | Time | Max. GPU memory | Max. CPU memory | WER % |
| --- | --- | --- | --- | --- | --- | --- |
| openai/whisper-large-v3 | fp16 | 5 | 2m23s | MB | MB | |
| openai/whisper-turbo | fp16 | 5 | 39s | MB | MB | |
| faster-whisper | fp16 | 5 | 52.023s | 4521 MB | 901 MB | 2.883 |
| faster-whisper | int8 | 5 | 52.639s | 2953 MB | 2261 MB | 4.594 |
| faster-distil-large-v3 | fp16 | 5 | 26.126s | 2409 MB | 900 MB | 2.392 |
| faster-distil-large-v3 | int8 | 5 | 22.537s | 1481 MB | 1468 MB | 2.392 |
| faster-large-v3-turbo | fp16 | 5 | 19.155s | 2537 MB | 899 MB | 1.919 |
| faster-large-v3-turbo | int8 | 5 | 19.591s | 1545 MB | 1526 MB | 1.919 |

WER measured on the LibriSpeech clean validation split.
GPU: GeForce RTX 2080 Ti (11 GB)
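
For anyone who wants to reproduce numbers like these, here is a minimal timing sketch using the faster-whisper API. The model ID and audio file are placeholders (the exact benchmark script isn't published in this thread), so adjust them to your setup:

```python
import time

from faster_whisper import WhisperModel

# Placeholder model repo and audio file; swap in your own.
model = WhisperModel("deepdml/faster-whisper-large-v3-turbo-ct2",
                     device="cuda", compute_type="float16")  # or compute_type="int8"

start = time.perf_counter()
segments, info = model.transcribe("sample.wav", beam_size=5)
# segments is a lazy generator, so decoding actually runs inside this join.
text = " ".join(segment.text for segment in segments)
elapsed = time.perf_counter() - start

print(f"Transcribed {info.duration:.1f}s of audio in {elapsed:.2f}s")
print(text)
```

Peak GPU/CPU memory like in the table would need to be sampled externally (e.g. by watching nvidia-smi while this runs), and the WER column comes from scoring the output text against reference transcripts.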


Thanks for converting @deepdml ! I just added it to my project :)
The metrics are impressive!

I’m glad it worked well in your project. What kind of tasks are you testing the model on? I’m here if you need anything else or want to discuss the results.

Just general ASR for voice input, to avoid typing ;) It's a simple utility with a push-to-talk approach, trying to mimic https://superwhisper.com/ for Linux users.
It's here: https://github.com/blakkd/faster-whisper-hotkey. Who knows, maybe it can be useful for you!
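
For anyone curious, the core of a push-to-talk flow can be quite small. This is a rough sketch, not the actual faster-whisper-hotkey code: the `sounddevice` capture, the fixed recording window, and the model ID are all assumptions (a real hotkey tool would start/stop recording on key events instead):

```python
import sounddevice as sd
from faster_whisper import WhisperModel

SAMPLE_RATE = 16000  # Whisper models expect 16 kHz mono audio

# Placeholder model repo; any faster-whisper-compatible model works.
model = WhisperModel("deepdml/faster-whisper-large-v3-turbo-ct2",
                     device="cuda", compute_type="float16")

def transcribe_push_to_talk(seconds: float = 5.0) -> str:
    # Record a fixed window from the default microphone.
    audio = sd.rec(int(seconds * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()  # block until recording finishes
    # faster-whisper accepts a float32 NumPy array directly.
    segments, _ = model.transcribe(audio.flatten(), beam_size=5)
    return " ".join(segment.text for segment in segments).strip()

print(transcribe_push_to_talk())
```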

I'm experimenting and considering to add https://huggingface.co/nvidia/canary-1b-flash or https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/ as the WER and speed are amazing! And for this specific use case, latency is an important factor.
The only thing is that parakeet is EN only, and canary only have 4-5 supported languages. So for now, this new turbo model was more than welcome, especially your faster-whisper version!
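
In case it helps anyone comparing: those NVIDIA models load through the NeMo toolkit rather than faster-whisper. A minimal sketch, assuming `nemo_toolkit[asr]` is installed and a 16 kHz mono `sample.wav` exists (return types can differ across NeMo versions):

```python
import nemo.collections.asr as nemo_asr

# Downloads the checkpoint from Hugging Face on first use.
model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v2")

# transcribe() takes a list of audio file paths.
results = model.transcribe(["sample.wav"])
print(results[0].text)
```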
