Speed benchmark
I would like to know if this model is actually faster than the original model. Could you add some relevant benchmarks to the README?
Large-v3 models on GPU
| Implementation | Precision | Beam size | Time | Max. GPU memory | Max. CPU memory | WER % |
| --- | --- | --- | --- | --- | --- | --- |
| openai/whisper-large-v3 | fp16 | 5 | 2m23s | MB | MB | |
| openai/whisper-turbo | fp16 | 5 | 39s | MB | MB | |
| faster-whisper | fp16 | 5 | 52.023s | 4521MB | 901MB | 2.883 |
| faster-whisper | int8 | 5 | 52.639s | 2953MB | 2261MB | 4.594 |
| faster-distil-large-v3 | fp16 | 5 | 26.126s | 2409MB | 900MB | 2.392 |
| faster-distil-large-v3 | int8 | 5 | 22.537s | 1481MB | 1468MB | 2.392 |
| faster-large-v3-turbo | fp16 | 5 | 19.155s | 2537MB | 899MB | 1.919 |
| faster-large-v3-turbo | int8 | 5 | 19.591s | 1545MB | 1526MB | 1.919 |
WER measured on the LibriSpeech clean validation split.
GPU: GeForce RTX 2080 Ti (11 GB)
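For reference, here's roughly how timings like these could be reproduced with the faster-whisper API. This is a minimal sketch, not the exact benchmark script: the model identifiers, audio file, and reference transcript below are assumptions (recent faster-whisper versions accept `large-v3-turbo` as a model size; otherwise a converted CTranslate2 model path or HF repo id can be passed instead).

```python
import time

from faster_whisper import WhisperModel
from jiwer import wer  # pip install jiwer, for the WER column

AUDIO = "librispeech_clean_val.wav"  # hypothetical concatenated eval audio
REFERENCE = "..."                    # hypothetical ground-truth transcript

for model_name, compute_type in [
    ("distil-large-v3", "float16"),
    ("large-v3-turbo", "float16"),
    ("large-v3-turbo", "int8"),
]:
    model = WhisperModel(model_name, device="cuda", compute_type=compute_type)
    start = time.perf_counter()
    segments, info = model.transcribe(AUDIO, beam_size=5)
    # transcribe() returns a generator; decoding only runs while consuming it,
    # so the join below must happen inside the timed region
    text = " ".join(segment.text for segment in segments)
    elapsed = time.perf_counter() - start
    print(f"{model_name} ({compute_type}): {elapsed:.3f}s, WER={wer(REFERENCE, text):.3%}")
```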
Just general ASR for voice input, to avoid typing ;) It's a simple utility with a push-to-talk approach, trying to mimic https://superwhisper.com/ for Linux users.
It's here: https://github.com/blakkd/faster-whisper-hotkey. Who knows, maybe it can be useful for you!
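The push-to-talk flow boils down to something like the sketch below. This is not the actual faster-whisper-hotkey code, just a simplified illustration; the hotkey, sample rate, and model choice are assumptions, and the `keyboard` package needs root on Linux.

```python
import keyboard       # pip install keyboard (requires root on Linux)
import numpy as np
import sounddevice as sd
from faster_whisper import WhisperModel

SAMPLE_RATE = 16000   # Whisper models expect 16 kHz mono audio
HOTKEY = "f9"         # assumed push-to-talk key

model = WhisperModel("large-v3-turbo", device="cuda", compute_type="float16")

print(f"Hold {HOTKEY} to talk, release to transcribe (Ctrl+C to quit).")
while True:
    keyboard.wait(HOTKEY)  # block until the hotkey goes down
    chunks = []
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32") as stream:
        while keyboard.is_pressed(HOTKEY):
            block, _ = stream.read(1024)  # pull audio while the key is held
            chunks.append(block)
    if not chunks:  # key released before any audio was captured
        continue
    audio = np.concatenate(chunks).flatten()
    segments, _ = model.transcribe(audio, beam_size=5)
    print("".join(segment.text for segment in segments).strip())
```

Passing the raw numpy buffer straight to `transcribe()` keeps latency down, since nothing has to be written to disk between recording and decoding.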
I'm experimenting and considering adding https://huggingface.co/nvidia/canary-1b-flash or https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/ as well, since the WER and speed are amazing! And for this specific use case, latency is an important factor.
The only thing is that Parakeet is EN-only, and Canary only supports 4-5 languages. So for now, this new turbo model was more than welcome, especially your faster-whisper version!