WhisperBenchmarks
This repository provides easy-to-use Whisper transcription benchmarks built on audio and video clips from the Internet Archive. The clips cover several challenging recording scenarios, alongside an ideal-audio case.
Based on https://gitlab.com/aadnk/whisper-webui
Models
Model | Command |
---|---|
faster-large-v3 | --whisper_implementation faster-whisper --model large-v3 |
faster-medium | --whisper_implementation faster-whisper --model medium |
faster-small | --whisper_implementation faster-whisper --model small |
faster-tiny | --whisper_implementation faster-whisper --model tiny |
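The table above can be expressed as a small lookup that turns a model label into a full command line. This is a minimal sketch: the flags are copied from the table, but the `python cli.py` entry point and the media path as a trailing positional argument are assumptions about how the upstream whisper-webui checkout is invoked, and `build_command` is a hypothetical helper name.

```python
import shlex

# Flags copied verbatim from the Models table.
MODEL_FLAGS = {
    "faster-large-v3": ["--whisper_implementation", "faster-whisper", "--model", "large-v3"],
    "faster-medium": ["--whisper_implementation", "faster-whisper", "--model", "medium"],
    "faster-small": ["--whisper_implementation", "faster-whisper", "--model", "small"],
    "faster-tiny": ["--whisper_implementation", "faster-whisper", "--model", "tiny"],
}


def build_command(model_label, media_path, entry_point=("python", "cli.py")):
    """Return the argv list for one benchmark run.

    entry_point is an assumption (hypothetical default); adjust it to
    however whisper-webui is launched in your setup.
    """
    if model_label not in MODEL_FLAGS:
        raise KeyError(f"unknown model label: {model_label}")
    return [*entry_point, *MODEL_FLAGS[model_label], media_path]


def command_line(model_label, media_path):
    """Return the same command as a single shell-quoted string, for logging."""
    return shlex.join(build_command(model_label, media_path))
```

For example, `command_line("faster-tiny", "clip.mp4")` yields `python cli.py --whisper_implementation faster-whisper --model tiny clip.mp4`.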
Links
Clips are chosen for being short and for matching their assigned category.
Categories | Title | Links | Length | Type |
---|---|---|---|---|
Poor mic placement | Body camera footage from July 10 traffic stop | Internet Archive | 2:22 | MP4 |
Thick accents | Moonshine for Medicine Popcorn Sutton | Internet Archive | 1:35 | MP4 |
Artifacts in audio | 2002 007 Movie Trailer Commercial Bad Video | Internet Archive | 0:14 | MP4 |
Ideal audio (one speaker) | 8 Bit Bookclub | Internet Archive | 1:44 | MP3 |
How to Run Whisper Benchmarks
-- TODO --
Results
Results cover the complete run, which includes loading the model, running VAD (voice activity detection), and running the transcription. Links are embedded for each category.
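One way to capture a "complete run" figure is to time the whole process from launch to exit, so that model loading, VAD, and transcription are all included. The sketch below is an assumption about how such a measurement could be taken, not a description of how the published numbers were produced. `time_command` and `format_elapsed` are hypothetical helper names, and the output follows the `00:02.14` style (minutes, seconds, hundredths) seen in the results rows.

```python
import subprocess
import time


def format_elapsed(seconds):
    """Format a duration in seconds as MM:SS.cc (minutes, seconds, hundredths)."""
    total_cs = round(seconds * 100)           # whole hundredths of a second
    minutes, rem_cs = divmod(total_cs, 6000)  # 6000 hundredths per minute
    secs, cs = divmod(rem_cs, 100)
    return f"{minutes:02d}:{secs:02d}.{cs:02d}"


def time_command(argv):
    """Run argv to completion and return wall-clock seconds for the whole process."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # raises CalledProcessError on a non-zero exit
    return time.perf_counter() - start
```

Because the timer wraps the entire process, interpreter start-up and any first-time model download are counted as well, so a warm-up run before recording results may be worth considering.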
CPU Benchmarks
CPU Model | Poor mic placement (mm:ss.ss) | Thick accents (mm:ss.ss) | Artifacts in audio (mm:ss.ss) | Ideal audio (mm:ss.ss) | Environment (Docker/Native) | Whisper Model |
---|---|---|---|---|---|---|
GPU Benchmarks
GPU Model | Poor mic placement (mm:ss.ss) | Thick accents (mm:ss.ss) | Artifacts in audio (mm:ss.ss) | Ideal audio (mm:ss.ss) | Environment (Docker/Native) | Whisper Model |
---|---|---|---|---|---|---|
RTX 2060S | 00:02.14 | 00:09.99 | 00:05.07 | 00:11.02 | Native | faster-medium |
Todo:
- Write simple bash scripts that run a set of benchmarks and clean up afterwards
- Finalize a standard format for exporting the data into a spreadsheet
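As a starting point for the export item, results could be written as CSV with one row per measurement, which most spreadsheet tools import directly. This is only one possible shape, not a finalized format: the column names in `FIELDS` are assumptions that mirror the results tables, and the sample row reuses the RTX 2060S "Poor mic placement" figure from the GPU table.

```python
import csv
import io

# Hypothetical column names mirroring the results tables.
FIELDS = ["hardware", "device_type", "category", "elapsed", "environment", "model"]

# One measurement taken from the GPU Benchmarks table; the model label
# follows the spelling used in the Models table.
SAMPLE_ROW = {
    "hardware": "RTX 2060S",
    "device_type": "GPU",
    "category": "Poor mic placement",
    "elapsed": "00:02.14",
    "environment": "Native",
    "model": "faster-medium",
}


def rows_to_csv(rows):
    """Serialize a list of result dicts to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping one measurement per row (a "long" layout) means new categories or models can be added without changing the columns; a pivot in the spreadsheet can rebuild the wide table shown under Results.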