whisperbenchmarks/README.md
2023-12-02 23:34:06 +00:00

WhisperBenchmarks

This repository provides easy-to-use benchmarks built on audio and video content from the Internet Archive. The clips target challenging scenarios in audio recordings, such as poor mic placement, thick accents, and audio artifacts, alongside an ideal-audio baseline.

Based on https://gitlab.com/aadnk/whisper-webui

Models

| Model | Command |
| --- | --- |
| faster-large-v3 | `--whisper_implementation faster-whisper --model large-v3` |
| faster-medium | `--whisper_implementation faster-whisper --model medium` |
| faster-small | `--whisper_implementation faster-whisper --model small` |
| faster-tiny | `--whisper_implementation faster-whisper --model tiny` |
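
As a rough sketch of how the flags in the Models table could be turned into a runnable command, the snippet below maps each model name to its flags and appends a media file. The `python cli.py` entry point and the `build_command` helper are assumptions made for illustration; adjust them to however whisper-webui is invoked in your setup.

```python
import shlex

# Flag sets copied from the Models table.
MODEL_FLAGS = {
    "faster-large-v3": "--whisper_implementation faster-whisper --model large-v3",
    "faster-medium": "--whisper_implementation faster-whisper --model medium",
    "faster-small": "--whisper_implementation faster-whisper --model small",
    "faster-tiny": "--whisper_implementation faster-whisper --model tiny",
}


def build_command(model_name, media_path, entry_point=("python", "cli.py")):
    """Return the argument list for one benchmark run.

    entry_point is an assumption, not something this README specifies.
    """
    flags = shlex.split(MODEL_FLAGS[model_name])
    return [*entry_point, *flags, media_path]


if __name__ == "__main__":
    # "clip.mp4" is a placeholder file name.
    print(build_command("faster-medium", "clip.mp4"))
```

Returning a list rather than a single string lets the command be passed directly to `subprocess.run` without shell quoting issues.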

Videos

Videos are chosen for being short and for clearly matching their given category.

| Categories | Title | Links | Length | Type |
| --- | --- | --- | --- | --- |
| Poor mic placement | Body camera footage from July 10 traffic stop | Internet Archive | 2:22 | MP4 |
| Thick accents | Moonshine for Medicine Popcorn Sutton | Internet Archive | 1:35 | MP4 |
| Artifacts in audio | 2002 007 Movie Trailer Commercial Bad Video | Internet Archive | 0:14 | MP4 |
| Ideal audio (one speaker) | 8 Bit Bookclub | Internet Archive | 1:44 | MP3 |

How to Run Whisper Benchmarks

-- TODO --

Results

Results are for the complete run, which includes loading the model, running VAD, and running the transcription. Links are embedded in the results for each category.
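
One way to capture a complete-run time is to measure the wall-clock duration of the whole process, so that model loading, VAD, and transcription are all included. The sketch below is a minimal illustration, not the method used to produce the numbers in this README. The helper names are hypothetical, and the output format assumes that entries such as `00:02.14` mean minutes, seconds, and hundredths of a second.

```python
import subprocess
import sys
import time


def format_elapsed(seconds):
    """Format a duration as mm:ss.cc (assumed reading of the results tables)."""
    total_cs = round(seconds * 100)  # round once, in hundredths of a second
    minutes, rem = divmod(total_cs, 6000)
    secs, cs = divmod(rem, 100)
    return f"{minutes:02d}:{secs:02d}.{cs:02d}"


def time_run(command):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command; substitute the actual benchmark invocation.
    elapsed = time_run([sys.executable, "-c", "pass"])
    print(format_elapsed(elapsed))
```

Timing the process from the outside keeps the measurement independent of the transcription tool's own logging.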

CPU Benchmarks

| CPU Model | Poor mic placement (m:s:ms) | Thick accents (m:s:ms) | Artifacts in audio (m:s:ms) | Ideal audio (m:s:ms) | (Docker/Native) | Model |
| --- | --- | --- | --- | --- | --- | --- |

GPU Benchmarks

| GPU Model | Poor mic placement (m:s:ms) | Thick accents (m:s:ms) | Artifacts in audio (m:s:ms) | Ideal audio (m:s:ms) | (Docker/Native) | Model |
| --- | --- | --- | --- | --- | --- | --- |
| RTX 2060S | 00:02.14 | 00:09.99 | 00:05.07 | 00:11.02 | Native | Faster-Medium |

Todo:

  • Write simple bash scripts that run a set of benchmarks and clean up afterwards
  • Finalize a standard format for exporting the data into a spreadsheet
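
Until an export format is finalized, one possible shape is a CSV file whose columns mirror the GPU results table, since any spreadsheet application can open it. The sketch below is only a suggestion: the column names and the `results_to_csv` helper are assumptions, and the example row reuses the RTX 2060S entry from the GPU Benchmarks section.

```python
import csv
import io

# Column names mirror the GPU results table; this layout is an assumption.
COLUMNS = [
    "GPU Model",
    "Poor mic placement",
    "Thick accents",
    "Artifacts in audio",
    "Ideal audio",
    "Docker/Native",
    "Model",
]


def results_to_csv(rows):
    """Serialize a list of result dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# The single GPU result currently listed in this README.
example_row = {
    "GPU Model": "RTX 2060S",
    "Poor mic placement": "00:02.14",
    "Thick accents": "00:09.99",
    "Artifacts in audio": "00:05.07",
    "Ideal audio": "00:11.02",
    "Docker/Native": "Native",
    "Model": "Faster-Medium",
}

if __name__ == "__main__":
    print(results_to_csv([example_row]), end="")
```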