brooke/whisperbenchmarks

Easy-to-use benchmarks using audio and video content from the Internet Archive, specifically targeting various challenging scenarios in audio recordings.

brooke 22c7aec8db Update README.md Add a how-to section and remove example data		2023-12-02 04:06:55 +00:00
LICENSE	Initial commit	2023-12-02 03:16:17 +00:00
README.md	Update README.md	2023-12-02 04:06:55 +00:00

README.md

WhisperBenchmarks

This repository provides easy-to-use benchmarks using audio and video content from the Internet Archive, specifically targeting various challenging scenarios in audio recordings.

Links

Categories	Title	Links
Poor mic placement	Body camera footage from July 10 traffic stop	Internet Archive
Thick accents	Moonshine for Medicine Popcorn Sutton	Internet Archive
Low-quality audio	1994 90210 Melrose Place Promos Commercial	Internet Archive
Artifacts in audio	2002 007 Movie Trailer Commercial Bad Video	Internet Archive
Ideal audio (one speaker)	8 Bit Bookclub	Internet Archive

How to Run Whisper Benchmarks

1. Download Video from the Internet Archive

Open each link in the table above and download the file listed under MPEG or H264 in the item's download options. The MP3 version will also work.
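For scripted downloads, here is a minimal Python sketch. It assumes the Internet Archive's direct-download URL pattern `https://archive.org/download/<identifier>/<filename>`. The helper names `archive_download_url` and `download_item` are hypothetical, and the identifier and file name in the usage comment are placeholders, not the actual items from the table.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import quote


def archive_download_url(identifier: str, filename: str) -> str:
    """Build a direct-download URL for one file of an Internet Archive item.

    Assumes the https://archive.org/download/<identifier>/<filename> pattern.
    """
    return f"https://archive.org/download/{quote(identifier)}/{quote(filename)}"


def download_item(identifier: str, filename: str, dest_dir: str = "samples") -> Path:
    """Download one file into dest_dir and return the local path."""
    target = Path(dest_dir) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    url = archive_download_url(identifier, filename)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        # Stream to disk so large video files are not held in memory.
        shutil.copyfileobj(response, out)
    return target


# Example (placeholder names; copy the real ones from the item's page):
# download_item("EXAMPLE_IDENTIFIER", "example_file.mp4")
```

Keeping the URL construction separate from the download makes it possible to check the generated links before fetching anything.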

2. Visit Whisper WebUI Repository

https://gitlab.com/aadnk/whisper-webui

Run Whisper in Docker or natively by following the instructions in that repository. Make sure you set the appropriate options for the test you want to run (e.g., omit "--gpus=all" if you would like to run a CPU benchmark).

Note: Occasionally the provided Docker image does not work. If you run into any errors, try the image from the GitLab registry instead. The Docker method is generally recommended, as it is much easier to get up and running.
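To make CPU and GPU runs comparable, the command can be assembled and timed from a small script. The sketch below is an assumption-laden illustration, not the Whisper WebUI project's own tooling: `docker_run_command` and `time_command` are hypothetical helpers, the image name is left as a parameter, and the arguments that make the container transcribe a file and exit must come from the Whisper WebUI instructions. Only the `docker run`, `--rm`, and `--gpus=all` parts are standard Docker CLI usage.

```python
import subprocess
import sys
import time
from typing import List, Sequence


def docker_run_command(image: str, use_gpu: bool, extra_args: Sequence[str] = ()) -> List[str]:
    """Assemble a `docker run` command, adding --gpus=all only for GPU runs."""
    command = ["docker", "run", "--rm"]
    if use_gpu:
        command.append("--gpus=all")
    command.append(image)
    # extra_args: whatever the chosen image needs to transcribe one file and exit.
    command.extend(extra_args)
    return command


def time_command(command: Sequence[str]) -> float:
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(list(command), check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Demo with a harmless command; substitute the real transcription command.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"{elapsed:.2f} s")
```

Wall-clock time measured this way includes container startup and model loading, so Docker and native runs should be compared with that in mind.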

Results

Links are embedded for each category.

CPU Benchmarks

CPU Model	Poor mic placement (s)	Thick accents (s)	Low-quality audio (s)	Artifacts in audio (s)	Ideal audio (one speaker) (s)	Environment (Docker/Native)

GPU Benchmarks

GPU Model	Poor mic placement (s)	Thick accents (s)	Low-quality audio (s)	Artifacts in audio (s)	Ideal audio (one speaker) (s)	Environment (Docker/Native)
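Until a standard export format is settled, rows for the tables above could be collected as CSV. The following Python sketch shows one possible layout, not a finalized format: `results_to_csv` and the row dictionary keys (`model`, `environment`, `seconds`) are hypothetical, and the `Environment` column holds the Docker/Native value. Unmeasured categories are left as empty cells.

```python
import csv
import io
from typing import Dict, List

CATEGORIES = [
    "Poor mic placement",
    "Thick accents",
    "Low-quality audio",
    "Artifacts in audio",
    "Ideal audio (one speaker)",
]


def results_to_csv(rows: List[Dict]) -> str:
    """Serialize benchmark rows to CSV text with one timing column per category."""
    fieldnames = ["Model"] + [f"{c} (s)" for c in CATEGORIES] + ["Environment"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        record = {"Model": row["model"], "Environment": row["environment"]}
        for category in CATEGORIES:
            # Leave the cell empty when a category has not been measured yet.
            record[f"{category} (s)"] = row["seconds"].get(category, "")
        writer.writerow(record)
    return buffer.getvalue()


# Example row shape (placeholder names, not real measurements):
# results_to_csv([{"model": "EXAMPLE_CPU", "environment": "Docker", "seconds": {}}])
```

The resulting text can be written to a `.csv` file and opened directly in a spreadsheet application.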

Todo:

Write simple bash scripts that run a set of benchmarks and clean up afterward
Finalize a standard format for exporting the data into a spreadsheet