# WhisperBenchmarks

This repository provides easy-to-use Whisper transcription benchmarks built on audio and video from the Internet Archive. Each clip targets a specific challenge in recorded audio, such as poor mic placement, thick accents, or audio artifacts.

Based on https://gitlab.com/aadnk/whisper-webui

## Models

| Model | Command |
|-|-|
|faster-large-v3| --whisper_implementation faster-whisper --model large-v3 |
|faster-medium| --whisper_implementation faster-whisper --model medium |
|faster-small| --whisper_implementation faster-whisper --model small |
|faster-tiny| --whisper_implementation faster-whisper --model tiny |
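
The flags in the Command column are meant to be appended to the whisper-webui command line. As a minimal sketch, the table can be kept as a dictionary and expanded into an argument list. The `cli.py` entry point, the `build_command` helper, and the `sample.mp4` path are assumptions for illustration and are not part of this repository; adjust them to match how whisper-webui is launched on your system.

```python
import shlex

# Flags copied verbatim from the Models table.
MODEL_FLAGS = {
    "faster-large-v3": "--whisper_implementation faster-whisper --model large-v3",
    "faster-medium": "--whisper_implementation faster-whisper --model medium",
    "faster-small": "--whisper_implementation faster-whisper --model small",
    "faster-tiny": "--whisper_implementation faster-whisper --model tiny",
}


def build_command(model_name, media_path, entry_point="cli.py"):
    """Return an argv list for one benchmark run.

    entry_point is an assumption (hypothetical default), not something
    this repository defines.
    """
    flags = shlex.split(MODEL_FLAGS[model_name])
    return ["python", entry_point, *flags, media_path]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_command("faster-medium", "sample.mp4")))
```

Returning a list rather than a single string keeps file names with spaces (common in the Internet Archive downloads) intact when the command is later passed to `subprocess.run`.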

## Videos

Videos are chosen for being short and for matching their category. The long-form entry is the one exception on length.

| Categories | Title | Link | Length | Instant Download |
|-|-|-|-|-|
| Poor mic placement | Body camera footage from July 10 traffic stop | [Internet Archive](https://archive.org/details/cobmn-Body_camera_footage_from_July_10_traffic_stop) | 2:22 | [MP4](https://archive.org/download/cobmn-Body_camera_footage_from_July_10_traffic_stop/Body_camera_footage_from_July_10_traffic_stop.mp4) |
| Thick accents | Moonshine for Medicine Popcorn Sutton | [Internet Archive](https://archive.org/details/this-is-the-last-dam-run-of-likker-ill-ever-make-full-movie/+Moonshine+for+Medicine++++Popcorn+Sutton.mp4) | 1:35 | [MP4](https://archive.org/download/this-is-the-last-dam-run-of-likker-ill-ever-make-full-movie/%20Moonshine%20for%20Medicine%20%20%20%20Popcorn%20Sutton.mp4) |
| Artifacts in audio | 2002 007 Movie Trailer Commercial Bad Video | [Internet Archive](https://archive.org/details/2002variouscommercials/2002+007+Movie+Trailer+Commercial+Bad+Video.mp4) | 0:14 | [MP4](https://archive.org/download/2002variouscommercials/2002%20007%20Movie%20Trailer%20Commercial%20Bad%20Video.mp4) |
| Ideal audio (one speaker) | 8 Bit Bookclub | [Internet Archive](https://archive.org/details/8-bit-bookclub/36+-+ANNOUNCEMENT++SUMMER+HIATUS.mp3) | 1:44 | [MP3](https://archive.org/download/8-bit-bookclub/36%20-%20ANNOUNCEMENT%20%20SUMMER%20HIATUS.mp3) |
| Long form (many speakers) | Bionic Woman "Black Magic" (1976) | [Internet Archive](https://archive.org/details/bionic-woman-black-magic) | 43:53 | [MP4](https://archive.org/download/bionic-woman-black-magic/Black%20Magic-NA_x264.mp4) |

## How to Run Whisper Benchmarks

-- TODO --

## Results

Results cover the complete run: loading the model, running voice activity detection (VAD), and running the transcription. Each result links to the subtitle file produced for that category.
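
Because the reported time spans the whole run, one way to measure it is to time the entire process from the outside instead of instrumenting individual stages. The sketch below is an assumption about how such a measurement could be taken, not the method used for the numbers in this README. `time_command` and `format_elapsed` are hypothetical helpers; the formatter produces values shaped like those in the results tables (for example `00:02.14`).

```python
import subprocess
import time


def format_elapsed(seconds):
    """Format a duration as minutes:seconds with two decimals."""
    # Round to centiseconds first so 59.999 s becomes 01:00.00, not 00:60.00.
    total_cs = round(seconds * 100)
    minutes, cs = divmod(total_cs, 6000)
    return f"{minutes:02d}:{cs / 100:05.2f}"


def time_command(argv):
    """Run argv to completion and return the wall-clock duration in seconds.

    The measurement includes process start-up, model loading, VAD, and
    transcription, because everything happens inside the child process.
    """
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start
```

`time.perf_counter` is used because it is monotonic and unaffected by system clock adjustments during a long transcription.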

### CPU Benchmarks

| CPU Model | Poor mic placement (mm:ss.ss) | Thick accents (mm:ss.ss) | Artifacts in audio (mm:ss.ss) | Ideal audio (mm:ss.ss) | Long form (mm:ss.ss) | Environment (Docker/Native) | Model |
|-|-|-|-|-|-|-|-|

### GPU Benchmarks

| GPU Model | Poor mic placement (mm:ss.ss) | Thick accents (mm:ss.ss) | Artifacts in audio (mm:ss.ss) | Ideal audio (mm:ss.ss) | Long form (mm:ss.ss) | Environment (Docker/Native) | Model |
|-|-|-|-|-|-|-|-|
| RTX 2060S | [00:02.14](https://git.myco.systems/brooke/whisperbenchmarks/src/branch/main/benchmark-outputs/Body_camera_footage_from_July_10_traffic_stop.mp4-subs.srt) | [00:09.99](https://git.myco.systems/brooke/whisperbenchmarks/src/branch/main/benchmark-outputs/Moonshine%20for%20Medicine%20Popcorn%20Sutton.mp4-subs.srt) | [00:05.07](https://git.myco.systems/brooke/whisperbenchmarks/src/branch/main/benchmark-outputs/2002%20007%20Movie%20Trailer%20Commercial%20Bad%20Video.mp4-subs.srt) | [00:11.02](https://git.myco.systems/brooke/whisperbenchmarks/src/branch/main/benchmark-outputs/36%20-%20ANNOUNCEMENT%20SUMMER%20HIATUS.mp3-subs.srt) | | Native | faster-medium |

## Todo:
- [ ] Write easy bash scripts for running a set of benchmarks with an easy cleanup
- [ ] Finalize a standard format for exporting the data into a spreadsheet