diff --git a/README.md b/README.md
index 3fe88b3..33211b0 100644
--- a/README.md
+++ b/README.md
@@ -17,12 +17,13 @@ Based on https://gitlab.com/aadnk/whisper-webui
 
 Videos are chosen for being short and matching their given category
 
-| Categories | Title | Links | Length | Type |
+| Categories | Title | Link | Length | Instant Download |
 |-|-|-|-|-|
-| Poor mic placement | Body camera footage from July 10 traffic stop | [Internet Archive](https://archive.org/details/cobmn-Body_camera_footage_from_July_10_traffic_stop) | 2:22 | MP4 |
-| Thick accents | Moonshine for Medicine Popcorn Sutton | [Internet Archive](https://archive.org/details/this-is-the-last-dam-run-of-likker-ill-ever-make-full-movie/+Moonshine+for+Medicine++++Popcorn+Sutton.mp4) | 1:35 | MP4 |
-| Artifacts in audio | 2002 007 Movie Trailer Commercial Bad Video | [Internet Archive](https://archive.org/details/2002variouscommercials/2002+007+Movie+Trailer+Commercial+Bad+Video.mp4) | 0:14 | MP4 |
-| Ideal audio (one speaker) | 8 Bit Bookclub | [Internet Archive](https://archive.org/details/8-bit-bookclub/36+-+ANNOUNCEMENT++SUMMER+HIATUS.mp3) | 1:44 | MP3 |
+| Poor mic placement | Body camera footage from July 10 traffic stop | [Internet Archive](https://archive.org/details/cobmn-Body_camera_footage_from_July_10_traffic_stop) | 2:22 | [MP4](https://archive.org/download/cobmn-Body_camera_footage_from_July_10_traffic_stop/Body_camera_footage_from_July_10_traffic_stop.mp4) |
+| Thick accents | Moonshine for Medicine Popcorn Sutton | [Internet Archive](https://archive.org/details/this-is-the-last-dam-run-of-likker-ill-ever-make-full-movie/+Moonshine+for+Medicine++++Popcorn+Sutton.mp4) | 1:35 | [MP4](https://archive.org/download/this-is-the-last-dam-run-of-likker-ill-ever-make-full-movie/%20Moonshine%20for%20Medicine%20%20%20%20Popcorn%20Sutton.mp4) |
+| Artifacts in audio | 2002 007 Movie Trailer Commercial Bad Video | [Internet Archive](https://archive.org/details/2002variouscommercials/2002+007+Movie+Trailer+Commercial+Bad+Video.mp4) | 0:14 | [MP4](https://archive.org/download/2002variouscommercials/2002%20A%20Touch%20Of%20Class%20Limos%20Bridal%20Show%20Wilton%20Mall%20Saratoga%20Commercial.mp4) |
+| Ideal audio (one speaker) | 8 Bit Bookclub | [Internet Archive](https://archive.org/details/8-bit-bookclub/36+-+ANNOUNCEMENT++SUMMER+HIATUS.mp3) | 1:44 | [MP3](https://archive.org/download/8-bit-bookclub/36%20-%20ANNOUNCEMENT%20%20SUMMER%20HIATUS.mp3) |
+| Long form (many speakers) | Bionic Woman "Black Magic" (1976) | [Internet Archive](https://archive.org/details/bionic-woman-black-magic) | 43:53 | [MP4](https://archive.org/download/bionic-woman-black-magic/Black%20Magic-NA_x264.mp4) |
 
 ## How to Run Whisper Benchmarks
 
@@ -34,14 +35,14 @@ Results are for the complete run which includes loading the model, running VAD,
 
 ### CPU Benchmarks
 
-| CPU Model | Poor mic placement (m:s:ms) | Thick accents (m:s:ms) | Artifacts in audio (m:s:ms) | Ideal audio (m:s:ms) | (Docker/Native) | Model |
-|-|-|-|-|-|-|-|
+| CPU Model | Poor mic placement (m:s:ms) | Thick accents (m:s:ms) | Artifacts in audio (m:s:ms) | Ideal audio (m:s:ms) | Long form | (Docker/Native) | Model |
+|-|-|-|-|-|-|-|-|
 
 ### GPU Benchmarks
 
-| GPU Model | Poor mic placement (m:s:ms) | Thick accents (m:s:ms) | Artifacts in audio (m:s:ms) | Ideal audio (m:s:ms) | (Docker/Native) | Model |
-|-|-|-|-|-|-|-|
-| RTX 2060S | [00:02.14](https://git.myco.systems/brooke/whisperbenchmarks/src/branch/main/benchmark-outputs/Body_camera_footage_from_July_10_traffic_stop.mp4-subs.srt) | [00:09.99](https://git.myco.systems/brooke/whisperbenchmarks/src/branch/main/benchmark-outputs/Moonshine%20for%20Medicine%20Popcorn%20Sutton.mp4-subs.srt) | [00:05.07](https://git.myco.systems/brooke/whisperbenchmarks/src/branch/main/benchmark-outputs/2002%20007%20Movie%20Trailer%20Commercial%20Bad%20Video.mp4-subs.srt) | [00:11.02](https://git.myco.systems/brooke/whisperbenchmarks/src/branch/main/benchmark-outputs/36%20-%20ANNOUNCEMENT%20SUMMER%20HIATUS.mp3-subs.srt) | Native | Faster-Medium |
+| GPU Model | Poor mic placement (m:s:ms) | Thick accents (m:s:ms) | Artifacts in audio (m:s:ms) | Ideal audio (m:s:ms) | Long form (m:s:ms) | (Docker/Native) | Model |
+|-|-|-|-|-|-|-|-|
+| RTX 2060S | [00:02.14](https://git.myco.systems/brooke/whisperbenchmarks/src/branch/main/benchmark-outputs/Body_camera_footage_from_July_10_traffic_stop.mp4-subs.srt) | [00:09.99](https://git.myco.systems/brooke/whisperbenchmarks/src/branch/main/benchmark-outputs/Moonshine%20for%20Medicine%20Popcorn%20Sutton.mp4-subs.srt) | [00:05.07](https://git.myco.systems/brooke/whisperbenchmarks/src/branch/main/benchmark-outputs/2002%20007%20Movie%20Trailer%20Commercial%20Bad%20Video.mp4-subs.srt) | [00:11.02](https://git.myco.systems/brooke/whisperbenchmarks/src/branch/main/benchmark-outputs/36%20-%20ANNOUNCEMENT%20SUMMER%20HIATUS.mp3-subs.srt) | | Native | Faster-Medium |
 
 ## Todo:
 - [ ] Write easy bash scripts for running a set of benchmarks with an easy cleanup