From c5e92bf8bb76731f6ac1e3f4221257b4e6a310fd Mon Sep 17 00:00:00 2001
From: brooke
Date: Sat, 2 Dec 2023 22:16:06 +0000
Subject: [PATCH] Update README.md

---
 README.md | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index ed0261e..086a3fa 100644
--- a/README.md
+++ b/README.md
@@ -2,14 +2,27 @@
 
 This repository provides easy-to-use benchmarks using audio and video content from the Internet Archive, specifically targeting various challenging scenarios in audio recordings.
 
+Based on https://gitlab.com/aadnk/whisper-webui
+
+## Models
+
+| Model | Command |
+|-|-|
+|faster-large-v3| --whisper_implementation faster-whisper --model large-v3 |
+|faster-medium| --whisper_implementation faster-whisper --model medium |
+|faster-small| --whisper_implementation faster-whisper --model small |
+|faster-tiny| --whisper_implementation faster-whisper --model tiny |
+
 ## Links
 
-| Categories | Title | Links |
-|-----------------------|-----------------------|-------------------------------------------------------------------------------------------------------|
-| Poor mic placement | Body camera footage from July 10 traffic stop | [Internet Archive](https://archive.org/details/cobmn-Body_camera_footage_from_July_10_traffic_stop) |
-| Thick accents | Moonshine for Medicine Popcorn Sutton | [Internet Archive](https://archive.org/details/this-is-the-last-dam-run-of-likker-ill-ever-make-full-movie/+Moonshine+for+Medicine++++Popcorn+Sutton.mp4) |
-| Artifacts in audio | 2002 007 Movie Trailer Commercial Bad Video | [Internet Archive](https://archive.org/details/2002variouscommercials/2002+007+Movie+Trailer+Commercial+Bad+Video.mp4) |
-| Ideal audio (one speaker) | 8 Bit Bookclub | [Internet Archive](https://archive.org/details/8-bit-bookclub/36+-+ANNOUNCEMENT++SUMMER+HIATUS.mp3) |
+Videos are chosen for being short and matching their given category
+
+| Categories | Title | Links | Length | Type | 
+|-|-|-|-|-|
+| Poor mic placement | Body camera footage from July 10 traffic stop | [Internet Archive](https://archive.org/details/cobmn-Body_camera_footage_from_July_10_traffic_stop) | 2:22 | MP4 |
+| Thick accents | Moonshine for Medicine Popcorn Sutton | [Internet Archive](https://archive.org/details/this-is-the-last-dam-run-of-likker-ill-ever-make-full-movie/+Moonshine+for+Medicine++++Popcorn+Sutton.mp4) | 1:35 | MP4 |
+| Artifacts in audio | 2002 007 Movie Trailer Commercial Bad Video | [Internet Archive](https://archive.org/details/2002variouscommercials/2002+007+Movie+Trailer+Commercial+Bad+Video.mp4) | 0:14 | MP4 |
+| Ideal audio (one speaker) | 8 Bit Bookclub | [Internet Archive](https://archive.org/details/8-bit-bookclub/36+-+ANNOUNCEMENT++SUMMER+HIATUS.mp3) | 1:44 | MP3 |
 
 ## How to Run Whisper Benchmarks
 
@@ -17,17 +30,18 @@ This repository provides easy-to-use benchmarks using audio and video content fr
 
 ## Results
 
-Links are embeded for each category
+Results are for the complete run which includes loading the model, running VAD, and running the transcription. Links are embeded for each category
 
 ### CPU Benchmarks
 
-| CPU Model | Poor mic placement (s) | Thick accents (s) | Artifacts in audio (s) | Ideal audio (one speaker) | (Docker/Native) |
-|-|-|-|-|-|-|
+| CPU Model | Poor mic placement (s) | Thick accents (s) | Artifacts in audio (s) | Ideal audio (s) | (Docker/Native) | Model |
+|-|-|-|-|-|-|-|
 
 ### GPU Benchmarks
 
-| GPU Model | Poor mic placement (s) | Thick accents (s) | Artifacts in audio (s) | Ideal audio (one speaker) | (Docker/Native) |
-|-|-|-|-|-|-|
+| GPU Model | Poor mic placement (s) | Thick accents (s) | Artifacts in audio (s) | Ideal audio (s) | (Docker/Native) | Model |
+|-|-|-|-|-|-|-|
+| RTX 2060S | 2.14 | 9.99 | 5.07 | 11.02 | Native | Faster-Medium |
 
 ## Example
 