Update README.md

Commit c5e92bf8bb (parent 327d3ea238): 1 changed file (README.md), 25 additions and 11 deletions.

@@ -2,14 +2,27 @@

This repository provides easy-to-use benchmarks using audio and video content from the Internet Archive, specifically targeting various challenging scenarios in audio recordings.

Based on https://gitlab.com/aadnk/whisper-webui

## Models

| Model | Command |
|-|-|
| faster-large-v3 | `--whisper_implementation faster-whisper --model large-v3` |
| faster-medium | `--whisper_implementation faster-whisper --model medium` |
| faster-small | `--whisper_implementation faster-whisper --model small` |
| faster-tiny | `--whisper_implementation faster-whisper --model tiny` |
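
The Command column lists the flags that select each model. As a sketch, the table can be turned into argument lists for scripted runs. The script name `cli.py` and the trailing positional input file are assumptions, not taken from this README; check the whisper-webui repository for the actual entry point.

```python
import sys

# Model name -> value passed to --model, copied from the Models table.
MODEL_SIZES = {
    "faster-large-v3": "large-v3",
    "faster-medium": "medium",
    "faster-small": "small",
    "faster-tiny": "tiny",
}


def build_command(model_name: str, audio_path: str, script: str = "cli.py") -> list[str]:
    """Return an argv list for one benchmark run.

    Assumptions (not stated in this README): the flags go to a whisper-webui
    command-line script, called "cli.py" here, and the input file is passed
    as a trailing positional argument.
    """
    if model_name not in MODEL_SIZES:
        raise KeyError(f"unknown model: {model_name}")
    return [
        sys.executable,
        script,
        "--whisper_implementation", "faster-whisper",
        "--model", MODEL_SIZES[model_name],
        audio_path,
    ]


if __name__ == "__main__":
    print(" ".join(build_command("faster-medium", "clip.mp4")))
```

Keeping the arguments as a list (rather than one shell string) lets it be passed straight to `subprocess.run` without quoting problems for file names that contain spaces.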

## Links

Clips are chosen for being short and clearly matching their category.

| Category | Title | Link | Length (m:ss) | Type |
|-|-|-|-|-|
| Poor mic placement | Body camera footage from July 10 traffic stop | [Internet Archive](https://archive.org/details/cobmn-Body_camera_footage_from_July_10_traffic_stop) | 2:22 | MP4 |
| Thick accents | Moonshine for Medicine Popcorn Sutton | [Internet Archive](https://archive.org/details/this-is-the-last-dam-run-of-likker-ill-ever-make-full-movie/+Moonshine+for+Medicine++++Popcorn+Sutton.mp4) | 1:35 | MP4 |
| Artifacts in audio | 2002 007 Movie Trailer Commercial Bad Video | [Internet Archive](https://archive.org/details/2002variouscommercials/2002+007+Movie+Trailer+Commercial+Bad+Video.mp4) | 0:14 | MP4 |
| Ideal audio (one speaker) | 8 Bit Bookclub | [Internet Archive](https://archive.org/details/8-bit-bookclub/36+-+ANNOUNCEMENT++SUMMER+HIATUS.mp3) | 1:44 | MP3 |

## How to Run Whisper Benchmarks

@@ -17,17 +30,18 @@

## Results

Results are for the complete run, which includes loading the model, running VAD (voice activity detection), and running the transcription. Links to each category's clip are embedded in the Links table.
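
Since each result covers the whole run, it is an end-to-end wall-clock time. The sketch below shows one way to take such a measurement in Python; it is an assumption about methodology, not necessarily how the numbers in this README were collected, and the placeholder command stands in for a real transcription run.

```python
import subprocess
import sys
import time


def time_complete_run(cmd: list[str]) -> float:
    """Run one benchmark command to completion and return wall-clock seconds.

    The timer wraps the whole process, so model loading, VAD and
    transcription all fall inside the measured interval (as does
    interpreter start-up).
    """
    start = time.perf_counter()
    # check=True raises CalledProcessError on failure, so a failed run is never reported as a timing.
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command; substitute a real transcription command here.
    elapsed = time_complete_run([sys.executable, "-c", "pass"])
    print(f"{elapsed:.2f} s")
```

`time.perf_counter` is used because it is monotonic and high-resolution, which suits interval timing better than `time.time`.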

### CPU Benchmarks

| CPU Model | Poor mic placement (s) | Thick accents (s) | Artifacts in audio (s) | Ideal audio (s) | Docker/Native | Model |
|-|-|-|-|-|-|-|
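
Both benchmark tables share the same seven-column layout, and the CPU table has no rows yet. A hypothetical helper (the `BenchmarkRow` name and its fields are illustrative, not part of the repository) that renders one measurement in that column order, with timings rounded to two decimals like the existing GPU entry:

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRow:
    hardware: str           # CPU or GPU model name
    poor_mic_s: float       # seconds, "Poor mic placement" clip
    thick_accents_s: float  # seconds, "Thick accents" clip
    artifacts_s: float      # seconds, "Artifacts in audio" clip
    ideal_audio_s: float    # seconds, "Ideal audio (one speaker)" clip
    runtime: str            # "Docker" or "Native"
    model: str              # a name from the Models table

    def to_markdown(self) -> str:
        """Format the row as one Markdown table line in the benchmark column order."""
        timings = [self.poor_mic_s, self.thick_accents_s, self.artifacts_s, self.ideal_audio_s]
        cells = [self.hardware, *(f"{t:.2f}" for t in timings), self.runtime, self.model]
        return "| " + " | ".join(cells) + " |"


if __name__ == "__main__":
    print(BenchmarkRow("RTX 2060S", 2.14, 9.99, 5.07, 11.02, "Native", "faster-medium").to_markdown())
```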

### GPU Benchmarks

| GPU Model | Poor mic placement (s) | Thick accents (s) | Artifacts in audio (s) | Ideal audio (s) | Docker/Native | Model |
|-|-|-|-|-|-|-|
| RTX 2060S | 2.14 | 9.99 | 5.07 | 11.02 | Native | faster-medium |
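
The clips differ in length (see the Length column of the Links table), so raw seconds are not directly comparable across categories. One common normalization, which this README does not itself report, is the real-time factor (RTF): processing time divided by audio duration, where lower is faster. A minimal sketch using the RTX 2060S timings:

```python
def parse_length(text: str) -> int:
    """Convert an 'm:ss' length string from the Links table into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def real_time_factor(processing_s: float, length: str) -> float:
    """Processing seconds per second of audio (values below 1 are faster than real time)."""
    return processing_s / parse_length(length)


# Clip lengths from the Links table and RTX 2060S timings from the GPU table.
CLIPS = {
    "Poor mic placement": ("2:22", 2.14),
    "Thick accents": ("1:35", 9.99),
    "Artifacts in audio": ("0:14", 5.07),
    "Ideal audio (one speaker)": ("1:44", 11.02),
}

if __name__ == "__main__":
    for category, (length, seconds) in CLIPS.items():
        print(f"{category}: RTF {real_time_factor(seconds, length):.3f}")
```

Because each timing includes model loading, that fixed start-up cost weighs more heavily on short clips, so the 14-second artifacts clip tends to show a higher RTF than the longer ones.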

## Example