This repository provides easy-to-use transcription benchmarks built on audio and video content from the Internet Archive. Each sample targets a challenging recording scenario, such as poor mic placement, thick accents, or audio artifacts.

| Category | Title | Source | Duration (m:s) | Download |
| --- | --- | --- | --- | --- |
| Poor mic placement | Body camera footage from July 10 traffic stop | [Internet Archive](https://archive.org/details/cobmn-Body_camera_footage_from_July_10_traffic_stop) | 2:22 | [MP4](https://archive.org/download/cobmn-Body_camera_footage_from_July_10_traffic_stop/Body_camera_footage_from_July_10_traffic_stop.mp4) |
| Thick accents | Moonshine for Medicine Popcorn Sutton | [Internet Archive](https://archive.org/details/this-is-the-last-dam-run-of-likker-ill-ever-make-full-movie/+Moonshine+for+Medicine++++Popcorn+Sutton.mp4) | 1:35 | [MP4](https://archive.org/download/this-is-the-last-dam-run-of-likker-ill-ever-make-full-movie/%20Moonshine%20for%20Medicine%20%20%20%20Popcorn%20Sutton.mp4) |
| Artifacts in audio | 2002 007 Movie Trailer Commercial Bad Video | [Internet Archive](https://archive.org/details/2002variouscommercials/2002+007+Movie+Trailer+Commercial+Bad+Video.mp4) | 0:14 | [MP4](https://archive.org/download/2002variouscommercials/2002%20A%20Touch%20Of%20Class%20Limos%20Bridal%20Show%20Wilton%20Mall%20Saratoga%20Commercial.mp4) |

Results cover the complete run, which includes loading the model, running VAD (voice activity detection), and running the transcription. Links are embedded in the results for each category.
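
As a rough illustration of what "complete run" timing means, the sketch below times a sequence of stages end to end and formats the total as `m:s:ms`. It is not this repository's benchmark code: `format_elapsed`, `time_full_run`, and the three stand-in stages are hypothetical names, and the zero padding of the output is an assumption.

```python
import time
from typing import Callable, Iterable


def format_elapsed(seconds: float) -> str:
    """Format a duration in seconds as m:s:ms (zero padding is an assumption)."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(rem_ms, 1000)
    return f"{minutes}:{secs:02d}:{millis:03d}"


def time_full_run(stages: Iterable[Callable[[], object]]) -> float:
    """Run every stage in order and return the total wall-clock seconds."""
    start = time.perf_counter()
    for stage in stages:
        stage()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-ins for model loading, VAD, and transcription.
    def load_model() -> None:
        time.sleep(0.01)

    def run_vad() -> None:
        time.sleep(0.01)

    def transcribe() -> None:
        time.sleep(0.01)

    elapsed = time_full_run([load_model, run_vad, transcribe])
    print(format_elapsed(elapsed))
```

Starting the clock before the model loads, rather than just before transcription, is what makes the figure reflect the whole run, so a cold start counts against the result.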

| CPU Model | Poor mic placement (m:s:ms) | Thick accents (m:s:ms) | Artifacts in audio (m:s:ms) | Ideal audio (m:s:ms) | Long form (m:s:ms) | Environment (Docker/Native) | Model |
| --- | --- | --- | --- | --- | --- | --- | --- |

| GPU Model | Poor mic placement (m:s:ms) | Thick accents (m:s:ms) | Artifacts in audio (m:s:ms) | Ideal audio (m:s:ms) | Long form (m:s:ms) | Environment (Docker/Native) | Model |
| --- | --- | --- | --- | --- | --- | --- | --- |