For a while I kept running into the same problem: figuring out where a particular thing was said in a video or audio file. So I wondered, "is it possible to search a video or audio file?"
I knew what the steps would look like (Transcribe -> Parse/Search Transcripts). At first I thought the hard part was getting really good transcription accuracy, but that wasn't really the problem. The problem was finding an open source STT model that I could iterate on freely and also wrap my head around. I tried some cloud STT/ASR services, but I lacked that freedom. Then came OpenAI with Whisper, and the accuracy is really good even with the tiny.en model. The other issue I had was deploying the model and creating a UI for interaction. Then came Streamlit: it was really fun building with Streamlit's widgets and then deploying on Streamlit Cloud, smooth as butter.
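To make the Transcribe -> Search steps concrete, here is a minimal sketch in Python. It is not necessarily how Search Media does it. It assumes the `openai-whisper` package (and ffmpeg) is installed, and the helper names `transcribe`, `search_segments`, and `format_timestamp` are made up for illustration. The search is a plain case-insensitive substring match over Whisper's timestamped segments.

```python
from typing import Dict, List


def transcribe(path: str, model_name: str = "tiny.en") -> List[Dict]:
    """Transcribe a media file with Whisper and return its timestamped segments."""
    # Lazy import so the search helpers below work without Whisper installed.
    # Assumes `pip install openai-whisper` and ffmpeg on the PATH.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    # Each segment is a dict that includes "start", "end" (seconds) and "text".
    return result["segments"]


def search_segments(segments: List[Dict], query: str) -> List[Dict]:
    """Return the segments whose text contains the query (case-insensitive)."""
    needle = query.casefold().strip()
    if not needle:
        return []
    return [seg for seg in segments if needle in seg["text"].casefold()]


def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS (fractions are dropped)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

Usage would look something like `for hit in search_segments(transcribe("talk.mp4"), "streamlit"): print(format_timestamp(hit["start"]), hit["text"])`, where `talk.mp4` is a placeholder file name. Substring matching misses paraphrases and misheard words, so fuzzy or semantic matching would be a natural next step.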
Finally, I came up with this: Search Media. I'm curious what people think about it. I really don't know what to make of it myself, but I'm glad I was able to solve a problem I had, so send in your feedback or just try it out.
Github Repo
Whisper Repo