<code>whisper-tui</code>: Interactively Grepping Audio with Whisper and SQLite


Another tool I impulsively made to solve an issue I had.

In a Discord server I’m a part of, a fellow member asked me to share a link to a podcast episode I mentioned when discussing how physicists are awesome. I wanted to share with him the timestamp where the relevant discussion happened since the podcast is pretty long and I didn’t expect him to watch the entire thing. The experience of deriving the timestamp was pretty awful; I awkwardly scrubbed through the player as I fished for the specific section I was looking for. It took me like five minutes to do this, which was pretty embarrassing. [1]

I realized in that moment that just being able to search the captions of a video would have been a lot nicer, as text is the universal interface. I have some previous experience in this domain, but that work was much less ad hoc; here, I have no need to keep the captions around once I have the snippets I need, and I don’t mind having to generate them again since Whisper on the GPU runs fast enough. [2]

So, I made a tool which does just that. It’s a 50-line Python script which takes an audio file, transcribes it [3], inserts the individual lines of the transcription into an in-memory SQLite FTS5 virtual table, and then drops the user into a REPL where they can search for captions in that table.
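The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the actual script: it assumes the openai-whisper package and a Python build whose SQLite includes FTS5, and the function names (`build_index`, `search`, `transcribe`, `format_timestamp`, `repl`) and the `captions` table layout are made up for the example.

```python
import sqlite3


def build_index(segments):
    """Load (start_seconds, text) pairs into an in-memory FTS5 table."""
    db = sqlite3.connect(":memory:")
    # `start` is stored but UNINDEXED, so only the caption text is searchable.
    # Table and column names here are illustrative assumptions.
    db.execute("CREATE VIRTUAL TABLE captions USING fts5(start UNINDEXED, text)")
    db.executemany("INSERT INTO captions (start, text) VALUES (?, ?)", segments)
    return db


def search(db, query):
    """Return (start_seconds, text) rows matching an FTS5 query, best match first."""
    rows = db.execute(
        "SELECT start, text FROM captions WHERE captions MATCH ? ORDER BY rank",
        (query,),
    )
    return [(float(start), text) for start, text in rows]


def transcribe(path, model_name="tiny.en"):
    """Transcribe with openai-whisper; imported lazily so the rest runs without it."""
    import whisper

    result = whisper.load_model(model_name).transcribe(path)
    # Each segment is one caption line with a start time in seconds.
    return [(seg["start"], seg["text"].strip()) for seg in result["segments"]]


def format_timestamp(seconds):
    """Render seconds as H:MM:SS for pasting next to a link."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def repl(db):
    """Prompt for queries until EOF (Ctrl-D) or an empty line."""
    while True:
        try:
            query = input("search> ").strip()
        except EOFError:
            break
        if not query:
            break
        try:
            hits = search(db, query)
        except sqlite3.OperationalError as err:
            # Malformed FTS5 query syntax lands here; keep the session alive.
            print(f"bad query: {err}")
            continue
        for start, text in hits:
            print(f"[{format_timestamp(start)}] {text}")
```

Wiring the pieces together would then be a one-liner along the lines of `repl(build_index(transcribe("episode.mp3")))`, where `episode.mp3` is a placeholder filename. Keeping the index in `:memory:` matches the throwaway nature of the tool: nothing is written to disk, and the captions vanish when the REPL exits.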

For being so scrappy, the script achieves its goal pretty well. As always, SQLite rocks; there’s basically no other library for Python which allows for in-memory document search [4], and its bindings are part of the standard library. Whisper is quite fast with a GPU; it can chew through the podcast episode I mentioned earlier in two-and-a-half minutes.

Overall, I’m pretty happy with the project; if my description of the tool interests you, you can clone it. Making productivity tools like this has been the driving force behind why I write software; extending myself with my machine always brings me joy.

1. I could have used the autogenerated captions on the YouTube copy of the podcast to find the relevant section, but I hadn’t thought of it at the time.
2. I wanted to go even faster with faster-whisper, but had issues getting it to work with my graphics card’s version of CUDA. I love the Python ML ecosystem. :)
3. It uses the tiny.en model by default for maximum performance; it’s good enough.
4. The only one I know of is this embedding of Meilisearch’s indexer; I wanted a fuzzy search solution but couldn’t find an all-in-one package.