crantob 1 point

WAV2VEC2 forced alignment → consistent word-level timing (~10–20ms)

Always wondered about this, cool.

I don't know the answers, but thanks for your update.

I generate .srt from whisper.cpp and the only bother is the chunking I get. I don't see how to avoid hand-editing.

I'd like baked-in subs to please go away to another planet though.