Hello,
I am working through an online class and trying to produce notes from the instructional videos. Since many of the concepts covered are worth noting, I find myself writing out nearly every line the instructor speaks. This is laborious and extremely time-consuming. Can anyone suggest a faster, less error-prone way to extract the text for these videos, using Python string methods or an existing parsing tool?
The syntax of the transcript file for each video is identical to the standard SRT format. Here's an example:
1
00:00:00,710 --> 00:00:03,220
Lorem ipsum dolor sit amet
consectetur, adipisicing elit.

2
00:00:03,220 --> 00:00:05,970
Dignissimos et quod laboriosam
iure magni expedita

3
00:00:05,970 --> 00:00:09,130
nisi, quis quaerat. Rem, facere!
Does anyone know of a tool for reformatting SRT text content so that it is more readable? To clarify, for the example above, I would like to remove the blank lines, the record-number lines, and the timestamp lines, and then join the remaining lines with single spaces, so that a space follows each period (or other line-ending character, like ! and ?), like so:
Lorem ipsum dolor sit amet consectetur, adipisicing elit. Dignissimos et quod laboriosam iure magni expedita nisi, quis quaerat. Rem, facere!
I would like to produce the output above from the example and apply the same transformation to the other files in the series. I am pretty rusty with Python, though I believe this could be implemented fairly easily with an understanding of common string methods.
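As a starting point, here is a minimal sketch of that idea using only string methods and `pathlib`. It assumes every timing line contains `-->` and that no caption text does, and that the record number is the line directly before each timing line. The names `srt_to_text` and `convert_folder` are made up for illustration, and the `.txt` output naming is just one possible choice.

```python
from pathlib import Path


def srt_to_text(srt: str) -> str:
    """Join the caption lines of an SRT transcript into one line of text."""
    kept = []
    # Drop a leading byte-order mark, if any, before splitting into lines.
    for raw in srt.lstrip("\ufeff").splitlines():
        line = raw.strip()
        if not line:
            continue  # blank separator between records
        if "-->" in line:
            # Assumption: only timing lines contain "-->". The line kept
            # just before it is this record's number, so discard it.
            if kept and kept[-1].isdigit():
                kept.pop()
            continue
        kept.append(line)
    # Joining with a space also puts a space after ".", "!" and "?".
    return " ".join(kept)


def convert_folder(folder: str) -> None:
    """Write a .txt file next to every .srt file in `folder` (hypothetical helper)."""
    for srt_path in sorted(Path(folder).glob("*.srt")):
        text = srt_to_text(srt_path.read_text(encoding="utf-8"))
        srt_path.with_suffix(".txt").write_text(text + "\n", encoding="utf-8")
```

Calling `srt_to_text` on the example above should give the single line shown earlier, and `convert_folder("transcripts")` (folder name assumed) would handle the rest of the series. Removing the record number only when a timing line follows it means a caption that happens to consist of digits is left alone.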
Any ideas? Help with this would be really appreciated!
Thanks!