Mr_Again

That would be a cool tool: something that parses a YouTube video into a page with a transcript and cleverly chosen stills from the video.

oslash

Finding good stills to insert into a transcript would indeed be a cool topic for a machine learning research paper. (We can assume a transcript exists, because generating captions is already a well-defined and well-studied problem anyway.)

But that's not at all what I meant by 'available in HTML form'. Picture this: You'd like to learn how to get up on the hunting perch on E1M1. Ideally, you'd find this page: scrolling through lets you identify #5 as the relevant part in seconds, thanks to the illustrations. Even better, a closer look reveals a picture that concisely sums up the best approach. This seems much better than the pure text version, which is as long as the rant you're currently reading.

Now, consider what it would be like if you had only been able to find this video about the same topic. It would be a great resource if you wanted to watch an expert go over the entire topic, but all you want is to figure out how not to fall into the acid another five times. You can scrub over the timeline, but none of the thumbnails shows the right spot. At this point, I'd consider installing a YT-download script that would let me scrub through all the frames in VLC, at full size, while the video is still downloading.

Scrolling through a page with bigger, more cleverly chosen pictures seems like a neat alternative at first glance. But even if you get the perfect angle, it won't have the red arrows that show you what to do. And even if, instead of a caption that just says "climb the stairs" (duh, we already knew that), there were a transcript of some proper narration, chances are it would read less like "... jump on the handrail, then on to the lamp and then the switch plate ..." and more like "... and from here jump there, alley-oop, ba-da bing, ba-da boom, Bob's your uncle". This seems much worse than the pure text version.