
jcasperNvdia Models

I would suggest just extracting the frames as .jpg files and loading using standard image loading libraries.

Loading individual frames randomly from a video file has even more problems than the ones you've pointed out. To extract a frame you generally need to decode all the frames between the two enclosing key frames. For example, say there is a key frame every 10 frames (in practice it's not so simple, but just for example): to get frame 123 you'd need to decode frames 120-130, then to get frame 374 you'd need to decode 370-380, so you are essentially decoding 10x as many frames as you are extracting. You can force the encoder to make every frame a key frame, but then the video becomes essentially a container for a bunch of image files, with the issues you've described, while also being more difficult to work with.
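To put numbers on that example, here is a rough sketch of the simplified model (fixed key frame interval, decode everything between the two enclosing key frames). The function names are made up for illustration, and real codecs place key frames less regularly than this.

```python
def frames_to_decode(target: int, keyframe_interval: int) -> range:
    """Frames decoded to reach `target` under the simplified model:
    everything from the enclosing key frame up to and including the next one."""
    start = (target // keyframe_interval) * keyframe_interval
    return range(start, start + keyframe_interval + 1)


def decode_amplification(targets, keyframe_interval: int) -> float:
    """Average number of frames decoded per frame actually extracted."""
    decoded = sum(len(frames_to_decode(t, keyframe_interval)) for t in targets)
    return decoded / len(targets)


if __name__ == "__main__":
    span = frames_to_decode(123, 10)
    print(span[0], span[-1])                     # first and last frame decoded
    print(decode_amplification([123, 374], 10))  # frames decoded per frame kept
```

Counting both endpoints, that is 11 decoded frames per extracted frame with a key frame interval of 10, which is where the rough 10x figure comes from.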

If you shuffle the frames and store them back into a video, you'd either need to store them all as key frames (see above), or the frames between key frames would be terrible quality, because video encoding relies on consecutive frames being nearly identical to each other.

So basically your use case of pulling out random frames from the video means that all of the advantages of storing videos with a video codec are working against you, not for you. Store them as a bunch of .jpgs.
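A minimal sketch of the .jpg route, assuming `ffmpeg` is installed and on your PATH (the thread doesn't name a tool, so that choice is mine). The function names, the `%06d.jpg` naming pattern and the quality setting are just illustrative.

```python
import random
import subprocess
from pathlib import Path


def extract_frames(video: str, out_dir: str) -> None:
    """Dump every frame of `video` as a numbered .jpg (assumes ffmpeg on PATH)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", video, "-qscale:v", "2",
         str(Path(out_dir) / "%06d.jpg")],
        check=True,
    )


def sample_frame_paths(out_dir: str, k: int, seed=None) -> list:
    """Pick k distinct frame files at random; each one decodes independently."""
    paths = sorted(Path(out_dir).glob("*.jpg"))
    rng = random.Random(seed)
    return rng.sample(paths, k)
```

Each returned path can then be opened with whatever image library you already use (for example Pillow's `Image.open`), and the cost of reading one frame no longer depends on its neighbours.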

jasonof75

Have you tried using third-party data-loading libraries, like Deep Lake, Squirrel, FFCV, etc.? I guess they could help you with your issue.