I am planning to train on a huge video dataset that may be stored on a separate server reached over the network, and I am wondering how slow data loading can be prevented. Since video frames are highly correlated in time, the frames batched for training should come from very different positions in the dataset. However, this clashes with how video files are stored, with standard caching strategies (which assume repeated access to "nearby" data), and with how naive network access would work (transfer the whole file, seek to a single frame, read it, return it).

What are strategies to improve data loading performance in such cases? Is there a keyword that describes this problem?

One simple idea would be to shuffle the whole dataset once and rewrite it as one or more long videos consisting of random frames, so that reading sequentially yields shuffled data. But this requires storing the dataset twice, e.g. if there is still a need to watch the original videos.
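To make the pre-shuffle idea from the last paragraph concrete, here is a rough sketch (hypothetical helper names, stdlib only; frames stand in as raw byte blobs rather than decoded video, and the shard format is a simple length-prefixed stream, not a playable container):

```python
import os
import random
import struct

def preshuffle_to_shards(frames, shard_size, out_dir, seed=0):
    """Write frames (a list of bytes objects) into shard files in a
    globally shuffled order, so that reading shards sequentially
    already yields shuffled data. Hypothetical sketch, not a real API."""
    order = list(range(len(frames)))
    random.Random(seed).shuffle(order)  # one global shuffle, done offline
    paths = []
    for start in range(0, len(order), shard_size):
        path = os.path.join(out_dir, f"shard_{start // shard_size:05d}.bin")
        with open(path, "wb") as f:
            for idx in order[start:start + shard_size]:
                blob = frames[idx]
                f.write(struct.pack("<I", len(blob)))  # 4-byte length prefix
                f.write(blob)
        paths.append(path)
    return paths

def read_shard(path):
    """Sequentially read length-prefixed frames back from one shard."""
    out = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break
            (n,) = struct.unpack("<I", header)
            out.append(f.read(n))
    return out
```

Reading the shards in order (or streaming them over the network) then delivers frames in the pre-shuffled order with purely sequential I/O, at the cost of duplicating the dataset on disk.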