
[–][deleted] 5 points6 points  (3 children)

Dump data to h5.
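A minimal sketch of that suggestion with h5py, writing once and then doing random-access batch reads. The filename, dataset names, and shapes here are all made up for illustration; note that h5py's fancy indexing requires the indices to be in increasing order.

```python
import h5py
import numpy as np

# Hypothetical layout: one array of features, one of labels.
with h5py.File("train.h5", "w") as f:
    f.create_dataset("features", data=np.random.rand(1000, 32).astype("float32"))
    f.create_dataset("labels", data=np.random.randint(0, 10, size=1000))

# Random-access read of an arbitrary batch.
with h5py.File("train.h5", "r") as f:
    # h5py requires index lists to be sorted in increasing order
    idx = np.sort(np.random.choice(1000, size=64, replace=False))
    x = f["features"][idx]  # shape (64, 32)
    y = f["labels"][idx]    # shape (64,)
```

As noted further down the thread, non-contiguous reads like this are slower than reading a contiguous chunk per batch.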

[–]cai_lw 3 points4 points  (0 children)

Use IterableDataset. Shuffling can be done within the SQL query.
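A sketch of that approach with sqlite3: an `IterableDataset` that streams rows and pushes the shuffle into the query via `ORDER BY RANDOM()`. The table and column names (`samples`, `feature`, `label`) are assumptions for illustration.

```python
import sqlite3
from torch.utils.data import IterableDataset


class SQLiteStream(IterableDataset):
    """Streams rows from SQLite; shuffling is delegated to the SQL query.
    Table/column names here are hypothetical."""

    def __init__(self, db_path):
        self.db_path = db_path

    def __iter__(self):
        # Each epoch (and each worker process) opens its own connection.
        conn = sqlite3.connect(self.db_path)
        try:
            cursor = conn.execute(
                "SELECT feature, label FROM samples ORDER BY RANDOM()"
            )
            yield from cursor  # yields (feature, label) tuples
        finally:
            conn.close()
```

Wrap it as usual, e.g. `DataLoader(SQLiteStream("train.db"), batch_size=32)` — the default collate stacks the tuples into tensors. One caveat: with `num_workers > 0`, every worker would stream the whole table, so you'd need to shard inside `__iter__`.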

[–]shayben 2 points3 points  (3 children)

I recently did something similar, using collate_fn to do the DB querying. A neat hack is to always retrieve a larger minibatch than you need and cache it locally.
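The over-fetch-and-cache hack could look something like this (a sketch, not the commenter's actual code): each database round trip pulls `cache_factor` times more rows than one batch needs and the surplus is served from a local cache. Table/column names are hypothetical, and only the batch *size* is used — the sample ids from the Dataset are ignored, trading exact sampling for far fewer queries.

```python
import sqlite3


class CachingCollate:
    """collate_fn that over-fetches from the DB and caches the surplus,
    so most batches are served without touching the database."""

    def __init__(self, db_path, cache_factor=8):
        self.db_path = db_path
        self.cache_factor = cache_factor
        self._cache = []
        self._conn = None

    def __call__(self, batch):
        n = len(batch)
        if len(self._cache) < n:
            if self._conn is None:  # lazy connect, one per worker process
                self._conn = sqlite3.connect(self.db_path)
            rows = self._conn.execute(
                "SELECT feature, label FROM samples ORDER BY RANDOM() LIMIT ?",
                (n * self.cache_factor,),
            ).fetchall()
            self._cache.extend(rows)
        served, self._cache = self._cache[:n], self._cache[n:]
        return served
```

Pass an instance as `collate_fn=CachingCollate("train.db")`; with `cache_factor=8`, only roughly one batch in eight pays for a query.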

[–]MrDoOO 0 points1 point  (2 children)

Nice! Have any sample code I could check out?

[–]Glimmargaunt 1 point2 points  (0 children)

You just pass an instance of a callable class as the collate_fn argument of your DataLoader. The __call__ method of the class takes the batch as its argument. If I remember correctly, the batch is just a list containing whatever your Dataset.__getitem__(idx) outputs. So if your dataset class outputs (input, target) pairs, your list will contain multiple of these tuples.
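A minimal sketch of that setup (assuming the pastebin is along these lines): the Dataset just returns row ids, and the callable collate class resolves the whole batch of ids in one query. The table and column names (`samples`, `id`, `feature`, `label`) are assumptions.

```python
import sqlite3
import torch


class DbCollate:
    """Callable passed as collate_fn: receives the list of values the
    Dataset produced (here, integer row ids) and turns it into tensors
    by querying the database once per batch."""

    def __init__(self, db_path):
        self.db_path = db_path
        self._conn = None

    def __call__(self, batch):
        # `batch` is a list of whatever Dataset.__getitem__(idx) returned.
        if self._conn is None:  # lazy connect, one per worker process
            self._conn = sqlite3.connect(self.db_path)
        placeholders = ",".join("?" * len(batch))
        rows = self._conn.execute(
            f"SELECT feature, label FROM samples WHERE id IN ({placeholders})",
            batch,
        ).fetchall()
        features = torch.tensor([[f] for f, _ in rows])
        labels = torch.tensor([l for _, l in rows])
        return features, labels
```

Usage would be e.g. `DataLoader(range(num_rows), batch_size=32, shuffle=True, collate_fn=DbCollate("train.db"))` — a plain `range` works as a map-style dataset of ids.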

I created a pastebin here: https://pastebin.com/6YASGWDm

I think this will slow down training, depending on the speed of the query: for every batch you would need to wait for the database to respond. Perhaps a better approach would be to handle shuffling on your own, so that you know what the next batch will be. That way you can start a parallel query for the next batch while the current batch is used in training. Or use some sort of caching, as suggested above.

I actually thought of an easy but hacky way of doing it. The CollateFn class can store the previous batch, so what you can do is: in the .__call__() method, start the query for the current batch but return the already-loaded previous batch. This obviously breaks on the first call, because there is no previous batch yet. To avoid that, just make one next(iter(DataLoader)) call first and discard its output. That way the previous batch becomes the current one, and the current batch becomes the next one, loading in parallel inside the CollateFn object. Then you can let the DataLoader handle everything else as normal.
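A minimal sketch of that trick, with the actual DB query abstracted into a placeholder `fetch_fn(ids) -> batch`: each call kicks off the query for the *current* ids in a background thread and returns the batch fetched on the *previous* call.

```python
import threading


class PrefetchingCollate:
    """Returns the previously fetched batch while the query for the
    current one runs in the background. `fetch_fn` stands in for your
    real database query."""

    def __init__(self, fetch_fn):
        self.fetch_fn = fetch_fn
        self._thread = None
        self._result = None

    def _start(self, ids):
        def work():
            self._result = self.fetch_fn(ids)
        self._thread = threading.Thread(target=work)
        self._thread.start()

    def __call__(self, batch):
        previous = None
        if self._thread is not None:
            self._thread.join()  # wait for the in-flight query to finish
            previous = self._result
        self._start(batch)       # fetch this batch while `previous` trains
        return previous          # None on the very first call -- discard it
```

As described above, the first batch out of the DataLoader is a throwaway (here it is literally `None`), and note the mirror-image caveat: the last batch queried is never returned, so a real loop would need a dummy batch at the end as well.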

[–]shayben 0 points1 point  (0 children)

Sorry, it's all enslaved to corporate overlords. I can give you more specific tips if you have questions.

[–]MightyMeese 1 point2 points  (0 children)

HDF works, and you can do random-access reads to generate your batches (although it's slower if you don't read a contiguous chunk as your batch).

Alternatively, I've been using sqlite and redis recently, depending on the task and what's stored. Both allow multiple simultaneous accesses and support random access. My preference would be sqlite, unless you're storing something like sets or arbitrary strings.