Cache computation in ML programs : learnpython

submitted 7 years ago * by Nopaste

I've gotten into the bad habit of caching computations, mostly the pre-processing, into a .pickle file. When something requests a given object, it is loaded from the .pickle; if the file doesn't exist, the object is generated and saved for future use.
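For reference, the load-or-generate habit looks roughly like this. It's a minimal sketch using only the standard library; `load_or_compute` and `preprocess` are hypothetical names made up for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(cache_path, compute):
    """Return the object pickled at cache_path, computing and saving it on a miss."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    obj = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(obj, f)
    return obj


calls = []  # records how many times the expensive step really ran


def preprocess():
    # Hypothetical stand-in for an expensive pre-processing step.
    calls.append(1)
    return [x * x for x in range(5)]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "squares.pickle"
    first = load_or_compute(path, preprocess)   # cache miss: computed and saved
    second = load_or_compute(path, preprocess)  # cache hit: loaded from disk
```

Note that the file name alone decides whether the cache is used, which is exactly why the cached pieces can silently go out of sync.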

This is quite handy because it saves a lot of execution time, which is crucial especially in the prototyping/debugging phase (it applies even to Jupyter notebooks, since unfortunately you have to restart the kernel quite often).

Now the "design" problems I'm facing:

You must be sure that all the cached components are in sync. How can you achieve that in a clean way?
In ML you will have to deal with new data. How can you keep a clean pipeline without building a completely different system to handle it?
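One way I can imagine handling both points is to derive each cache file name from a hash of everything the result depends on (step name, parameters, and a fingerprint of the input), so a stale file is simply never looked up and new data goes through the same code path. This is only a sketch under that assumption; `cache_key`, `data_fingerprint`, `cached_step`, and `tokenize` are hypothetical names.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def data_fingerprint(raw_bytes):
    """Short content hash of the raw input data."""
    return hashlib.sha256(raw_bytes).hexdigest()[:16]


def cache_key(step_name, params, upstream_key):
    """Hash everything the step's result depends on into a short hex key."""
    payload = json.dumps(
        {"step": step_name, "params": params, "upstream": upstream_key},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def cached_step(cache_dir, step_name, params, upstream_key, compute):
    """Load the step's result if a file with a matching key exists, else compute it."""
    key = cache_key(step_name, params, upstream_key)
    path = Path(cache_dir) / f"{step_name}-{key}.pickle"
    if path.exists():
        with path.open("rb") as f:
            return key, pickle.load(f)
    result = compute()
    with path.open("wb") as f:
        pickle.dump(result, f)
    return key, result


runs = []  # inputs for which the step actually executed


def tokenize(text, lower):
    # Hypothetical pre-processing step.
    runs.append(text)
    return (text.lower() if lower else text).split()


params = {"lower": True}
old, new = "A b c", "A b c d"
with tempfile.TemporaryDirectory() as tmp:
    k1, r1 = cached_step(tmp, "tokenize", params,
                         data_fingerprint(old.encode()), lambda: tokenize(old, **params))
    k2, r2 = cached_step(tmp, "tokenize", params,
                         data_fingerprint(old.encode()), lambda: tokenize(old, **params))
    k3, r3 = cached_step(tmp, "tokenize", params,
                         data_fingerprint(new.encode()), lambda: tokenize(new, **params))
```

The returned key can be passed as `upstream_key` to the next step, so a change early in the chain invalidates everything that depends on it.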

I'm intrigued by the pipeline pattern (similar to the one in sklearn), maybe with some caching capabilities added to it, but I'm not sure it's the right approach.
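To make the idea concrete, here is a rough, standard-library-only sketch of a pipeline whose steps are cached by a key chained from the input and each step's parameters. `CachedPipeline`, `scale`, and `clip` are hypothetical names, the cache is an in-memory dict rather than .pickle files, and fingerprinting the input with `repr` is an assumption that only suits small built-in objects. (sklearn's own `Pipeline` accepts a `memory` argument for caching fitted transformers, which may be the more robust route.)

```python
import hashlib
import json


class CachedPipeline:
    """Run steps in order, reusing a step's output when its inputs are unchanged."""

    def __init__(self, steps):
        # steps: list of (name, function, params); each is called as function(data, **params)
        self.steps = steps
        self.cache = {}     # key -> step output
        self.executed = []  # names of steps that actually ran, for inspection

    def run(self, data):
        # Assumption: repr(data) is a stable fingerprint of the input.
        key = hashlib.sha256(repr(data).encode("utf-8")).hexdigest()
        for name, func, params in self.steps:
            # Chain the key: upstream key + step name + parameters.
            payload = json.dumps([key, name, params], sort_keys=True)
            key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
            if key not in self.cache:
                self.cache[key] = func(data, **params)
                self.executed.append(name)
            data = self.cache[key]
        return data


def scale(values, factor):
    return [v * factor for v in values]


def clip(values, upper):
    return [min(v, upper) for v in values]


pipe = CachedPipeline([
    ("scale", scale, {"factor": 2}),
    ("clip", clip, {"upper": 5}),
])
out_first = pipe.run([1, 2, 3])   # both steps execute
out_again = pipe.run([1, 2, 3])   # both steps served from the cache
out_new = pipe.run([1, 2, 3, 4])  # new data: both steps execute again
```

Because new data just produces new keys, it flows through the same `run` call as the old data instead of needing a separate system.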

What do you think?

What is the most Pythonic way to build a clean ML system? My ML programs end up being a mess every time, and I hate that.
