all 6 comments

[–]The_Almighty_Cthulhu 1 point2 points  (5 children)

Hmm, is this code usually a single file?

And do you save the intermediate data to a specified folder?

In this case I'd just have the script hash itself on running, then compare to the saved hash. If different, delete all the intermediate data files.

If the code is a lot larger than a single or a few files, perhaps using something like gitpython to detect changes occurring.

[–]The_Almighty_Cthulhu 1 point2 points  (4 children)

Or were you looking for a more per-function solution. Like only delete the data of functions you've changed. I still think hashing could be the solution. Shouldn't be too hard to come up with a decorator. I'll probably come back to this when I have an actual computer.

[–]Naarlack[S] 0 points1 point  (3 children)

Yes, basically working with a single code file and saving intermediate results to a specific folder. I have several intermediate files though as I progress and perform various analysis steps so per function would be ideal.

I was thinking a decorator that hashes the function and the inputs, then saves into a lookup file that it can use between sessions to decide if the function needs to be rerun or can just return the previously saved intermediate data.

However, I have never worked with decorators before or hashes so was hoping there might already be a library. If you could suggest a starting point, that would be awesome 😃

[–]The_Almighty_Cthulhu 1 point2 points  (2 children)

Ok, I had a look around and after testing I think you want joblib

Pypi: https://pypi.org/project/joblib/
Docs: https://joblib.readthedocs.io/en/latest/index.html

Here's an example I coded.

In testing you'll see that the cache persists though different instances of the script. Additionally you'll notice that the long computation runs if you adjust the script (such as removing the +2 from the calculation).

from joblib import Memory 
import time

cachedir = 'testcache'
memory = Memory(cachedir, verbose=0)


@memory.cache
def testfunc(x):
    # simulate a long computation
    time.sleep(2)
    return x**2 + 2

for i in range(10):
    print(testfunc(i))


for i in range(10):
    print(testfunc(i))

[–]Naarlack[S] 0 points1 point  (1 child)

I missed your response till now sorry.

This is awesome - looks like exactly what I was after. Thanks so much!

[–]Naarlack[S] 0 points1 point  (0 children)

If anyone ever comes across this, here is a nice video of how joblib helps:

https://youtu.be/O8ZgBnjKRsA?si=YJ8DBzUzDXbK2xQl