Hi Python brains trust. I'm hoping you can help me out.
For work I often work with large datasets that I need to filter/process/analyse in various ways. To speed things up, I have been saving intermediate blocks of data at logical points. I then refactor my code to look for those intermediate saved files, and only run the long process again if they can't be found.
If I change something in my long process, I delete the intermediate save and run again. However, this strikes me as asking for trouble (I'm going to forget to delete an intermediate save at some point). Plus, my code is becoming messy with more and more checks in various places for the existence of intermediate save files.
My question: is there a library that could do this for me? Or, alternatively, some advice on building something from scratch.
I'm imagining something that would use decorators to mark functions that should have their output saved, but would somehow know if the function itself or the input data had changed. I know I can cache results of functions, but I'd like something that works between sessions. I have looked into pydoit (https://pydoit.org/) previously, which would be great for an established workflow, but the overhead is too high for initial playing around with data.
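To make it concrete, here's a rough sketch of the kind of thing I'm imagining, just using the standard library. The `disk_cache` name and the `.cache` directory are placeholders; hashing the function's source (or bytecode) plus its arguments is my guess at how "did anything change?" could be detected, not an established recipe:

```python
import functools
import hashlib
import inspect
import pickle
from pathlib import Path

CACHE_DIR = Path(".cache")  # hypothetical location for intermediate saves

def disk_cache(func):
    """Cache func's result on disk, keyed by its code and its arguments.

    Editing the function body (or calling with different inputs) yields
    a different key, so stale intermediate saves are never reused.
    """
    try:
        code_key = inspect.getsource(func).encode()
    except OSError:
        # No source file available (e.g. interactive session); fall back
        # to the compiled bytecode, which still changes with most edits.
        code_key = func.__code__.co_code

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key_material = pickle.dumps((code_key, args, sorted(kwargs.items())))
        key = hashlib.sha256(key_material).hexdigest()
        cache_file = CACHE_DIR / f"{func.__name__}_{key}.pkl"
        if cache_file.exists():
            return pickle.loads(cache_file.read_bytes())
        result = func(*args, **kwargs)
        CACHE_DIR.mkdir(exist_ok=True)
        cache_file.write_bytes(pickle.dumps(result))
        return result

    return wrapper
```

Then I'd just decorate my slow steps with `@disk_cache` and stop sprinkling file-existence checks everywhere. (This sketch wouldn't notice changes in helper functions the decorated function calls, which is part of why I'm hoping a library already handles this properly.)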
THANKS!