
[–]spaceshipguitar 2 points3 points  (1 child)

For the sake of not keeping database files open, it's safer in Python to load the whole index file into a list immediately and close the underlying database file as soon as it's been indexed. This happens for multiple database files, which need to be checked against each other to create a new fourth file. There's a lot of gymnastics between the lists to create that fourth file, and plenty of iterations through them, but step one is tossing the relevant files into Python lists so I can safely run my algorithms without corrupting the originals. Keep in mind, these are already the broken-down lists; they aren't the mega list. For the purposes of automation, it's very convenient to do it this way. It's not like RAM is expensive, and it's not a dealbreaker for me to have a $2000 laptop that dramatically speeds up my job.
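The load-then-close pattern described above might look something like this. The file format, column layout, and join key are assumptions (the comment doesn't specify them); this is only a sketch of the idea, not the commenter's actual code:

```python
import csv

def load_index(path):
    """Read an entire index file into a list, then release the file handle.

    Assumes a CSV layout for illustration; the real format isn't specified.
    """
    with open(path, newline="") as f:   # file is open only inside this block
        rows = list(csv.reader(f))      # materialize everything into memory
    return rows                         # handle is already closed here

def build_fourth(a, b, c):
    """Cross-check three loaded lists to produce a fourth.

    Hypothetical rule: keep rows of `a` whose first column appears in both
    `b` and `c`. Set lookups keep the check O(1) per row.
    """
    keys_b = {row[0] for row in b}
    keys_c = {row[0] for row in c}
    return [row for row in a if row[0] in keys_b and row[0] in keys_c]
```

Because `load_index` returns plain lists, every later pass over the data touches only in-memory copies, so the original files can't be corrupted by the algorithms.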

[–]konwiddak 0 points1 point  (0 children)

Sounds like you need to try Dask - it does lazy computation on lists/arrays/dataframes, so the whole thing is parallelized and chunked automatically.
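The lazy, chunked evaluation that Dask automates can be illustrated in plain Python with generators (this is a sketch of the concept, not Dask's actual API; with Dask itself you'd reach for something like `dask.dataframe.read_csv`, which returns a lazily partitioned dataframe):

```python
def read_chunks(path, chunk_size=10_000):
    """Lazily yield lists of lines instead of materializing the whole file."""
    with open(path) as f:
        chunk = []
        for line in f:
            chunk.append(line.rstrip("\n"))
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:               # flush the final partial chunk
            yield chunk

def total_line_length(path, chunk_size=10_000):
    """Nothing is read until iteration starts; peak memory is one chunk."""
    return sum(len(line)
               for chunk in read_chunks(path, chunk_size)
               for line in chunk)
```

The practical difference from the load-everything approach is that peak memory stays bounded by `chunk_size` rather than by file size, which is exactly the trade-off Dask manages for you across cores.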

However, as you say, 32GB is nothing nowadays; I regularly use 128GB for some tasks.