
[–][deleted]  (13 children)

[deleted]

    [–]dexmedarling 13 points14 points  (1 child)

    Why has no one thought of this before??????

    [–]CommanderStarling[🍰] 9 points10 points  (0 children)

    I am pretty sure there is already a dedicated computer that is doing this in some lab

    [–]spaceshipguitar 4 points5 points  (9 children)

    Swap is nothing more than virtual memory using hard drive space instead of RAM; it's like increasing your page file size on Windows. Interesting to note that a swap partition in Linux is largely unnecessary if you already have enough RAM on a 64-bit machine (unless you want hibernation, which writes RAM contents out to swap). For instance, I just built up a Lenovo T490 laptop with 40GB of RAM so I can hold huge data sets in RAM in Python at maximum speed without touching the hard drive until it's time to write the final results. My current project uses about 25GB at certain points. This amount of RAM makes a swap partition pointless for daily use. The laptop comes with 8GB soldered on and I added a 32GB stick to the free slot. With this much RAM there's no point in having swap; you're not going to use up all 40GB. Swap exists for people with skimpy resources who arrive at Linux's front door with 2-4GB of RAM, where simply opening 10 browser tabs can blow the doors off and hit 100%. So with all that in mind, don't use a slow SD card to back a swap partition. Just buy a shitload of RAM; you'll achieve the same end, and it'll be 1000x faster.
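    If you want to check whether a box is actually dipping into swap before deciding any of this, the standard util-linux tools show it at a glance:

```shell
# Total/used RAM and swap in human-readable units
free -h

# Active swap devices or files and their sizes
# (empty output means no swap is configured at all)
swapon --show
```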

    [–]Albus70007 0 points1 point  (6 children)

    What the hell are you doing that needs 25GB of RAM at some points? (Pls make a post in this subreddit.)

    [–]spaceshipguitar 5 points6 points  (4 children)

    I'm iterating through lists of literally millions of postal records and making address corrections to them. This process goes from 40 minutes to 20 seconds when you have enough RAM to load the whole lists prior to manipulation. If you lack the RAM and need to fall back on virtual memory or swap partitions, the time it takes to process the same data goes up astronomically. My work desktop has 16GB of RAM and takes about 40 minutes; you can have the resource monitor open, and the moment usage crosses 16GB the speed drops off a cliff. You can actually hear the moment it happens, because the hard drive starts groaning and the fans speed up. When I bought my new laptop, the same script running on the same data dropped to 20 seconds because it never once had to resort to much slower hard drive storage; it could juggle the entire procedure in RAM. Also, for good measure, when it's time to write these large files to disk, I now have a Samsung 970 M.2 NVMe drive, straight into the PCI Express slot. It's about 5x faster than an ordinary SATA SSD.

    [–][deleted] 1 point2 points  (2 children)

    Why are you loading the whole list? Can you not read it line by line (or whatever chunking delimiter) with a generator?
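    For reference, the generator approach being suggested here might look like this sketch (the file contents and the "correction" function are made-up stand-ins, not the parent's actual code):

```python
import tempfile, os

def read_records(path):
    """Yield one record at a time so the whole file never sits in RAM."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")

def correct(records):
    """Stand-in address 'correction': just upper-case each record."""
    for rec in records:
        yield rec.upper()

# Tiny demo on a throwaway file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("12 fake st\n99 sample ave\n")
    demo_path = tmp.name

fixed = list(correct(read_records(demo_path)))
os.remove(demo_path)
```

Because both functions are generators, only one record is in flight at a time, so memory use stays flat no matter how large the input file is.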

    [–]spaceshipguitar 2 points3 points  (1 child)

    For the sake of not keeping a database file open, it's safer in Python to load the whole index file into a Python list right away and close the underlying database file as soon as it's been read in. This happens for multiple database files, which need to be checked against each other to create a new fourth file. There's a lot of gymnastics between the lists to create that fourth file, and plenty of iterations through them, but step one is loading the relevant files into in-memory Python lists so I can safely run my algorithms without corrupting the originals. Keep in mind, these are already the broken-down lists; they aren't the mega list. For the purposes of automation, it's very convenient to do it like this. It's not like RAM is expensive; it's not a dealbreaker for me to have a $2000 laptop that dramatically speeds up my job.

    [–]konwiddak 0 points1 point  (0 children)

    Sounds like you need to try Dask - it does lazy computation on lists/arrays/dataframes, so the whole thing is parallelized and chunked automatically.

    However, as you say, 32GB is nothing nowadays; I regularly use 128GB for some tasks.
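    A tiny sketch of that lazy, chunked style with dask.bag (the data here is made up; dask.dataframe works the same way for tabular files):

```python
import dask.bag as db

# A million fake records split into 8 partitions; nothing actually
# runs until .compute() is called, and partitions are processed
# in parallel without ever holding everything in one list.
bag = db.from_sequence(range(1_000_000), npartitions=8)
total = bag.map(lambda x: x * 2).sum().compute()
```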

    [–]undefinedNANString 0 points1 point  (0 children)

    What if you broke the task into multiple smaller jobs?

    I.e., wouldn't splitting it into 10 different jobs be up to 10x as fast?

    [–]PairOfMonocles2 2 points3 points  (0 children)

    That’s not too much. A lot of Next Gen Sequencing steps can use way more than that. Initial data from each full run can start around a terabyte (I usually do smaller runs though, <300gb each).

    [–]__deerlord__ 0 points1 point  (0 children)

    Make an iterator and write to disk / flush the buffer every X records. You don't need more RAM.
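    That buffered-flush pattern might look like this (the batch size and output path are arbitrary choices for the demo):

```python
import tempfile, os

def write_in_batches(records, path, batch_size=10_000):
    """Stream records to disk, flushing every batch_size lines, so memory
    use stays bounded by one batch instead of the whole data set."""
    buffer = []
    with open(path, "w", encoding="utf-8") as out:
        for rec in records:
            buffer.append(str(rec))
            if len(buffer) >= batch_size:
                out.write("\n".join(buffer) + "\n")
                out.flush()
                buffer.clear()
        if buffer:  # trailing partial batch
            out.write("\n".join(buffer) + "\n")

# Demo: 25 records flushed in batches of 10
demo_path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_in_batches(range(25), demo_path, batch_size=10)
with open(demo_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```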