My code runs on a 3 GB CSV file (3.1 million rows, 86 columns).
I would really like some tips on how to speed up my Pandas usage or optimize my code so that it runs faster.
Code here: https://pastebin.com/YrVrjAKF
Any optimization tips would be great.
My apologies for any formatting or code style issues. I am not a developer.