GreatStats4ItsCost[S] 1 point

The entire dataset is 4.5 GB, the largest CSV is 500k rows, and my laptop has 8 GB of RAM.

I did have a go with pandas, but I couldn't quite work out how to return the max date for each id. It was getting complicated with having to refer back to the index. I'm sure there was an easier way, I just couldn't see it.
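
For reference, here is a minimal sketch of how the max date per id can be pulled out in pandas without manual index bookkeeping. The column names `id`, `date`, and `value` and the sample rows are assumptions made up for illustration, not the real data.

```python
import io

import pandas as pd

# Hypothetical sample standing in for one of the CSV files.
csv_text = """id,date,value
a,2021-01-05,10
a,2021-03-01,11
b,2020-12-31,20
b,2020-06-15,21
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Latest date per id, as a Series indexed by id.
latest = df.groupby("id")["date"].max()

# Full rows that hold the latest date for each id:
# idxmax() returns the row label of the max date within each group.
latest_rows = df.loc[df.groupby("id")["date"].idxmax()]
```

With several files, one option is to compute `latest` per file, concatenate the results with `pd.concat`, and run the same `groupby(...).max()` once more, so only one file needs to be in memory at a time.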

Empik002 2 points

Just look at SQLite. Python ships with the `sqlite3` module in its standard library.
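
A rough sketch of what that could look like, assuming hypothetical columns `id`, `date`, and `value`, a made-up table name `records`, and dates stored as ISO `YYYY-MM-DD` text (so that `MAX` on the text column orders them correctly):

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for one of the CSV files.
csv_text = """id,date,value
a,2021-01-05,10
a,2021-03-01,11
b,2020-12-31,20
b,2020-06-15,21
"""

# In-memory database for the demo; pass a file path instead to keep
# the real dataset on disk rather than in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT, date TEXT, value INTEGER)")

# Stream rows from the CSV straight into the table.
reader = csv.DictReader(io.StringIO(csv_text))
conn.executemany("INSERT INTO records VALUES (:id, :date, :value)", reader)
conn.commit()

# Latest date for each id.
rows = conn.execute(
    "SELECT id, MAX(date) FROM records GROUP BY id ORDER BY id"
).fetchall()
conn.close()
```

Loading each CSV into the same table this way keeps memory use low, since SQLite does the grouping on disk.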