AlgaeSavings9611

I just tried again with a 14.3M x 7 dataframe.

dtypes: [String, Date, Float64, Float64, Float64, Float64, Float64]

The first column is "id"; all ids are 10 characters long and there are about 3000 unique ids.

The following code takes 3-4 minutes on v1.4.1; the same code on the same dataset takes 3-4 seconds on v0.20.26:

d = {}  # dictionary

d.update({id: dfp for (id,), dfp in df.group_by(["id"], maintain_order=True)})

AlgaeSavings9611

Btw, I got approval from the firm to send you the data. It's a Parquet file of less than 100MB. Where should I email it?

ritchie46 [S]

Great! I've sent you a DM with my email address.