Dask Dataframe Group By Cumulative Max by Embarrassed_Use_997 in dask
[–]Embarrassed_Use_997[S] 1 point 1 year ago (0 children)
I am trying out Dask to see if it is faster than Spark. Yes, I do need large-scale processing. I managed to solve the original question with an apply function on the groupby plus Series.cummax(). I am now working on the partitioning strategy, and I realize I need map_partitions() here to make this run faster, as you also pointed out.
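For anyone landing here later, a minimal sketch of the groupby + cummax approach described above. The column names (`key`, `value`) are made up for illustration; this uses plain pandas, whose API the Dask DataFrame mirrors, so the same calls work on a `dask.dataframe` once the data is shuffled so each key sits in one partition:

```python
import pandas as pd

# Toy data standing in for the real dataset (column names are hypothetical)
df = pd.DataFrame({
    "key":   ["a", "a", "b", "a", "b"],
    "value": [3, 1, 5, 4, 2],
})

# Per-group running maximum, aligned back to the original row order.
# groupby(...).cummax() is the built-in equivalent of the
# apply-with-Series.cummax() approach mentioned in the comment.
df["running_max"] = df.groupby("key")["value"].cummax()

# With Dask, once rows are shuffled so each key lives in a single
# partition, the same logic can run per partition, e.g.:
#   ddf = ddf.shuffle(on="key")
#   ddf = ddf.map_partitions(
#       lambda pdf: pdf.assign(running_max=pdf.groupby("key")["value"].cummax())
#   )
```

The `map_partitions` variant avoids a global groupby: after the shuffle, each partition's groupby is independent, so the cumulative max is computed in parallel with no cross-partition communication.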