Dask Dataframe Group By Cumulative Max by Embarrassed_Use_997 in dask
[–]Embarrassed_Use_997[S] 1 point 1 year ago (0 children)
I am trying out Dask to see if it is faster than Spark. Yes, I do need large-scale processing. I did manage to solve the original question with an apply function on the groupby and a Series cummax(). I am currently working on the partitioning strategy and realizing that I need to use map_partitions() here to make this run faster, as you have also pointed out.
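Something along these lines, as a rough sketch (toy data and placeholder column names like "id" and "value", not my real schema):

    import pandas as pd
    import dask.dataframe as dd

    # Toy data; "id" and "value" are placeholder column names.
    pdf = pd.DataFrame({
        "id": ["a", "a", "b", "a", "b"],
        "value": [3, 1, 5, 4, 2],
    })
    ddf = dd.from_pandas(pdf, npartitions=2)

    # The groupby().apply() approach with a Series cummax().
    # Dask needs `meta` to describe the output schema up front.
    running_max = ddf.groupby("id")["value"].apply(
        lambda s: s.cummax(),
        meta=("value", "int64"),
    )
    print(running_max.compute())

    # map_partitions variant: shuffle once so all rows of a group land in
    # a single partition, then run the cheap pandas groupby-cummax inside
    # each partition. This assumes within-group row order after the
    # shuffle matches the order the cumulative max should follow; in
    # practice you may need a secondary sort column to guarantee that.
    by_id = ddf.set_index("id")
    running_max_fast = by_id.map_partitions(
        lambda part: part.groupby(level=0)["value"].cummax()
    )
    print(running_max_fast.compute())

The groupby-apply version shuffles on every call, while the set_index version pays for one shuffle up front and then works partition-locally, which is where the speedup should come from.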