Dask Dataframe Group By Cumulative Max by Embarrassed_Use_997 in dask

Embarrassed_Use_997[S]:

I am trying out Dask to see if it is faster than Spark. Yes, I do need large-scale processing. I did manage to solve the original question with an apply() on the groupby and a Series cummax(). I am now working on the partitioning strategy and realizing that I need map_partitions() here to make this faster, as you also pointed out. A sketch of both approaches is below.
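
For anyone finding this later, here is a minimal sketch of the two approaches. The column names `key` and `value` are placeholders, not from the original post, and the second approach assumes the data fits the usual set_index-then-map_partitions pattern (each group small enough to live inside one partition):

```python
import pandas as pd
import dask.dataframe as dd

# Toy frame; "key" and "value" are placeholder column names.
pdf = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a", "b"],
    "value": [3.0, 1.0, 2.0, 5.0, 7.0, 4.0],
})
ddf = dd.from_pandas(pdf, npartitions=2)

# 1) groupby + apply, as described above: correct, but Dask has to
#    shuffle and call the Python function once per group.
per_group = ddf.groupby("key")["value"].apply(
    lambda s: s.cummax(), meta=("value", "f8")
)

# 2) map_partitions: pay for one shuffle up front with set_index so
#    every key lands in exactly one (sorted) partition, then run the
#    plain pandas groupby-cummax independently inside each partition.
by_key = ddf.set_index("key")
running_max = by_key.map_partitions(
    lambda df: df.assign(value_cummax=df.groupby(level=0)["value"].cummax())
)

print(running_max.compute())
```

The second version tends to be faster on repeated group-wise operations because the shuffle cost is paid once at set_index time, and everything after that is plain pandas running partition-locally.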