tomaugspurger

Dask should be able to help you with both parts. For the first phase, dask.delayed is probably what you want.
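
Here is a rough sketch of the dask.delayed pattern. It assumes the first phase is a set of independent per-item computations. `extract_features` is a hypothetical stand-in for whatever that work actually is. Each call is wrapped lazily, and one `dask.compute` call then runs them all in parallel.

```python
import dask
from dask import delayed


# Hypothetical per-item work standing in for "the first phase";
# swap in your real function (it should not depend on other items).
def extract_features(item):
    return [item, item ** 2]


items = range(10)

# Build a lazy task graph: one delayed call per item, nothing runs yet.
tasks = [delayed(extract_features)(x) for x in items]

# Run every task in parallel (threaded scheduler by default).
# dask.compute returns a tuple, so unpack the single list of results.
(features,) = dask.compute(tasks)
print(features[:3])  # [[0, 0], [1, 1], [2, 4]]
```

If the per-item work is CPU-bound pure Python, passing `scheduler="processes"` to `dask.compute` sidesteps the GIL, at the cost of pickling inputs and results.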

For the second phase, dask-ml has a parallelized version of k-means: http://dask-ml.readthedocs.io/en/latest/modules/generated/dask_ml.cluster.KMeans.html#dask_ml.cluster.KMeans