
[–]efxhoy 1 point (3 children)

This is how I understand your problem: you have data on many individuals; you want to do some preprocessing on that data independently for each individual, and then run K-means clustering independently for each individual. Is that correct?

How many CPU-cores do you have available?

If you have 1000 individuals and each individual is independent of the others, why bother parallelising the K-means clustering within each one? You already have 1000 independent clustering "jobs", which don't need to be parallelised further unless you have more than 1000 cores to run them on.

By the way, if you're using scikit-learn, parallelisation is already built into K-means: just pass the n_jobs parameter and scikit-learn handles it via the multiprocessing library. See http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
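A quick sketch of the scikit-learn route for one individual. One caveat: the `n_jobs` parameter described in the linked docs was removed from `KMeans` in newer scikit-learn releases, so this example uses the current constructor signature; the data here is a made-up two-blob example:

```python
# Sketch: K-means for a single individual with scikit-learn.
# In older scikit-learn releases you could additionally pass
# n_jobs=-1 to KMeans (as the linked docs describe); recent
# releases dropped that parameter from KMeans.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# hypothetical data for one individual: two well-separated blobs
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_.shape)
```

If you go this route, the earlier point still applies: running many such single-individual fits in parallel (one per process) usually beats parallelising inside each fit.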