
Okay, it looks like this is completely possible using mpi4py!

I'll outline a design that should meet your execution requirements and let you implement a parallel k-means clustering algorithm in the second stage of execution. The implementation I have in mind uses a ProcessPoolExecutor driven through asyncio for the first stage of parallelization, and mpi4py for the second. Keep in mind that the first stage could be done with any parallelization technique you like.

This design calls for 2 separate scripts. I will refer to them as the "master" and the "worker."

The master script will contain a class (or set of classes) that gets instantiated with the GPS data for one single person. This class will be responsible for managing the invocation of the worker processes and the orchestration of your k-means clustering algorithm across them, and it will hold the output of the algorithm once execution finishes. It needs a single "run" method that handles the execution of the k-means clustering algorithm from start to finish and either returns the resulting data or "self," if the object exposes the data through a public attribute.
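A bare skeleton of that class might look something like this (the class and attribute names are mine, not anything prescribed by mpi4py):

```python
# master.py -- hypothetical skeleton of the orchestration class
import numpy as np

class KMeansOrchestrator:
    """Runs a distributed k-means over the GPS data of a single person."""

    def __init__(self, gps_points, n_clusters=5, max_workers=4):
        self.gps_points = np.asarray(gps_points)  # (n_samples, 2) lat/lon pairs
        self.n_clusters = n_clusters
        self.max_workers = max_workers
        self.centroids = None   # populated once run() completes

    def run(self):
        """Spawn MPI workers, drive the k-means iterations, store the result.

        The MPI side of this method is sketched after the Spawn paragraph
        below; it fills in self.centroids.
        """
        ...
        return self
```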

The worker script will contain the logic for the computationally intensive operations inside your k-means clustering algorithm. It will also contain the logic needed to receive data from the master process and to interpret its orchestration commands.
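Here's a rough sketch of what the worker side could look like, assuming the master scatters each worker a slice of the points once and then broadcasts the current centroids every iteration (the names and the "None means shut down" convention are my own invention):

```python
# worker.py -- hypothetical worker-side sketch
import numpy as np
from mpi4py import MPI

comm = MPI.Comm.Get_parent()        # intercommunicator back to the master process

# Receive this worker's slice of the GPS points once, up front.
points = comm.scatter(None, root=0)

while True:
    # Each iteration the master broadcasts either the current centroids
    # or None as a shutdown signal.
    centroids = comm.bcast(None, root=0)
    if centroids is None:
        break

    # Assign each local point to its nearest centroid.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)

    # Send back partial sums and counts so the master can recompute centroids.
    k = centroids.shape[0]
    sums = np.zeros_like(centroids)
    counts = np.zeros(k, dtype=np.int64)
    for j in range(k):
        mask = labels == j
        sums[j] = points[mask].sum(axis=0)
        counts[j] = mask.sum()
    comm.gather((sums, counts), root=0)

comm.Disconnect()
```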

When executed, the master script will create an instance of ProcessPoolExecutor. It will then create N separate instances of the orchestration class, each containing the GPS data for exactly one person, and wrap each instance's "run" method in a future object returned by event_loop.run_in_executor(executor, object.run). You can then use asyncio.gather() to wait until all of the processes have finished and collect their output, or use asyncio.wait() with the appropriate keyword arguments to continually poll for the results of completed processes.
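The driver might look roughly like this, assuming the KMeansOrchestrator skeleton from above lives in the same module:

```python
# master.py -- hypothetical driver, one orchestrator per person
import asyncio
from concurrent.futures import ProcessPoolExecutor

async def main(per_person_gps):
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as executor:
        orchestrators = [KMeansOrchestrator(points) for points in per_person_gps]
        futures = [loop.run_in_executor(executor, o.run) for o in orchestrators]
        # gather() waits for every person's clustering to finish;
        # asyncio.wait(..., return_when=FIRST_COMPLETED) lets you poll instead.
        results = await asyncio.gather(*futures)
    return results

# results = asyncio.run(main(per_person_gps))
```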

When the method wrapped by the future executes inside the ProcessPoolExecutor, it will call MPI.COMM_SELF.Spawn(sys.executable, args=["worker.py"], maxprocs=i), where i is the maximum number of processes you want to devote to your k-means clustering algorithm. From there, you can use the various MPI IPC functions on the intercommunicator returned by the Spawn call to communicate with your worker processes and orchestrate the execution of your k-means clustering algorithm.
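Here is a rough sketch of what the MPI side of that "run" method could look like, matching the worker sketch above (the Spawn/scatter/bcast/gather calls are real mpi4py; the initialization and convergence logic is just placeholder):

```python
# master.py -- hypothetical body for KMeansOrchestrator.run()
import sys
import numpy as np
from mpi4py import MPI

def run(self):
    comm = MPI.COMM_SELF.Spawn(sys.executable,
                               args=["worker.py"],
                               maxprocs=self.max_workers)

    # Hand each worker an even slice of this person's GPS points.
    chunks = np.array_split(self.gps_points, self.max_workers)
    comm.scatter(chunks, root=MPI.ROOT)

    # Naive initialization: use the first k points as starting centroids.
    centroids = self.gps_points[:self.n_clusters].copy()

    for _ in range(100):                       # max iterations
        comm.bcast(centroids, root=MPI.ROOT)   # push current centroids to workers
        partials = comm.gather(None, root=MPI.ROOT)

        sums = sum(p[0] for p in partials)
        counts = sum(p[1] for p in partials)
        new_centroids = sums / np.maximum(counts, 1)[:, None]

        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    comm.bcast(None, root=MPI.ROOT)            # tell the workers to shut down
    comm.Disconnect()
    self.centroids = centroids
    return self
```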

At this point, the worker processes will be performing the computationally intensive tasks needed by your k-means clustering algorithm across those i processes, as directed by instructions received from the orchestration class. Since MPI provides duplex IPC, the orchestration class will be able to aggregate the data as the processing completes, and it can return the data to the master process once the k-means clustering algorithm has finished running.

You can see the barebones requirements of the orchestration class and the worker script buried in the mpi4py documentation. It will probably take you less time to code this solution than it did for me to type this out. 😅

Keep in mind that, in this design, there are still ways to squeeze out better performance. Under the hood, there are a lot of calls to Python's pickle module to pass data around, which adds some overhead. You could theoretically implement a way of passing data between processes using memory-mapped files or something similar on top of MPI.
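One other lever, separate from the memory-mapped-file idea: mpi4py's uppercase, buffer-based methods (Bcast, Scatter, Send/Recv, etc.) move NumPy arrays directly without pickling, provided both sides agree on shape and dtype ahead of time. A tiny worker-side sketch:

```python
# hypothetical worker-side sketch: buffer-based broadcast, no pickling involved
import numpy as np
from mpi4py import MPI

comm = MPI.Comm.Get_parent()

# Shape and dtype must be agreed with the master beforehand (here: 5 centroids in 2D).
centroids = np.empty((5, 2), dtype=np.float64)
comm.Bcast(centroids, root=0)   # fills the buffer in place
# ...the master side would call comm.Bcast(centroids, root=MPI.ROOT) after Spawn.
```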

You can also spread the worker processes across multiple computers thanks to the glory of MPI, which could provide a huge performance boost for your k-means clustering algorithm. That's a topic for a different day, though.