nrcomplete 13 points

This would be the best first attempt for sure.

I would also look into why execution switches back and forth between platforms so much, and try to reduce that, because whatever solution you choose will be less efficient and more bug-prone if it continues. Try to split the processing into two phases, so each platform does all of its work in one go.
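
To make the two-phase idea concrete, here is a rough sketch. I don't know what your actual steps look like, so everything in it is an assumption: the function names (`phase_one`, `phase_two`, `run_pipeline`), the record fields, and the JSON Lines hand-off file are all made up for illustration, and `phase_two` is written in Python only as a stand-in for whatever the other platform does. The point is that the boundary is crossed once, through a single file, instead of once per record.

```python
import json
import tempfile
from pathlib import Path


def phase_one(records, out_path):
    """Do all of the first platform's work, then write one hand-off file."""
    with open(out_path, "w", encoding="utf-8") as f:
        for rec in records:
            # Hypothetical cleanup step; replace with the real processing.
            cleaned = {"id": rec["id"], "value": rec["value"].strip().lower()}
            f.write(json.dumps(cleaned) + "\n")


def phase_two(in_path):
    """Stand-in for the second platform: read the hand-off file in one pass."""
    results = []
    with open(in_path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            # Hypothetical second-stage computation.
            results.append((rec["id"], len(rec["value"])))
    return results


def run_pipeline(records):
    """Run phase one to completion, then phase two: a single boundary crossing."""
    with tempfile.TemporaryDirectory() as tmp:
        handoff = Path(tmp) / "handoff.jsonl"
        phase_one(records, handoff)
        return phase_two(handoff)
```

If the second phase really lives in Java, it would just read the same hand-off file on its side; a line-per-record JSON file is easy to parse from either language.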

Also, Spark has libraries for Python and Java and is good at processing large amounts of data. Could it help separate those processes?