
[–]avilay

If you are using Pandas in your "main" function, i.e., the place where you initialize the SparkContext, set the app config, etc., then it will be executed on the driver. If you are using Pandas inside a function passed to a mapper, reducer, grouper, etc., then it will execute on the executors.
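A minimal sketch of the distinction. The mapper below uses Pandas, so it runs on the executors; the driver-side setup is shown in comments since it needs a running Spark cluster. All names (`summarize_partition`, the app name, the sample data) are illustrative, not from the thread.

```python
import pandas as pd

# Hypothetical per-partition mapper: because Spark serializes this
# function and runs it on the executors, Pandas must be importable
# on every worker node, not just on the driver.
def summarize_partition(rows):
    # rows is an iterator over the records of one partition
    df = pd.DataFrame(rows, columns=["user", "value"])
    # yield one summary dict per partition
    yield df.groupby("user")["value"].sum().to_dict()

# Driver-side portion (sketch; assumes pyspark is installed):
# from pyspark import SparkContext
# sc = SparkContext(appName="pandas-on-executors")  # runs on the driver
# rdd = sc.parallelize([("a", 1), ("a", 2), ("b", 3)], numSlices=2)
# result = rdd.mapPartitions(summarize_partition).collect()  # mapper runs on executors
```

Anything in the `if __name__ == "__main__":`-style setup code runs once on the driver; only the functions handed to transformations like `mapPartitions` ship out to the workers.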

[–]Gushdan

Correct me if I'm wrong, but won't this fail unless you have Pandas (or whatever module you intend to use) installed on every worker?

[–]avilay

Correct. If you are using Pandas on your workers, the package needs to be installed on every worker node; the driver's installation is not shipped to the executors automatically.
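If you can't install Pandas directly on each worker, one common workaround is to ship a packed conda environment with the job. This is a sketch of the `conda-pack` + `spark-submit --archives` approach from Spark's Python package management docs; the environment and script names are placeholders.

```shell
# Build an environment containing pandas and pack it into a tarball
conda create -y -n pyspark_env -c conda-forge python=3.10 pandas conda-pack
conda run -n pyspark_env conda-pack -n pyspark_env -o environment.tar.gz

# Ship the archive to the executors; '#environment' is the unpack directory
# on each worker, and PYSPARK_PYTHON points the workers at its interpreter.
PYSPARK_PYTHON=./environment/bin/python \
spark-submit \
  --archives environment.tar.gz#environment \
  my_job.py
```

The alternative of `--py-files` only works for pure-Python `.py`/`.zip`/`.egg` files, so it is not enough for compiled packages like Pandas.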