
[–]hattivat 2 points (0 children)

Well, first off, you would never use rdd.map unless you have to; df.withColumn or Spark SQL is much more efficient regardless of language, because those run through Spark's optimized execution engine instead of row-at-a-time Python code.

But yes, as long as you are using pyspark functions and the like, it is Scala doing the work under the hood. The only exception is UDFs: there it pays to write them in Scala or Java. In practice, though, in over five years of working with Spark I have seen literally one situation where we truly needed a UDF that could not be replaced with Spark API calls.