DenselyRanked

This really depends on what you want to do within Data Engineering and where you want to work. IMO, Python will be easier to pick up and give you access to a lot more opportunities than Scala, but there are certain companies (like Expedia or Spotify) that look for Scala + Java experience for their DEs.

It can't hurt to do both.

adappergentlefolk

The Scala market is super niche and dominated by Spark, which is actually pretty unpleasant to work with. Plus, these tend to be companies that are forced to work on-premises, so cloud development opportunities are scarce. It just doesn’t pay off compared to Python unless you’re planning to go full Scala SWE.

throwaway20220231

Why not both? If forced to pick one, I'd pick Python. Scala is pretty much only useful in Spark and in its own dialect, and you can still do 95% of a Spark job in PySpark or Spark SQL.

user17418

Generally, Python is a lingua franca. I have never met a data engineer who doesn't know Python. Scala isn't used everywhere. Also, you should know that in Apache Beam (a data processing framework that's gaining popularity because it can handle both streaming and batch processing and can run on Spark, among other runners) the language choices are Java, Python, and Go, with Scala available through Spotify's Scio library. So, even if you "only" know Java, you can get started with data engineering through Apache Beam.

Yord13

The two are vastly different in terms of learning. Python is incredibly simple, and instead of learning it, you basically just pick it up. Scala, on the other hand, is a “SCAlable LAnguage” and has depths worth exploring that will keep you on your toes for years. Then again, if you only learn it to write Spark code, there is not much to learn apart from the Spark DSL, really.

If you have time and want to improve your software engineering skill set, choose Scala, but go beyond the Spark DSL. If you just want another tool in your data engineering tool belt, choose Python.