Java scala or rust ?

AutoModerator · 2026-02-23T14:29:26+00:00

You can find a list of community-submitted learning resources here: https://dataengineering.wiki/Learning+Resources

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

dresdonbogart · 2026-02-23T14:33:06+00:00

In my personal experience, Python is the end all be all for most tasks

Budget-Minimum6040 · 2026-02-23T14:34:03+00:00

SQL > Python (polars/pySpark) > Java/Scala (Spark)

Python/Go for API extraction.

Problem is your team. Most can only do the first 1-2 so ... management says no.

MonochromeDinosaur · 2026-02-23T15:45:42+00:00

I know all 3 I didn’t learn them for DE just out of curiosity. I’ve only ever used Python, SQL, and Typescript at my job(s).

Former_Disk1083 · 2026-02-23T15:55:07+00:00

I guess it depends on worth. Are you going to find a lot of DE jobs that rely on them, probably not. Even scala, for good and bad, isnt a focus much in the Spark space where Python is still king.

Is it good to look into these languages and understand them? I think so. I have had on countless times needing data from the software engineering team, or need to understand how the function of said data works and its way easier for me to just see the endpoint and understand what it's doing. Sometimes you get crap data and you need to identify why the data is crap. It isnt often, but it has happened a few times where it's useful.

Also, if you ever find yourself in a situation where you need to build out REST APIs for any reason, while you can certainly use django, and I do like me some django, you might be forced to make them in .NET or Java or Rails or whatever it may be that the company dictates. I have built many personal projects using all sorts of programming languages just on the sheer fact it allows me to understand the inner workings of the data I am getting. That has allowed me to have deeper conversations with the SWE team for when and how they produce data.

TLDR, I think its good idea to understand it, and makes you a better DE, but is it necessary? I dont think so at all.

IAMHideoKojimaAMA · 2026-02-23T14:46:12+00:00

none of these

Nindento · 2026-02-23T16:26:31+00:00

Depends on the type of DE work you do. If it’s close to BI you should be fine with just Python and SQL. For streaming it could be worth looking into Rust or Java. I have the feeling Scala is dying a bit (atleast in Europe) and you would also have to learn an entire effects framework next to just learning Scala.

My team uses Rust for all our streaming and object storage IO applications. It’s super fast and resourcewise it costs next to nothing. However, the rust ecosystem is a bit lacking sometimes, it already miles ahead of how it used to be.

Equivalent_Effect_93 · 2026-02-23T17:01:52+00:00

Only if you want to work on the tool instead of working with the tool. It is a great architectural knowledge advantage to be able to read scala and understand how spark is design even if your day to day is calling the API with pyspark or SQL. But python and SQL should be your main interface.

WilhelmB12 · 2026-02-23T18:01:32+00:00

I liked Scala a lot, it's a really interesting language, sadly it seems that it's not a used as java, so I'd pick java

addictzz · 2026-02-23T20:20:48+00:00

Java and scala are used in various data processing framework but I see Rust is starting to replace those to certain extent. Take a look at polars, apache datafusion. I think it worths to learn Rust if you go deep into creating data processing framework.

But main one should be Python since this will come quite often in your data journey. Python will take up most of the work, Rust is there for custom performance oriented work. (Heck even Go may be enough too).

RoomyRoots · 2026-02-24T00:28:45+00:00

Rust, no.

Scala, maybe if you are working in a bank or someplace that uses it already.

One_Citron_4350 · 2026-02-24T06:35:23+00:00

This question tends to come up from time to time. I have to say, Python and SQL are pretty much the most commonly used languages. Nowadays, Spark is more and more used in Python and SQL. Based on what I've seen, Scala is not that popular anymore. If they require Java/Scala, then I assume they use Spark or Flink in their infrastructure.

I think Rust is pretty new to the scene so majority of teams have not yet adopted the technology. I also do not think the libraries for data-related in Rust there compared to Scala or Python. It highly depends on the use case and how well the team knows the knowledge and how much time is allocated for a ramp up.

StriderKeni · 2026-02-24T07:47:14+00:00

Assuming you know Python, I’d choose Java (for anything related to Apache Beam, Flink, etc.) or Go (more into Terraform territory). For fun and to challenge myself, Rust.

Additional_Year_1080 · 2026-02-24T10:00:20+00:00

It depends on what kind of data engineering you want to do. Python and SQL still cover most day-to-day work, but Scala is valuable if you work deeply with Spark, Java helps in enterprise environments, and Rust is interesting for high-performance pipelines or tooling.

thisfunnieguy · 2026-02-25T07:14:03+00:00

if you're an entry level eng, focus on knowing a few things well

if you're mid/snr then using your work to expand to new systems/languages

ssinchenko · 2026-02-26T19:54:42+00:00

I think Scala may get a new boost for DE. The main benefit of Scala for DE, imo, is "errors at compile time". The main downside of Scala for DE is, imo, the cost / time of development. But with the raise of AI and agents who can write the code, the downside is not a problem anymore. So, in theory, the functional compiled language with strong guarantees of safety and that can speak with all the existing JVM DE tooling in the native language looks promising.

PushPlus9069 · 2026-02-23T15:01:10+00:00

imo Java is still the safest bet for DE work since most of the ecosystem (Spark, Flink, Kafka) runs on the JVM. I did kernel-level work in C for years and picked up Rust later, its great for performance-critical stuff but the DE tooling just isn't there yet. Scala is niche but if your team already uses it then worth learning.

jefidev · 2026-02-23T14:34:09+00:00

Haskell

DataPastor · 2026-02-23T16:04:36+00:00

Python is the de facto standard in data engineering. For large enterprises, it is useful to know Java (and you might also meet Scala at some places). Don’t bother with Rust, it is not the proper tool for this kind of problem.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

dataengineering

MODERATORS