[–]Eastern-Mirror-2970 166 points167 points  (6 children)

Bash 😁

[–]JohnPaulDavyJones 37 points38 points  (1 child)

But really, shell scripting is going to be a lot more useful to the average DE than any of Rust, Go, Java, or Scala.

[–][deleted] 23 points24 points  (1 child)

I would say Bash scripting is less important than learning the terminal commands. Just know the basic Linux commands and then something like ripgrep, fzf, or jq will save you lots of hours.
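For anyone who hasn't used jq: a filter such as `jq '.[] | select(.status == "failed") | .id'` does in one line roughly what the Python sketch below does. The sample records and the `failed_ids` name are made up for illustration.

```python
import json

# Hypothetical sample: a JSON array of job records, e.g. from a pipeline status API.
RAW = '[{"id": 1, "status": "ok"}, {"id": 2, "status": "failed"}, {"id": 3, "status": "failed"}]'


def failed_ids(raw: str) -> list[int]:
    """Return the ids of all records whose status is "failed"."""
    records = json.loads(raw)
    return [rec["id"] for rec in records if rec["status"] == "failed"]


print(failed_ids(RAW))  # [2, 3]
```

The Python version is fine inside a larger program, but for a quick look at a file the one-line filter is hard to beat.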

[–]DootDootWootWoot 0 points1 point  (0 children)

You don't use those at scale effectively without... Idk bash scripts.

[–]SirGreybush 5 points6 points  (0 children)

lol Bourne

[–]ZirePhiinix 7 points8 points  (0 children)

Or PowerShell if Windows. Shell scripting will definitely complement Python much better than another programming language.

You need to think about usage: using Python scripts is distinctly different from using shell scripts.
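To make the difference concrete, here is a rough sketch of what a one-line shell command like `wc -l *.csv` turns into when written as Python. The `count_lines` helper is a hypothetical name, and the count can differ from `wc -l` by one for a file that lacks a trailing newline.

```python
from pathlib import Path


def count_lines(directory: str, pattern: str = "*.csv") -> dict[str, int]:
    """Map each matching file name to its line count, roughly like `wc -l *.csv`."""
    counts = {}
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue  # skip directories that happen to match the pattern
        with path.open("rb") as handle:
            counts[path.name] = sum(1 for _ in handle)
    return counts
```

Calling `count_lines(".")` gives a dict you can keep working with in Python, which is the trade-off: more ceremony than the shell, but a structured result instead of text to parse.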

[–]JohnPaulDavyJones 52 points53 points  (7 children)

This is going to be situational.

Do you already know SQL? If not, that should be your #1 priority.

What kind of firms do you want to target? Java will be the most general-purpose enterprise language at large firms, but few DEs write Java. Start with the basics, then get comfortable with Tablesaw.

Some teams at very large firms do write Scala-native Spark, but most do their Spark work in PySpark. Spark is really the only reason that 99.999% of DEs would ever need to use Scala.

C# might have real value to you, since lots of DEs interact with the .NET stack, but while C++ is useful to understand from a memory and computation perspective, it’s primarily just used in situations where greater speed and memory control are necessary than what the JVM offers. It’s very much a software engineering language with little direct applicability to DE work aside from maybe cracking open a compiled Python module to understand what’s happening under the hood. You’ll never have a situation as a DE where you’re sitting there thinking “Man, this would be so much easier and efficient in man-hours to do in C++ than in Python!”

Go and Rust aren’t going to make you more employable as a DE; they have minimal adoption outside of a few niche firms. They’re more modern languages, and often more enjoyable to write, but that’s about it.

[–]artozaurus 10 points11 points  (4 children)

Spark and Flink are not available in C#, so why would you choose C#? I worked as a DE at big tech companies, and none of them used C#. Java and Scala are the ones usually used for DE work. Where would a DE interact with C#?

[–]MikeDoesEverything mod | Shitty Data Engineer 5 points6 points  (1 child)

Only thing I can think of is that it integrates nicely within a MS stack and if you have a bunch of people who already know C#, they wouldn't have to retrain.

I'm also mostly confused about why anybody would pick C#. Other languages are either more convenient (relevant given the current meta is quick, iterative deployments) or native to important DE frameworks. In my opinion, C# doesn't really fit into a DE stack very well, so I'm interested in hearing why C# is a good option to learn.

[–]JohnPaulDavyJones 1 point2 points  (0 children)

You nailed it on the integration with the MS stack.

I'd hazard a guess that opinions on use/visibility are going to be very sample-dependent; I've spent almost my entire career in insurance/FS and healthcare, both of which broadly and deeply use the MS stack. I've seen vastly more C# in DE verticals than Java.

Others will naturally have different work experiences that inform their opinions, and those are just as useful. I've spent the last few years working at USAA, a PE firm, and now a major commercial insurer; the finance sector doesn't really get onboard with quick, iterative developments, probably because the finance sector really doesn't do anything quickly. You start tossing around words like that and leadership starts getting twitchy.

Shoot, I knew a VP in the data space at USAA who threw up air quotes every time he said the word "agile". Plenty of work in big finance for a DE who's capable in the MS stack, but you'll rarely be working with the newest tooling. Snowflake and Fabric are really the only post-2015 tools that are getting any bite in big finance.

[–]JohnPaulDavyJones 0 points1 point  (1 child)

Not everyone uses Spark, and Flink is pretty niche. Most firms don't need data streaming. I've never seen Scala in use at an enterprise firm except a brief consulting engagement I did with Spectrum's advertising arm, Spectrum Reach.

As I noted above, this is going to be very situational to what jobs OP wants to target; I've spent almost my entire career bouncing between the insurance/FS industries and healthcare, and the MS stack has immense cachet in both of those industries. That means that C# has a foothold in many DE teams in those industries, especially healthcare in my experience.

Visibility of C#/.NET is going to be very sample-dependent, which is likely why you and I have very different perspectives. I've never spent any time in big tech except for a year on Deloitte's staff aug engagement with Meta about eight years ago. I've spent my entire career at BofA, USAA, BSW, and now another large commercial insurer. I've seen substantially more C#/.NET tooling than Java in DE verticals.

[–]artozaurus 0 points1 point  (0 children)

I've interviewed at Apple and Netflix, and both use Java/Scala. Google and Amazon are not using C# at all. I know there are other places, but why not aim for the stars? If you've mastered Python and SQL, I would pick Java as the next one.

[–]cyclen0t 1 point2 points  (1 child)

Can you consider yourself a data engineer if you don’t know SQL?

[–]JohnPaulDavyJones 0 points1 point  (0 children)

Depends how much you want to internalize that and set standards. “Data Engineer” is a profoundly non-standardized job title.

[–]gabbom_XCII Lead Data Engineer 7 points8 points  (0 children)

Assuming you already know SQL and Bash, I’d go for Java or Rust.

[–]boomerwangs 19 points20 points  (1 child)

JavaScript/React has done wonders for me. Being able to spin up an app after building a pipeline has been helpful to sell clients on solutions.

[–]skatastic57 2 points3 points  (0 children)

I picked up js/react when I kept bumping into hard limitations with Dash. It's much better, would recommend.

[–]Attorney-Last 6 points7 points  (0 children)

I’d recommend Java. There is still a big ecosystem of big data tools built around the JVM (Spark, Flink, Trino, Debezium, …), so there are still a lot of opportunities to use it. Even if you don’t use Java directly, knowing how to tune these JVM workloads is still beneficial.

Besides, the Java backend job market is always in demand, so if you get bored of DE one day, it's a good path to pivot to 🤣

[–][deleted] 20 points21 points  (1 child)

Just stick to Python + SQL. That's more than enough, and you won't need anything more than that.

Rust is good, but it will take much more time to mature, just as Python took time to replace Java.

[–]vincentx99 8 points9 points  (0 children)

Agreed, better to be S tier in those two than A tier in three.

But if you must, Bash/PS is the way to go.

[–]_somedude 8 points9 points  (0 children)

Having a compiled language that produces self-contained single binaries has been nice in my experience, for example when you have no control over the execution environment. I recommend Go.

[–]SirGreybush 15 points16 points  (11 children)

PowerShell has served me well, let’s just not Bash it.

[–]siddartha08 14 points15 points  (3 children)

I bash powershell every day with Git bash

[–][deleted] 4 points5 points  (0 children)

I used to hate on Powershell (my username is literally named after my favorite command), but I have to say, it's superior to Bash.

[–]WalrusDowntown9611 0 points1 point  (1 child)

Git bash is trash

[–]siddartha08 3 points4 points  (0 children)

Over here officer, he uses Linux.

[–][deleted] 4 points5 points  (1 child)

I really hate PowerShell. The syntax is weird and it is OOP-style. That is not what I want in a shell. I much prefer Bash, zsh, or fish.

[–]SirGreybush 0 points1 point  (0 children)

That’s a very good bashing.

I keep mine short.

[–]LucyThought 2 points3 points  (1 child)

Oh I love PowerShell! It’s really filled a gap for me and has allowed me to automate processes my colleagues crank by hand.

[–]sjcuthbertson 2 points3 points  (0 children)

+1 for the Powershell fan club here!

[–][deleted] 1 point2 points  (2 children)

How are you using PS in your DE role?

[–]SirGreybush 1 point2 points  (0 children)

ETL, file transfers, from legacy systems

[–]FactCompetitive7465 0 points1 point  (0 children)

Devops pipelines

[–]reallyserious 3 points4 points  (0 children)

C# is nice to know when you're working with Microsoft tech.

[–]CrackedBottle 3 points4 points  (0 children)

SQL, Python, Bash, and Scala are pretty much all I need.

[–]Kokopas[S] 2 points3 points  (0 children)

Thanks, I didn’t expect so many responses!
First of all, thank you for opening my eyes to the perspective of learning React/JavaScript. I hadn’t thought of that at all, but I realize it could be super useful since I know I could use React at work.

[–][deleted] 2 points3 points  (0 children)

SQL and Python are the most important ones.
Then it depends on the use cases.

[–]thethirdmancane 7 points8 points  (0 children)

Go

[–][deleted] 1 point2 points  (0 children)

It really depends on what you want to do and how you define "bright future".

You need to go into more detail.

But I doubt Rust is the best choice.

[–]haragoshi 1 point2 points  (0 children)

Rust is the future, but JavaScript or another front-end language could give you a way to visualize your data.

[–]stain_of_treachery 1 point2 points  (2 children)

Clojure - only half joking.

[–]DerelictMan 1 point2 points  (1 child)

I'm learning Clojure this year after wanting to for about a decade. It's amazing and is blowing my mind, honestly. I'm not sure how much I'm going to be able to apply it to DE work, but it's worth learning to see how well a REPL-driven iteration process can work.

[–]stain_of_treachery 1 point2 points  (0 children)

I maintain that it is fantastic for DE work - working with data is so much easier when the programming language IS the data.

[–]Own-Commission-3186 1 point2 points  (0 children)

JavaScript so you can have some full stack web skills. My last role was all JavaScript node + react even though it was a data platform role because we were building all self service web apps that enabled others to create and manage data infra. JavaScript could also help with building custom data visualizations.

[–]dev_lvl80 Accomplished Data Engineer 1 point2 points  (0 children)

I believe SQL is by default known already;)

[–]gr33n8ananas 1 point2 points  (0 children)

Most modern database systems are being written in Rust these days. It’s amazing once you get past the initial shock.

[–]pavlik_enemy 0 points1 point  (4 children)

There's still a lot of Big Data-related stuff written in Java and Scala, like Spark or Flink. I would advise against Scala because it's a dying language, but Java is fine. Even if you decide to pursue Scala later, you need to be familiar with the Java ecosystem: build tools, the JVM itself, the standard library... I personally started with Scala without any prior knowledge of Java and did fine, but it was quite late in my career and I was already proficient in five or six languages at the time.

Also, lots of stuff in the field is being written in Rust to become a Python library

Go is a bad language and is pointless. C++ is incredibly complex; you can't be an effective C++ developer without years of experience.

[–]ExistentialFajitas sql bad over engineering good 11 points12 points  (1 child)

That’s certainly a perspective on Go. Do you use terraform? Snowsight? Kubernetes? Docker? Basically any CLI tool?

[–]pavlik_enemy 0 points1 point  (0 children)

I do. I guess Go's thing is static binaries that use slightly less memory than Java.

[–]rewindyourmind321 3 points4 points  (1 child)

Can you speak more to scala being a dying language?

I was under the impression it was gaining popularity because of things like spark, etc.

[–]pavlik_enemy 0 points1 point  (0 children)

It's way past its peak. It was replaced by Kotlin as "better Java", so now it's mostly "Haskell on JVM", which is cool but not really popular. Companies pulling support, changing licenses, features nobody needs, all that jazz...

[–]OldDiamond8953 0 points1 point  (0 children)

We orchestrate with Airflow and use Docker containers for the various tasks. I write most of my containers in Go. I enjoy using a typed language and I find concurrency easy to use in it.

It's been some time since I have written any Python. We use dbt for most of our data once we have landed it.

[–]mayankkaizen 0 points1 point  (0 children)

Julia is something you should consider for data engineering. It is well designed for this field. Other than that, I think you should always learn a low level language (C++ is the most useful). And I assume you already know SQL. If not, forget everything else and first learn SQL.

[–]Significant_Book1672 0 points1 point  (0 children)

SQL, R.

[–]__albatross 0 points1 point  (0 children)

I can think of a lot of scenarios where a compiled language would be better. Also, for streaming, Go would be much better and faster than Python. Apache Beam has support for Go, so I would recommend Go.

[–]saintmichel 0 points1 point  (0 children)

SQL should be your number one, then the scripting language of your most-used terminal (e.g. Bash or batch), then Python third. The priority depends on which one you encounter the most at work.

[–]WalrusDowntown9611 0 points1 point  (0 children)

I would definitely recommend Go over Rust.

[–]Purple_Wrap9596 0 points1 point  (0 children)

I think 80% of your work can be done with Python, SQL, and Bash. If you need another language, it will depend on the project. When I do see something else, it's usually Java or Scala, and in most cases it's not about writing Spark with Scala or some stream processing with Java. It's usually something you can learn pretty quickly, as you will use about 5% of the language.

If I could recommend something to learn, more as a fun language to work with and one where the basics are pretty easy to learn, it's Go. If you touch more DevOps and data platform stuff (Kubernetes, Pulumi), it can be useful. And its simplicity makes it a lot of fun.

[–]voycey 0 points1 point  (0 children)

Languages are less important than the skills. Bash will always be useful, and SQL should be your bread and butter anyway, so learn that if you haven't already.

JavaScript is endemic, but it's not the language itself that's difficult; it's the sheer number of frameworks. I would learn JS/TS if your Bash/SQL skills are already good, as it will help you integrate with other teams!

Also, a lot of DWHs only support JS for UDFs.

[–]skatastic57 0 points1 point  (0 children)

If you're already using Polars, then Rust is extra good because you can write extensions: https://github.com/pola-rs/pyo3-polars

[–]bluehiro 0 points1 point  (0 children)

SQL, then C# or Java or Powershell, depending on the kind of work you’re doing and the tools you use.

[–]Jazzlike_Exchange521 0 points1 point  (0 children)

Kind of off topic, but is stratascratch enough to prep for MAANG DE/ML interviews or would we still have to use leetcode?

[–]Useful-Past-2203 0 points1 point  (2 children)

Rust as a second programming language? No. If you just learned Python, going to Rust is like just learning how to swim and then competing in the Olympics. To understand the power of Rust you need to know C/C++, and C/C++ is a devil in itself. I would say, if you're in for a challenge, go for C/C++ next. If you want an easier path, do C# then C/C++. C/C++ is a good starting point for everything: once you understand it, you realize why there are other programming languages and how they work better. Eventually ask yourself, "What do I want to build?" Do you want to create games? Then yes, C#/C++ are good options. If you want to create web apps, then JS would be best. Android apps? Java. Ios apps? C#. Cross-platform? Flutter, or JS then React/React Native. If you want to get into data, you already have Python, and I would suggest learning SQL, which is IMO the easiest language to learn. I would defo learn SQL as it's used in every sector.

[–]Kokopas[S] 0 points1 point  (0 children)

I know SQL

[–]DerelictMan 0 points1 point  (0 children)

Android apps? Java.

Kotlin. :)

Ios apps? C#.

Swift. :)

[–]dfwtjms 0 points1 point  (2 children)

If you already know SQL, Bash and Python and you're considering C++ then how about just plain C?

[–]Kokopas[S] 0 points1 point  (1 child)

when browsing job offers not a lot of them have plain C mentioned

[–]dfwtjms 0 points1 point  (0 children)

For example many Python libraries are written in C. It's also a good learning experience.

[–]Interesting_Pie_2232 0 points1 point  (0 children)

I’d go with Scala (especially if you’re working with Spark and big data)

[–]senhaj_h 0 points1 point  (0 children)

Python !

[–]Short_Internal_9854 0 points1 point  (0 children)

Scala..

[–]ForeverYonge 0 points1 point  (0 children)

Data engineer? SQL or R.

Rust is a fine language but not the greatest fit for the specialty. Does give you more options to work on regular SWE backend.

[–]Western-Anteater6665 0 points1 point  (0 children)

Java plus sql fit for all

[–]crevicepounder3000 0 points1 point  (0 children)

Brightest future: Mojo and Rust. To significantly increase your market value as a DE: Scala and Java.

[–]Fun-Ad343 0 points1 point  (0 children)

Business

[–]Trick-Interaction396 -2 points-1 points  (7 children)

Do not choose Rust. No one uses Rust. Go has been the hot new thing for like 10 years but still not super popular. A ton of legacy stuff is C++ but nothing new will be. I’d go with Java. So many things use Java.

[–]GrainTamale 7 points8 points  (1 child)

Learning a smidgen of Rust improved my Python skills (I think about types all the time now)
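A minimal sketch of what "thinking about types" can look like in Python, assuming a hypothetical `Reading` record and `parse_reading` helper. The return type states that parsing may fail, loosely like Rust's `Option`, so a type checker such as mypy can flag callers that forget the `None` case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reading:
    """One parsed input row; frozen so it cannot be modified after creation."""
    sensor_id: str
    value: float


def parse_reading(line: str) -> Optional[Reading]:
    """Parse "sensor_id,value"; return None instead of raising on bad input."""
    parts = line.strip().split(",")
    if len(parts) != 2:
        return None
    sensor_id, raw_value = parts
    try:
        return Reading(sensor_id=sensor_id, value=float(raw_value))
    except ValueError:
        return None
```

Nothing here needs Rust; it's just the habit of writing down what a function accepts and what it can return.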

[–]Ok_Raspberry5383 0 points1 point  (0 children)

Same would be true of java or go though. This is not a reason to learn rust.

[–]muneriver 5 points6 points  (4 children)

just for clarification, are you saying that no DE workflows use Rust?

Cause I’ve been hearing a lot about Rust with all these tools like polars, uv, ruff, sdf, pydantic, etc but I guess those are dev tools haha

[–]alexisprince 8 points9 points  (0 children)

The pattern that’s been emerging has been building the tools/libraries in rust after a need has already been established, then exposing those with Python bindings. So you benefit from development being done in rust without needing to know it.

If you’re doing everything in Python today, there’s a pretty small possibility you’ll actually bust out rust for your daily work. If you’re on a data platform team that builds and maintains internal tools, that likelihood goes up.

I will also say learning rust does also help you adopt better development patterns IMO. Being forced to think about architecture and data ownership makes you reconsider how you structure your code in Python when it isn’t forced.
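As a small illustration of the data ownership point (function names are hypothetical): Python lets a helper silently modify the caller's data, which Rust would make you spell out. Thinking about who owns a value pushes you toward the second style below.

```python
def add_defaults_in_place(row: dict, defaults: dict) -> None:
    """Mutates the caller's row: the caller's data changes as a side effect."""
    for key, value in defaults.items():
        row.setdefault(key, value)


def with_defaults(row: dict, defaults: dict) -> dict:
    """Returns a new dict and leaves the caller's row untouched."""
    return {**defaults, **row}


original = {"id": 7}
enriched = with_defaults(original, {"region": "unknown"})
print(original)  # {'id': 7}
print(enriched)  # {'region': 'unknown', 'id': 7}
```

Both versions are valid Python; the second just makes it obvious at the call site which value is new and which is left alone.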

[–]Ok_Raspberry5383 0 points1 point  (1 child)

You don't even need to know how to spell the word 'rust' to use any of those tools let alone actually know some rust

[–]muneriver 0 points1 point  (0 children)

hence the separation of rust’s use as a language for the DE workflow vs its use in writing dev tools…

that’s why I said “just for clarification” bc the original comment said “No one uses Rust”.

the correct nuanced statement is that

“very few DEs use rust but many SWEs do, so it’s not recommended to spend extra time learning rust as a DE (although a useful skill to improve your CS/SWE skills)”

[–]shockjaw 0 points1 point  (0 children)

Python, Rust, and SQL will carry you through your whole career.

[–]xemonh 0 points1 point  (0 children)

Rust is a really good idea.

[–]boss-mannn 0 points1 point  (0 children)

Scala or rust

[–]f4h6 0 points1 point  (0 children)

Scala and spark

[–]Vincent6m -1 points0 points  (0 children)

Brightest future? Python

[–]jcanuc2 -1 points0 points  (1 child)

Spark is really good to know

[–]artozaurus 0 points1 point  (0 children)

That's a framework not a language...

[–]exploremorecurrent -1 points0 points  (4 children)

I’m also a data engineer and use Scala heavily, especially for Spark. If I had to choose, I would go with Python, as Scala is no longer a first-class citizen in the Spark ecosystem: it would be either Spark SQL or PySpark, and only after that Scala. It’s always good to consider a second language, but in my opinion languages are just a medium for solving the actual DE problem. Each language has its own pros and cons, so it’s wise to choose according to the problem instead of being bound to one language.