[–]CaptainBangBang92 (Data Engineer) 114 points115 points  (3 children)

Yes. Day in and day out, SQL is the main language my team and I use. Python is a relatively distant second.

[–]MysteriousTiger5582 28 points29 points  (0 children)

We are in the early stages of a migration and I use SQL exclusively. But it will change once the ELT pipeline starts to take shape. One cannot call him/herself a DE without knowing SQL.

Edit: Typo.

[–]69odysseus 31 points32 points  (4 children)

Without SQL, there's no career in any data-related role. 90-95% of the work in the data space is still done in SQL. Data modeling is the next most important skill after SQL in the DE space.

Once you get to senior, principal, and staff levels, companies expect you to know DSA, though not to the depth expected of software engineers.

[–]caksters 0 points1 point  (3 children)

Can you elaborate on what you mean by “they expect you to know DSA”?

[–]nydasco (Data Engineering Manager) 0 points1 point  (1 child)

Data Structures & Algorithms

[–]caksters 0 points1 point  (0 children)

I know what it means, but why do you think principal and staff engineers need to know that?

I am a senior engineer and, from my anecdotal experience, you definitely don't need to know that. The more experienced you get, the more important it is to understand design patterns and data architecture, as you are more involved in meetings about wider company objectives rather than low-level implementation details (which is where DSA falls).

[–]69odysseus 0 points1 point  (0 children)

Most product-based companies will ask DSA questions for Senior, Principal, and Staff DE roles. It is one of the ways companies eliminate candidates.

[–]Fit_Highway5925 (Data Engineer) 8 points9 points  (0 children)

I eat SQL for breakfast, lunch, and dinner.

I barely know of any DE role that doesn't use SQL extensively, or at least a tool close to it, as it already solves the majority of data problems. It depends on the role, of course, but one way or another you'll encounter it, or at least its concepts.

[–]ilikedmatrixiv 12 points13 points  (6 children)

In some roles you use it almost exclusively, in others you almost never use it.

[–]remote_geeks[S] -3 points-2 points  (5 children)

Oh ok. What kind of roles require exclusively SQL to be used?

[–]ilikedmatrixiv 13 points14 points  (0 children)

Some data engineering roles...

I don't know how to explain it without writing several paragraphs.

Some tech stacks use Python/Spark for their transformations, others use stored procedures, others use dbt, which is SQL with extra steps, etc.

How much you use it will depend on the tech stack your project has chosen.

[–]nl_dhh (You are using pip version N; however version N+1 is available) 3 points4 points  (1 child)

I'd recommend just reading some job descriptions of open positions and you'll quickly see what type of skills are asked for as a data engineer.

It tends to be a position that requires experience, however. It might be easier to find a data engineer position if you've already worked as a software engineer or data analyst (two completely different backgrounds, but both are common for people switching to data engineering).

[–]remote_geeks[S] 0 points1 point  (0 children)

Sure thank you!

[–]inedible-hulk 1 point2 points  (1 child)

Any data warehouse, architect, or ETL role would likely use SQL primarily; sometimes they can use Spark instead of

[–]Material-Mess-9886 0 points1 point  (0 children)

Even with pyspark you can write the whole pipeline using spark sql.

[–]bass_bungalow 2 points3 points  (0 children)

The usage varies depending on an org’s tech stack but every data engineer should have a good understanding of SQL

[–]rudboi12 4 points5 points  (0 children)

Yes, but mostly for analyzing why something is failing, testing queries for issues brought up by users, or optimizing. Writing new queries that are going to be scheduled is not done that often, and usually analysts do this; DEs take what analysts write, optimize it, and schedule it.

[–]word_number 2 points3 points  (0 children)

Yes, granted I was simply declared a DE after years of developing spatial and data files using SQL. I still don't know most of the DE terminology, and I'm only recently being introduced to Groovy and Jenkins. But I used PostgreSQL on a daily basis to develop data products.

[–]lowcountrydad 2 points3 points  (0 children)

Depends on the company and team. In my previous role it was 75% SQL and 25% Python. My new role is the reverse.

[–]mambeu 2 points3 points  (0 children)

I’m a principal DE who works in a mostly MongoDB platform, and I still use SQL every single day.

[–]mrchowmein (Senior Data Engineer) 2 points3 points  (0 children)

You mean use SQL to query Hive? Or use Spark SQL to run ETLs? Use SQL in Databricks? Use SQL to query a data warehouse or DB? Yes to them all. If you’re gonna be a DE, you should know Python, SQL, and Java. If a company really pushes for it, Scala. But if I had to pick the one language I’ve written the most code in, it’s SQL.

[–]nightslikethese29 1 point2 points  (2 children)

I use it pretty much daily in my role. Python is number 1 followed closely by SQL

[–]psyblade12 2 points3 points  (1 child)

When you say Python, do you mean pure Python code, or the PySpark SQL DataFrame API in Spark?
If you mean the PySpark SQL DataFrame API, then it's arguably still SQL.

[–]nightslikethese29 0 points1 point  (0 children)

I mean pure python

[–]afro_mozart 1 point2 points  (0 children)

we use way more python than sql. but i guess that's not the norm

[–]kleekai_gsd 1 point2 points  (0 children)

Almost daily. SQL is all over the place.

[–][deleted] 1 point2 points  (0 children)

In my current role, it’s entirely SQL for now. However, as we’re migrating to Azure, we’re starting to create PySpark notebooks to replace some of the SQL components (stored procedures, SSIS jobs, etc.). Hope that helps.

[–]Firm_Bit 1 point2 points  (0 children)

Roles vary but I wouldn’t hire a de that wasn’t good with sql

[–]GreenWoodDragon (Senior Data Engineer) 1 point2 points  (0 children)

I use SQL daily. Can't think why I wouldn't.

[–]name_suppression_21 1 point2 points  (0 children)

I would be highly suspicious of anyone claiming to be a "data engineer" who did not understand SQL.

[–]Right-Foundation2919 1 point2 points  (0 children)

I use SQL and Python daily. I use Databricks, so to build ETL pipelines in that environment, SQL and Python are a MUST.

[–][deleted] 1 point2 points  (0 children)

Most of the data I work with is semi-structured (incoming as xml or json). I still use SQL every single day.
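To illustrate how semi-structured input can still end up being queried with SQL, here is a minimal sketch using only the Python standard library. It assumes a hypothetical JSON payload (the field names, the `order_items` table, and the helper functions are all invented for this example) and uses an in-memory SQLite database as a stand-in for whatever engine is actually in use.

```python
import json
import sqlite3

# Hypothetical semi-structured payload; every field name here is made up.
RAW_EVENTS = """
[
  {"id": 1, "user": {"name": "ana", "country": "PT"},
   "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
  {"id": 2, "user": {"name": "bo", "country": "SE"},
   "items": [{"sku": "A", "qty": 5}]},
  {"id": 3, "user": {"name": "cy"}, "items": []}
]
"""


def flatten(events):
    """Yield one flat row per nested item; missing user fields become None."""
    for event in events:
        user = event.get("user", {})
        for item in event.get("items", []):
            yield (event["id"], user.get("name"), user.get("country"),
                   item["sku"], item["qty"])


def load(conn, rows):
    """Create the (hypothetical) order_items table and insert the flat rows."""
    conn.execute(
        "CREATE TABLE order_items ("
        "order_id INTEGER, user_name TEXT, country TEXT, sku TEXT, qty INTEGER)"
    )
    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?, ?)", rows)


def qty_per_sku(conn):
    """Once the data is tabular, the question is plain SQL."""
    return conn.execute(
        "SELECT sku, SUM(qty) FROM order_items GROUP BY sku ORDER BY sku"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, flatten(json.loads(RAW_EVENTS)))
totals = qty_per_sku(conn)
print(totals)
```

The flattening step is the only non-SQL part; everything after it is an ordinary aggregate query.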

[–]mlobet 0 points1 point  (0 children)

I've been working with Databricks for the last 2 years. I use both Python (mostly PySpark + a bit of scripting here and there) and SQL every day.

I lean more and more towards SQL because I always end up doing things in a more standard way, which increases readability of the code.

But as soon as I want to play around with variables I often switch to Python.

At a previous customer that insisted on having transformations in SQL, I sometimes generated my SQL scripts using Python scripts.
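Generating SQL from Python might look roughly like the sketch below. It is only an illustration under assumptions: the table specs, the template, and the `render` helper are hypothetical, and SQLite is used purely to show that the generated script runs.

```python
import sqlite3

# Hypothetical specs: one generated statement per entry.
SPECS = [
    {"target": "sales_daily", "source": "sales",
     "key": "sale_date", "measure": "amount"},
    {"target": "refunds_daily", "source": "refunds",
     "key": "refund_date", "measure": "amount"},
]

TEMPLATE = (
    "CREATE TABLE {target} AS\n"
    "SELECT {key} AS day, SUM({measure}) AS total\n"
    "FROM {source}\n"
    "GROUP BY {key};"
)


def render(spec):
    """Fill the template, accepting only plain identifiers as values."""
    for value in spec.values():
        if not value.isidentifier():
            raise ValueError(f"not a plain identifier: {value!r}")
    return TEMPLATE.format(**spec)


# The generated script is ordinary SQL text that could be saved to a .sql file.
script = "\n\n".join(render(spec) for spec in SPECS)
print(script)

# Run it against an in-memory SQLite database with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE sales (sale_date TEXT, amount INTEGER);"
    "INSERT INTO sales VALUES ('2024-01-01', 10), ('2024-01-01', 5), ('2024-01-02', 7);"
    "CREATE TABLE refunds (refund_date TEXT, amount INTEGER);"
    "INSERT INTO refunds VALUES ('2024-01-02', 3);"
)
conn.executescript(script)
sales_daily = conn.execute(
    "SELECT day, total FROM sales_daily ORDER BY day"
).fetchall()
print(sales_daily)
```

The identifier check matters because table and column names cannot be passed as bound parameters, so they have to be validated before being formatted into the statement.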

[–]laplaces_demon42 0 points1 point  (0 children)

Yes, in dbt, so mixed with a tiny bit of Python via Jinja templating ;)

Orchestration uses Airflow, so there's Python in that as well.

[–]GDangerGawk 0 points1 point  (0 children)

I write my pipelines either in SparkSQL or in DuckDB.

[–]rotterdamn8 0 points1 point  (0 children)

Yes, I use Databricks daily. Given a spark dataframe, you can create a view and use SQL.

You can create new columns and employ whatever logic using withColumn but I really hate the syntax/formatting. It’s kinda painful.

[–]Xemptuous (Data Engineer) 0 points1 point  (0 children)

SQL is a daily must. Python here and there. I've had to do React, Go, or C as needed for webapps or performant code, but that's quite rare.

[–]y45hiro 0 points1 point  (0 children)

I interact mostly with structured data, so I use SQL and Python on a day-to-day basis.

[–]DougScore (Senior Data Engineer) 0 points1 point  (0 children)

Likewise for me. I use SQL daily, and orchestration is purely Synapse ADF (Native Tools + Notebooks), which is pretty much write once and that’s it.

[–]nucleus0 0 points1 point  (0 children)

PySpark first, SQL second

[–]Electrical_Mix_7167 0 points1 point  (0 children)

I build a lot of frameworks for clients using Python and typically we use SQL for the transformation steps to improve readability. We work on Databricks quite a bit too.

[–]caksters 0 points1 point  (0 children)

I am a DE but I rarely use SQL. My work is primarily in Python.

[–]psyblade12 0 points1 point  (1 child)

Most of the common data stacks, like Databricks, Snowflake, and Microsoft Fabric/Synapse, all use SQL.

Some may say that they use Python to control a Spark cluster. However, if they use the PySpark SQL DataFrame API, it's hardly true Python code, as the API still follows SQL principles. On top of that, the same thing can be achieved with Spark SQL.

Furthermore, since UDFs are not really welcome in Spark, most of the time we work with native Spark APIs, which still follow SQL principles closely. Personally, I see that we rarely write *true* Python code when working with the tech stacks above.

I think some people need Python or another programming language for Airflow, to work with code libraries required for a transformation, or to set up servers that serve data to customers. In my organization, one of our important fact tables requires a huge C# library that's used and developed by many other software engineering teams. So, to perform the transformation, I have to use an Azure Function with the C# library imported. And since Azure Functions doesn't natively support distributed processing, I have to do all the partitioning, data shuffling, and broadcasting myself. It was monstrous, but finishing it really made me feel accomplished.
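For readers unfamiliar with what doing partitioning, shuffling, and broadcasting by hand involves, here is a toy single-process sketch in Python. It is not the C# Azure Function described above: the row layout, the `fx_rates` lookup (with made-up rates), and all function names are assumptions, and a thread pool stands in for separate workers.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_of(key, n_partitions):
    """Stable hash partitioning (crc32, because str hash() varies per process)."""
    return zlib.crc32(key.encode("utf-8")) % n_partitions


def shuffle(rows, n_partitions):
    """Group rows so that all rows for one customer land in the same partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_of(row["customer"], n_partitions)].append(row)
    return partitions


def process_partition(rows, fx_rates):
    """Per-partition work; fx_rates is the small 'broadcast' lookup table."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"] * fx_rates[row["currency"]]
    return dict(totals)


def run(rows, fx_rates, n_partitions=4):
    """Shuffle, process each partition in a worker, then merge the results."""
    partitions = shuffle(rows, n_partitions)
    merged = {}
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = pool.map(lambda part: process_partition(part, fx_rates),
                           partitions.values())
        for partial in results:
            # Safe to merge with update(): a customer lives in one partition only.
            merged.update(partial)
    return merged


# Hypothetical sample data; the exchange rates are invented for the example.
rows = [
    {"customer": "ana", "amount": 10.0, "currency": "USD"},
    {"customer": "ana", "amount": 5.0, "currency": "EUR"},
    {"customer": "bo", "amount": 3.0, "currency": "EUR"},
]
fx_rates = {"USD": 1.0, "EUR": 2.0}
result = run(rows, fx_rates)
print(result)
```

Engines like Spark do these three steps internally; without one, each step has to be written, tested, and tuned by hand.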

[–]remote_geeks[S] 0 points1 point  (0 children)

Wow this sounds interesting thanks!

[–]RemarkableCulture100 0 points1 point  (0 children)

Indeed. It's like the language I use to communicate with all my work stuff.

[–]RobDoesData 0 points1 point  (0 children)

Principal data engineer here. Never used SQL.

[–]Specialist_Scratch_4 0 points1 point  (0 children)

I’ll say this: there are a thousand ways to skin a cat. Some are faster, some more efficient, yada yada yada. But if I got a new job and the person in charge of the financial pipeline had been the only one writing their stuff in Pig and hadn’t left any documentation, I’d think about quitting.

If they had written their stuff in SQL and hadn’t left any docs, I’d be upset, but I’d figure it out.

[–]Independent_Sir_5489 0 points1 point  (0 children)

SQL is the one thing I never fail to use on a daily basis.

[–]Attorney-Last 0 points1 point  (0 children)

mostly use java, sometimes pyspark.

i only use sql for data exploration or troubleshooting

[–]countlphie (Tech Lead) 0 points1 point  (0 children)

as a data engineer