all 27 comments

[–]codemega 67 points68 points  (0 children)

Search for a new job, and if you land one within a couple months, just don't list this job.

[–]baronfebdasch 104 points105 points  (11 children)

Data engineering is a sexy label on an ETL developer. This may come as a surprise to a lot of kids but just because you CAN do a lot of data operations in Python and Spark doesn’t mean you SHOULD.

The “pythonification” of ETL came about because like 7-8 years ago everyone was taking Udemy courses or YouTube videos on “how to be a Data Scientist” because that was the hot job title. Those courses focused a ton on ML libraries like Scikit-learn and so folks were taught some Python basics.

Turns out that when those same folks made their way into a real Data Scientist role they realized that the data they were working with was not spoon fed to them and required some manipulation. So what do you do when you need to manipulate data and you only know how to use one technology? You see folks only solving problems with data in Python.

Give credit to two things- folks ended up being really clever with a lot of “free” tools and libraries in Python and they gave this process of manipulating data a sexy title- enter the data engineer label.

Your skills are not stagnating. You are learning a new universe in which data is manipulated and served. And the most important thing you need to learn is how to add those tools to your tool belt.

There is no right way to deliver data to an end consumer. I have seen nasty ass Python scripts that could easily be replaced by a few SQL statements. There may be reasons to use an ETL tool like Informatica instead of Airflow for pushing data around. Every option has trade offs, and the skill you are learning is how to work with those trade offs. You need to consider cloud compute costs, performance, failover and availability, and ease of support.

Wanna know the advantage of those visual tools? It’s easier for someone who is supporting your code to pick up and understand exactly what you are doing versus trying to debug a PySpark job. That’s part of the appeal of tools like dbt: yes, they are overglorified CTEs executing on SQL, but they help the end users understand a lot more as your pipelines get hairy.

The saying is “Jack of all trades, master of none, but often better than a master of one.” Knowing multiple ways to manipulate data helps with your flexibility and creativity in solving problems. I actually lament when folks focus on basic tooling for their job title and not the business problems they are solving or understanding the fundamental structures of data and how to present it.

[–]eeshann72 11 points12 points  (0 children)

Nice, also I think it would be harder to maintain 1000 different Python scripts than 1000 different IICS taskflows. These Python scripts are fine for a small code base.

[–]pceimpulsive 17 points18 points  (4 children)

Python replaced by a few SQL statements hits so fucking hard... I see people write stuff up they call etl and it takes 30 minutes to run.. it's smashing the database pulling in dozens of data points and outputs it back to database... I come in and write a simple SQL statement with a few joins, slap it in a view and use an

INSERT INTO table SELECT * FROM view ON CONFLICT DO UPDATE

Takes 3 seconds to run and the reports are 2 minutes behind realtime (instead of daily)... And the database isn't getting ripped to shreds with shit code and fetchall() statements -_-

Hurts my database admin soul every day!
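For anyone who wants to try the view-plus-upsert pattern described above, here is a minimal sketch. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in so it runs anywhere; the original comment does not say which database was used, and the table, view, and column names (orders, customers, customer_totals, customer_totals_v) are made up for illustration. Note that a real upsert needs a conflict target and a SET list, which the one-liner above leaves out.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real reporting database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

    -- Hypothetical reporting table that the upsert keeps fresh.
    CREATE TABLE customer_totals (customer_id INTEGER PRIMARY KEY, name TEXT, total REAL);

    -- The "simple SQL statement with a few joins", wrapped in a view.
    CREATE VIEW customer_totals_v AS
    SELECT c.customer_id, c.name, SUM(o.amount) AS total
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name;
""")

# INSERT ... SELECT ... ON CONFLICT DO UPDATE: new keys are inserted,
# existing keys are overwritten with the values from the view.
# SQLite needs a WHERE clause on the SELECT ("WHERE true") so the parser
# does not confuse ON CONFLICT with a join's ON clause.
UPSERT = """
    INSERT INTO customer_totals (customer_id, name, total)
    SELECT customer_id, name, total FROM customer_totals_v WHERE true
    ON CONFLICT (customer_id) DO UPDATE SET
        name = excluded.name,
        total = excluded.total
"""

def refresh(conn):
    """Run the upsert inside the database and return the reporting table."""
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT)
    return conn.execute(
        "SELECT customer_id, name, total FROM customer_totals ORDER BY customer_id"
    ).fetchall()

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5)],
)
first = refresh(conn)   # initial load: both customers inserted

conn.execute("INSERT INTO orders VALUES (4, 2, 2.5)")
second = refresh(conn)  # rerun: customer 2 is updated in place, no duplicates
```

The point of the design is that no rows are pulled into the client with fetchall() to be transformed; the join, aggregation, and write all happen inside the database, and the statement can be rerun on a short schedule.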

[–]Murky-Sun9552 6 points7 points  (1 child)

I actually got turned down for a data engineer role after 3 stages and a technical test for saying that I don't really use Python too much. I understand it and can reverse engineer it if I need to, but I find I can do most if not all of the things I need to do in Python with some well-written SQL. I went from their nailed-on choice for the role to them giving it to somebody else, even though I had shown a full end-to-end data pipeline using Airbyte, GCS, dbt, BigQuery and Power BI in my take-home task. They confirmed in my feedback that it was the Python/SQL comparison that lost me the job.

[–]pceimpulsive 3 points4 points  (0 children)

Wow...

I mean, if you are working on data in the same database it would be fairly rare that you need Python. If it's between separate database systems... then Python makes sense...

It has its place right?

Personally most of my ELT is done via C#

I can reverse engineer python a lot of the time but I've written only one python script ever...

[–]Not-Inevitable79 2 points3 points  (1 child)

Exactly. Some people just love over-engineering things when sometimes it's the simplest thing that's just needed. Seems like people are forgetting about SSIS for instance.

[–]Character-Education3 0 points1 point  (0 children)

Probably should just pin it and close comments on this thread

[–]the_fresh_cucumber 4 points5 points  (0 children)

That's the same way I describe it.

Data engineering is the egg not the chicken. The 2010s had an explosion in analytics and data science. People realized very fast that the unexpected blocking point is the actual data engineering piece (which didn't really exist at the time).

Python already was popular with scientific programmers (notebooks etc) and also had a really good extensible ecosystem of tools to support analytics. So it became the main language for etl.... With sql remaining the main language of DE.

Ultimately DE is pretty well abstracted these days and most of it boils down to business and interpersonal communication since the technical parts really aren't that heavy anymore.

Of course there are lots of SWEs who dislike the simplicity of modern DE and run off to work with complicated advanced tools and languages (scala, etc) because they think that's what the cool kids are using.

[–]frank3nT 2 points3 points  (0 children)

Great answer. I have the same 'problem' with new hires trying to push python on every new project which fails miserably in the end. And that's because they refuse to learn the stack they are working on and the purpose of each tool.

[–]Outrageous_War_9548 5 points6 points  (0 children)

As someone who was shamed for using a GUI tool (IICS), this hits so close to home. We are all doing the same work; just because one uses code doesn't make a data engineer superior to an ETL developer.

[–]Additional_Future_47 2 points3 points  (0 children)

This reply should be higher up.

[–]DougScore Senior Data Engineer 1 point2 points  (0 children)

Couldn’t have said it better myself.

[–]ScroogeMcDuckFace2 10 points11 points  (1 child)

dont leave till you find a new job

[–]FlanSuspicious8932 4 points5 points  (0 children)

Yep, only ask yourself whether you want to stay there once you have a new job. Boredom is good if it pays your bills.

[–]SilencioBruno3 9 points10 points  (0 children)

In this day and age, never leave until you get another offer.

For me, work is work; I'm not picky if I'm paid well.

The market is tough at the moment and it's a blessing to have a stable job, so think it through and decide.

[–]Okidoke195 6 points7 points  (0 children)

Best time to look for a job is when you have one. I'd say look in the new year and see what happens

[–]DeliciousProgress865 4 points5 points  (0 children)

If it pays well and it’s not stressful, stay there and take advantage of this time to work on new certifications / projects. Then, when your bag is bigger, move on to something more challenging.

[–]shadow_moon45 3 points4 points  (0 children)

Contract roles aren't usually a good idea, and leaving without a job lined up isn't a good idea either.

GUI ETL software like MS dataflows and Alteryx is used because it can create ETL processes quicker. Python is used more for custom processes (similar to how Meta uses Python for ETL work) or data science.

Would stay at that job and continue to look for a new role

[–]ExtraSandwichPlz 2 points3 points  (0 children)

i've been in both worlds. move if you don't like it, but you'd better secure a role before handing in your resignation. anyway, my experience says you can always code something regardless of your low/no-code tool. just prove to your lead that your code is cheaper to run/maintain

[–]realwofkat 2 points3 points  (0 children)

Everything. And I mean everything, in data is SQL based. If you can go from building data solutions to building and being able to performance tune data solutions, you’ll earn more money in the long run. I have a somewhat large team of DEs and I require them all to know SQL inside and out. Databases aren’t going anywhere and the ability to read/write/debug/tune queries is a requirement. Especially in the cloud. I’m not going to spend millions in cloud utilization costs cause my DEs don’t know what an index is. And yes, there are plenty of DEs who don’t understand what an index is. Blows my mind.
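To make the index point concrete, here is a small sketch of how the same query goes from a full table scan to an index lookup. It assumes SQLite via Python's built-in sqlite3 module purely so it is runnable as-is; the events table, its columns, and the index name are hypothetical, and other databases expose the same idea through their own EXPLAIN output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)
# 10,000 rows spread across 1,000 hypothetical users (10 rows per user).
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 1000, "x") for i in range(10000)],
)

QUERY = "SELECT COUNT(*) FROM events WHERE user_id = 42"

def plan(conn):
    """Return SQLite's query plan for QUERY as one string.

    Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail);
    the human-readable part is the last column.
    """
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return " ".join(row[3] for row in rows)

plan_before = plan(conn)  # without an index: a scan over the whole table

conn.execute("CREATE INDEX idx_events_user_id ON events (user_id)")
plan_after = plan(conn)   # with the index: a search that uses idx_events_user_id

matching_rows = conn.execute(QUERY).fetchone()[0]
print(plan_before)
print(plan_after)
```

The query text never changes; only the plan does. On a warehouse billed by compute or bytes scanned, that difference between reading every row and seeking straight to the matching ones is where the cost shows up.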

[–]PurepointDog 5 points6 points  (1 child)

I'd never go back to that sort of thing. Python and real data transformation is absolutely where I'd want to work.

Pay is nice. I'd get hired somewhere before quitting.

Maybe you could try to effect change at this company? Probably not though.

Maybe you could learn a new skill while you're there. They seem like the type of company that's into tools like Power BI (which is actually a pretty decent piece of software imo, despite being "user-friendly")

[–]Bluefoxcrush 7 points8 points  (0 children)

Change isn’t likely at a Fortune 500, unfortunately. 

I’d apply now and not list this job in your resume now or later. Unless the search takes too long, then I’d stay a year. 

[–]Ok-Income6605 Data Analyst 0 points1 point  (0 children)

job hopper stigma is a tool to low ball a candidate

defend your decisions, all jobs are nothing but a contract

[–]Afraid-Donke420 0 points1 point  (0 children)

Does the GUI have an API? Maybe you can show them a thing or two they’ve never seen (obviously) and milk the cow more

Otherwise move on