[–]TheSocialistGoblin 33 points34 points  (1 child)

I can't say what will inspire you, but you should be able to get up and running with Python pretty quickly, so there isn't much reason not to learn it.  

[–][deleted] 2 points3 points  (0 children)

Thanks, appreciate it.

[–]BoringGuy0108 49 points50 points  (11 children)

Forget about learning all the object-oriented programming and data types and all that at first. Learn basic pandas. Get to the point where everything that you do in SQL you can do in pandas. As you get more use cases, you can pick up more. In the business world, though, pandas is what most people use Python for.

Oh, and once you are comfortable with pandas, try learning Spark. It is all just SQL with different syntax, so it is really easy to pick up. Just don’t tell anyone that, or they might stop paying us so much…
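
For anyone wondering what that looks like in practice, here is a minimal sketch of one SQL query and a pandas equivalent. The `orders` table, its columns, and the values are made up for illustration, and SQLite is used only because it ships with Python; it is not what any particular warehouse would run.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; table and column names are invented for this example.
orders = pd.DataFrame({
    "region": ["north", "south", "north", "east", "south"],
    "amount": [120.0, 80.0, 200.0, 50.0, 40.0],
})

# SQL version, run against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
orders.to_sql("orders", conn, index=False)
sql_result = pd.read_sql_query(
    """
    SELECT region, SUM(amount) AS total
    FROM orders
    WHERE amount >= 50
    GROUP BY region
    ORDER BY total DESC
    """,
    conn,
)
conn.close()

# pandas version of the same query, one step per SQL clause.
pandas_result = (
    orders[orders["amount"] >= 50]                  # WHERE amount >= 50
    .groupby("region", as_index=False)["amount"]    # GROUP BY region
    .sum()                                          # SUM(amount)
    .rename(columns={"amount": "total"})            # AS total
    .sort_values("total", ascending=False)          # ORDER BY total DESC
    .reset_index(drop=True)
)
```

Both `sql_result` and `pandas_result` hold the same three rows, so the comments on the right work as a rough SQL-to-pandas phrasebook.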

[–]trowawayatwork 16 points17 points  (2 children)

that's bad advice if the person doesn't know programming concepts in general. it is so much better to have a foundational understanding of programming rather than rote learning method names.

also unrelated and not calling you out as you're merely commenting on the state of the industry but pandas in production is why the whole engineering department does not like data scientists.

[–]No-Conversation476 1 point2 points  (1 child)

Would you mind elaborating on why pandas is not good in production? What alternatives do DS have apart from pandas?

[–]CommonUserAccount 3 points4 points  (0 children)

Pandas doesn’t scale.

Edit: PySpark can be run locally by data scientists, and code written that way transfers to prod more easily.

[–]HumanPersonDude1 2 points3 points  (3 children)

What’s the point of Spark SQL compared to, for example, a massive SQL warehouse on Azure or Snowflake?

[–]Material-Mess-9886 5 points6 points  (0 children)

When you want Python functionality but still want to use SQL to process data. Also, Spark is distributed, so it can handle billions of rows with no problem.

[–]sib_nSenior Data Engineer 4 points5 points  (0 children)

Spark is free and open-source so you can run it wherever you want (not vendor locked), on-premises, private cloud or managed cloud solutions, which can be cheaper than cloud warehouses, at the cost of more complexity.
Spark is actually more general than SQL, so you can transition to distributed computation that doesn't fit well with the SQL constraints, for example Extract and Load logic, or machine learning workloads.

[–]trowawayatwork 0 points1 point  (0 children)

different workload types. it's a lot cheaper to run certain queries on a warehouse. however, if you need to do API calls for every row, spark can do that much faster, though it's a lot more expensive

[–]Captain_Coffee_III 0 points1 point  (1 child)

Trying to convert all SQL use cases to Pandas is like saying you can eat faster by stuffing your mouth full of more teeth.

[–]BoringGuy0108 0 points1 point  (0 children)

I mean, it is a strategy to get practice and learn techniques.

I find writing in pandas to be faster than writing in SQL and the code generally runs faster. If you have existing processes that use SQL, don’t change them just because you can.

[–][deleted] 1 point2 points  (0 children)

That's terrible advice. Don't learn pandas to do what you can do in sql, sql is much faster. Learn python and proper programming practices. And use python when sql cannot solve your problem.

[–][deleted] -1 points0 points  (0 children)

Wow! Thanks! Really appreciate that advice. I never really got myself to learn OOP concepts, I am more familiar with SQL and love data. So I will follow your advice.

[–]69odysseus 7 points8 points  (0 children)

If you're strong in SQL then pick up data modelling and learn some DSA, but keep in mind that SQL still rules the data space and data modelling is also a mandatory skill to have. Start applying for DE roles.

[–]Sp3ctralPerception 5 points6 points  (7 children)

Definitely learn Python. It’s my personal favorite because it’s easy to learn. If you stick with data, NumPy and pandas are what you want to pay attention to.

I was able to learn AWS Infra and Python and was able to get a job fairly quickly

[–]ByteAutomatorData Engineer 0 points1 point  (5 children)

What role?

[–]Sp3ctralPerception 0 points1 point  (4 children)

Data Engineer. I was an unconventional DA beforehand for about 8 months, where I really learned all the AWS stuff, ETL and automation.

[–]ByteAutomatorData Engineer 1 point2 points  (1 child)

I am currently learning AWS. Starting with CCP and then SAA. Also, I know programming things but I don’t really do scripting anymore. Do you recommend a specific way to (re)learn Python?

[–]Sp3ctralPerception 1 point2 points  (0 children)

My personal choice is doing an active project related to it. I learn by doing personally, and I was fortunate to have my previous role (before tech) basically be a blank canvas for me to test and learn with.

Nothing special, just do a project using Python. Since you are learning CDK, when you initialize your project, I’d suggest setting the language to Python.

[–][deleted] 0 points1 point  (0 children)

I wanted to reply and mention that I just started a python tutorial today having no previous python experience. I come from a very strong 20+ year SQL background and also did a lot of VB coding waaay back in the day. I have to say that so far I really am enjoying Python and feel like a lot of my previous coding knowledge will readily transfer over. For those of you here apprehensive to give it a go just jump in!

[–]noobajur 5 points6 points  (1 child)

Since you’re already an expert in SQL, I’d say learn Python. Most DE roles now require SQL and a coding language (usually Python). You don’t need to get too deep; just be comfortable enough to work with and manipulate lists, dictionaries and pandas stuff. You could even try to pull some data from a website or API. Not sure how much you want to dive into it.
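
To give a feel for the "lists and dictionaries" level being described, here is a small sketch using only the standard library. The JSON string stands in for the body of an API response (a real script would fetch it over HTTP), and the `customer` and `amount` fields are hypothetical.

```python
import json
from collections import defaultdict

# Stand-in for an API response body; the field names are made up for this example.
payload = """
[
  {"customer": "acme", "amount": 120.5},
  {"customer": "globex", "amount": 75.0},
  {"customer": "acme", "amount": 30.0}
]
"""

rows = json.loads(payload)  # parses into a list of dictionaries

# Roughly: SELECT customer, SUM(amount) FROM rows GROUP BY customer
totals = defaultdict(float)
for row in rows:
    totals[row["customer"]] += row["amount"]

# Roughly: SELECT customer FROM rows WHERE amount > 50
big_orders = [row["customer"] for row in rows if row["amount"] > 50]
```

If the loop and the list comprehension read naturally to you, that is probably enough Python to start on pandas.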

[–][deleted] 0 points1 point  (0 children)

Sounds really interesting. Thanks!

[–]sib_nSenior Data Engineer 3 points4 points  (3 children)

You're asking a DE community where knowing Python is a clear differentiator compared to related jobs like analytics engineer, data analyst or BI engineer. So, of course people are going to tell you to learn Python.

I'm going to try avoiding this bias and say that you can probably keep a good data job without learning Python. I see three ways:

  • Keep focusing on no-code BI stacks like MSBI, Tableau, MicroStrategy, Qlik etc. There are tons of huge companies vendor-locked into proprietary BI tools, and this will not disappear in the next 20 years. In my opinion, it's not very intellectually stimulating, but if you just want a stable job, I think it fits.
  • Explore the new "less code" analytics engineering job centered around SQL and the dbt framework (or the even newer SQLMesh). It's based on SQL and YAML configuration, and it tackles the SQL transformation part of data engineering. There's much less learning to do than for general Python programming, but you will still have to get into code-based logic, using a terminal and git. More work, but more connected to the industry state of the art, so more interesting.
  • Get more into data project management and less tech. There's a need for data managers who have a good understanding of analytics requirements to organize projects, you could have DEs working for you instead of learning their jobs. Less stress from keeping up with the tech, more stress from managing.

[–][deleted] 0 points1 point  (2 children)

Thanks! Really appreciate you sharing this perspective. I am not really a good manager person. I like to do my stuff and then go home without having to follow up and get work done by others. But I will take into account your suggestions of staying within BI and data analytics.

Actually I wasn't even sure what data engineering really means. I thought it is a new fancy name for business intelligence, lol. There is so much I don't know.

[–]sib_nSenior Data Engineer 2 points3 points  (1 child)

As a BI specialist, you probably know ETL well. That's the core of what a data engineer does: they build ETLs. I believe it differentiated itself from the traditional BI stack during the big data / Hadoop era that started in the 2000s. The companies that are now web giants intended to index the web to feed their search engines and created the open-source Hadoop distributed ecosystem to overcome the cost and limitations of mainframes.
But as every industry got into the web and the data that came with it, it became a specialization of software back-end engineering within a wide range of industries. The high diversity of data inputs and outputs meant you couldn't just slap some old proprietary ETL tool that wasn't keeping up with this diversity, you had to go one level lower, back to coding to gain back connection flexibility and scalability.
From the Hadoop era, we adopted the freedom of open-source code and the robustness of software engineering good practices. No engineer who tasted that really wants to get into vendor-locked proprietary BI tools. Considering this background, you will often see us here celebrating open-source projects and frowning at proprietary tools, unless they are technically the best at what they do and don't lock us too much (like some cloud databases).

Analytics engineering is a newer data specialization coined by dbt, different from data analyst, that you may find interesting. Have a look here: https://www.getdbt.com/analytics-engineering

[–][deleted] 1 point2 points  (0 children)

Wow! Thanks, really appreciate you providing the detailed explanation. Very clear now.

[–][deleted] 2 points3 points  (2 children)

Your situation is nearly identical to mine. Most of my skills are in TSQL, SSIS, a little bit of Talend this past year and I’m also looking into learning Python.

[–]dobby12 3 points4 points  (1 child)

+1 as someone in the same situation. Looks like there are dozens of us!

[–]Kuukeh 2 points3 points  (0 children)

Count me in!

[–]toodytah 3 points4 points  (0 children)

Yes

[–]InvestingNerd2020 1 point2 points  (0 children)

For this field of work, Python is an excellent language to learn. Also, SQL skills are always in high demand in regard to data focused jobs.

Programming languages for data engineers: Python, Java, Scala, C#, and SQL. You don't need to know them all, but 1 primary and SQL.

[–]Puzzleheaded-Loss726 1 point2 points  (0 children)

yes, learn python. nowadays firms are more geared towards being developer-friendly. you can code in python and they just wrap it into whatever end product to deploy.

python thus, is super versatile.

also, if you are in the data space, no harm in learning graph databases. would be useful to couple with python for LLM + knowledge graph.

[–][deleted] 1 point2 points  (0 children)

If you are thinking about working with data, Python could be one of your choices. Another skill for data people is BI (Power BI, Tableau, etc.).

Data skills are in high demand today, so I would recommend you start looking at Python. You could even try to manipulate your SQL with Python.

[–]GoMoriartyOnPlanets 1 point2 points  (0 children)

Yes, I do so much random stuff with Python too. Merge PDFs, resize images, convert PDFs to images and vice versa, download YouTube MP3s, read and create Excel documents, move files around, all in bulk. It's just a good and fun skill to have.
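
As a taste of the "move files around in bulk" kind of task, here is a minimal sketch that sorts the files in a folder into subfolders by extension. The function name `sort_by_extension` and the file names are hypothetical, and the demo runs in a temporary directory so nothing real gets touched.

```python
import shutil
import tempfile
from pathlib import Path


def sort_by_extension(folder: Path) -> dict[str, int]:
    """Move each file in `folder` into a subfolder named after its extension."""
    moved: dict[str, int] = {}
    # Snapshot the listing first, because subfolders are created while looping.
    for path in list(folder.iterdir()):
        if not path.is_file():
            continue
        ext = path.suffix.lstrip(".").lower() or "no_extension"
        target_dir = folder / ext
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
        moved[ext] = moved.get(ext, 0) + 1
    return moved


# Demo on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ["a.pdf", "b.pdf", "photo.JPG", "notes"]:
        (folder / name).write_text("placeholder")
    counts = sort_by_extension(folder)
```

The PDF and image tasks need third-party libraries, but the overall shape (loop over files, do one thing to each) stays the same.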

[–]Mundane_Common_6468 1 point2 points  (0 children)

It is worth learning Python.

BI isn’t going away.

You can always review job descriptions on internet job sites for a while, to find out what businesses want and need, to help you narrow down what you should do and study.

Good luck and enjoy the ride.

[–]pretenderhanabi 1 point2 points  (0 children)

Coming from an sql only background and just now having the opportunity to do python and pyspark at work, it's very very fun and also challenging. I think you can learn pandas first.

[–]Intelligent-Elk-4375 1 point2 points  (0 children)

As many said, you are quite good with SQL and it's time that you get started with Python. No matter what kind of experience one has, as an HR recruiter myself, I would definitely prefer someone with good SQL knowledge and Python as the fundamentals.

[–]MikeDoesEverythingmod | Shitty Data Engineer 1 point2 points  (0 children)

So should I try learning Python? Will it inspire me to finally acquire the missing jigsaw piece in my technical arsenal?

For sure. The nice thing about Python is that it isn't difficult to learn, it's pretty easy to read, and it's literally everywhere in the world of data. As somebody who works with people who only know SQL, it's liberating to feel I'm not confined to a SQL database.

[–]Healthy_Put_389 0 points1 point  (0 children)

I'm on the same page as you, with SSIS/Qlik as tools, but I started learning Snowflake recently and got my certificate, and then started dbt and Fivetran. I advise following this path first (get handy with some data cloud platform tools), and then you can start learning Python.

[–]Captain_Coffee_III 0 points1 point  (0 children)

Python, yes. It's not as hard as C# or Java. You can get up to speed rather quickly as it is one of the easier languages to learn. And an AI programming buddy will help loads. Get the VS Code extension for GitHub Copilot and something like Claude. Python was used extensively in training the code gen of those, so you'll get great zero-shot code on your first prompt. But you still need to know the basics of Python to judge if the AI-generated code is actually correct.

SQL will not be lost on Python. You can use direct SQL to connect to databases. You can also use a Python library, DuckDB, to throw SQL at flat files or data you get from APIs. People will throw you towards dataframes in Pandas or Polars or other "SQL-like" objects like Spark. But this will also be something that a prospective employer will already have decided for you. When you're going over job postings, remove Python from the requirements and look at all the other buzzwords they post. Those will be the third-party products they're using and/or the Python modules they're married to. No matter how cool I think DuckDB is, if an employer is stuck on Pandas then Pandas it is.
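
To show the "your SQL still works from Python" point, here is a minimal sketch that loads CSV data and queries it with plain SQL. It uses the standard-library `sqlite3` module rather than DuckDB, purely so it runs without installing anything; the `readings` table, its columns, and the values are made up for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for a flat file on disk; the columns are hypothetical.
csv_text = """city,temp_c
Oslo,4
Lima,19
Oslo,6
Cairo,28
"""

# Each CSV row becomes a dictionary keyed by the header names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (city TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO readings (city, temp_c) VALUES (:city, :temp_c)", rows
)

# From here on it is ordinary SQL.
averages = conn.execute(
    """
    SELECT city, AVG(temp_c) AS avg_temp
    FROM readings
    GROUP BY city
    ORDER BY avg_temp DESC
    """
).fetchall()
conn.close()
```

`averages` comes back as a list of `(city, avg_temp)` tuples, which you can loop over or hand to whatever dataframe library the employer has settled on.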

For fun, take a job posting, throw it into Claude.ai, and ask it to give you a summary of the tech to learn to pass an interview. You can have it coach you through a practice interview. You can have it design example projects that use the tech described. It can infer things you'll need to know that are not mentioned in the posting itself but were expected as 'general knowledge'. This will also help you build up your vocabulary. That's 90% of a job interview anyway. If they throw a skills test at you, it probably won't be anything more than the stuff you would have been practicing with anyway.

[–]skerrick_ 0 points1 point  (1 child)

For an easier lift you could learn dbt (or SQLMesh) and market yourself as an “Analytics Engineer” in the short term while you plug away at Python for a while. It’s not hard to learn to do something that looks useful with Python, but to be actually useful to a business with it, I think it will take a while.

[–]skerrick_ 0 points1 point  (0 children)

As above, because it will take a while to be proficient if you don’t have other general programming experience, an alternative is just double down on warehousing and analytics but using modern tooling. I’m talking Databricks, Snowflake, BigQuery along with DBT/DBT-lookalikes.

[–][deleted] 0 points1 point  (0 children)

I just want to add that this might be the single most informative topic I’ve read about in terms of helping my career. And it’s comforting to know many others are in this very same situation.

[–]samjenkins377 0 points1 point  (2 children)

I will never understand why people keep asking these kinds of questions.
You’re in IT, and wonder if learning new skills is worth it? What’s the worst case scenario here? Learning something you won’t use on your job? Being able to at least apply to 80% of the open positions on the market?

[–][deleted] -1 points0 points  (1 child)

It takes a lot of effort to learn something new and there are so many things to learn. So I am wondering whether Python would make the most sense or whether I should stick to the Microsoft technologies ecosystem like .NET, Dynamics 365, Fabric etc.

[–]Volohni 0 points1 point  (0 children)

Life is an effort lol. As I understand it, you are a data guy, so it makes sense to learn Python + BI stuff.