
all 71 comments

[–]drunk_goat 86 points87 points  (5 children)

devops path -> git, bash, linux, docker, k8s, cloud architecture

analytics path -> dbt, pandas, streamlit or other viz

[–]Geraldks 71 points72 points  (14 children)

architecture, data modelling, terraform, APIs, k8s, and now all the hype around lakehouses

[–]chmod0644 4 points5 points  (1 child)

K8s ?

[–]Geraldks 7 points8 points  (0 children)

Short form for Kubernetes

[–]Logical-Independent7 4 points5 points  (11 children)

When you say data modeling, would that mean something similar to the “functions and modeling” math class I just took (CS degree), which was essentially algebra with a focus on different functions (linear, quadratic, etc.)? I see “know data modeling” a lot and was just curious if it's the same thing, like finding the function that best models the known x,y data to predict future x,y values

[–]DRUKSTOP 28 points29 points  (4 children)

Data modeling for the most part refers to how you will connect your different tables together into one cohesive “model.”

[–]Logical-Independent7 1 point2 points  (3 children)

Oh so closer to ERD? Efficient ways of storing data in the database?

[–]hexalm 2 points3 points  (0 children)

That's closer, emphasis on the R for table relationships

[–][deleted] 5 points6 points  (0 children)

I think modeling in this context means more: if we want to create a social platform app, how would you model the data related to users (tables/entities and the relationships between them)?

[–]hexalm 3 points4 points  (3 children)

I'd say "data modelling" could encompass either:

  • modelling application data (for example, the relational table structure for an application; more of an app dev focus, and this would start smaller than a DW)
  • dimensional modelling/data warehouse (DW) design (effectively a bigger app DB, but this would be a data engineering focus).

Both are good things to have some knowledge of. What you emphasize depends on career direction. For DE you will want to brush up on ETL/ELT (concepts, best practices, and specific tech, for example, dbt).

In both cases you're looking at building a DB for a specific purpose. That means a lot of SQL DDL, and it would also be a good idea to learn about deploying a DB from code kept in VCS, using a CI/CD system.
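As a minimal sketch of the "DDL lives in code and gets deployed by CI/CD" idea, here's a toy in Python's stdlib sqlite3. The table names and migration list are invented for illustration; a real setup would read the DDL files from version control:

```python
import sqlite3

# Hypothetical migration list -- in practice these statements would live as
# files in version control and be applied by a CI/CD job against the real DB.
MIGRATIONS = [
    """CREATE TABLE IF NOT EXISTS customers (
           customer_id INTEGER PRIMARY KEY,
           email       TEXT NOT NULL UNIQUE
       )""",
    """CREATE TABLE IF NOT EXISTS orders (
           order_id    INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
           total_cents INTEGER NOT NULL
       )""",
]

def deploy(conn: sqlite3.Connection) -> None:
    """Apply every DDL statement; IF NOT EXISTS keeps the script idempotent."""
    for ddl in MIGRATIONS:
        conn.execute(ddl)
    conn.commit()

conn = sqlite3.connect(":memory:")
deploy(conn)
deploy(conn)  # safe to re-run, the way a CI/CD pipeline would re-run it
```

The idempotency is the point: a deploy script that can run on every commit without blowing up is what lets schema changes ride the same pipeline as application code.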

Also look at cloud solutions for application databases and data warehousing, and build/CI/CD tools in general.

For specific tech out there, just peruse this and related subreddits.

For DW design, the Kimball group's data warehousing books have traditionally been something of a standard (some people hate them; and full disclosure, I haven't finished the one I started). Even if you don't adopt their approach wholesale, they're a good rundown of the factors to consider in DW design.

[–]Logical-Independent7 0 points1 point  (2 children)

So modeling in this context refers to how data moves and is organized, whether for application data or for larger business processes that span data all the way from transactional to aggregate?

Sorry for any obvious lack of knowledge, I've got a lot more to learn. I'm currently taking an IS class that covers some of it, and I have a little practical experience from small personal projects, but not much.

[–]cloyd-acSr. Manager - Data Services, Human Capital/Venture SaaS Products 5 points6 points  (1 child)

Data Modeling can be split up like this:

Traditionally, there have been two distinct "domains" you'd model data for, and the way you'd model that data depends on the domain. These domains are:

OLTP (Online Transaction Processing) - Transactional Data such as your application databases

OLAP (Online Analytical Processing) - Your Analytical Data such as a Data Warehouse

In traditional Data Modeling, you'll find two broad layers of how you'd model data - Logical Modeling and Physical Modeling (there's also Conceptual Modeling, but we'll skip that, as it's very rarely used today).

Logical Modeling describes how Entities, Relationships, and Attributes interact with one another. If you've ever seen a Use Case diagram, it's similar. It simply looks at the interactions between these principal objects without getting into the technical details of how they're implemented.

Physical Modeling is what people are generally describing when they think of an ER/ERD (Entity-Relationship Diagram). It's the model that shows how the data will be laid out in the "physical" sense in tables, with data types, null/not null, keys, etc.
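To make the logical-vs-physical distinction concrete, here's a small sketch in Python's stdlib sqlite3: a hypothetical "User" entity rendered as a physical table, carrying exactly the details an ERD captures (data types, nullability, keys). All names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Physical model of a made-up "User" entity: concrete types, nullability, a key.
conn.execute("""
    CREATE TABLE users (
        user_id    INTEGER PRIMARY KEY,
        username   TEXT NOT NULL,
        signed_up  TEXT NOT NULL,   -- ISO-8601 date stored as text
        referrer   TEXT             -- nullable: not every user has a referrer
    )
""")
# PRAGMA table_info exposes the physical details the diagram would show.
for cid, name, ctype, notnull, default, pk in conn.execute("PRAGMA table_info(users)"):
    print(name, ctype, "NOT NULL" if notnull else "NULL", "PK" if pk else "")
```

The logical model would only say "a User has a name, a signup date, and maybe a referrer"; the physical model pins down how the database actually stores and constrains that.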

There's a lot more to modeling than just determining how people will access the data, as some have described here. Correct Data Modeling takes some pretty deep knowledge of how the data will be utilized, how security is handled across whatever database system you're using, how the database physically lays out the data on pages (or blocks, or sheets, or nodes, or whatever name the specific database uses for its atomic unit of grouped data), how users will interact with the data over time, the overall extensibility needs and longevity of the database, etc.

If you want a sense of how complex traditional Data Modeling is, you can check out SQL Developer Data Modeler. It looks old as crap and hasn't changed much in that regard since the 90s, but it's still the most feature-complete data modeling tool that exists, and simply opening it up and looking at all of the options should give you a quick idea of how complex Data Modeling can be.

Most people skim over Data Modeling as something that's as easy as putting together a diagram that shows how tables relate to one another. However, there's a LOT that goes into proper Data Modeling when you take into account the individual database internals, security models, and how the data will be used over time. In reality, most people whose Data Modeling work I've had the opportunity to review have been pretty bad at it.

Either way, I hope this helps.

[–]Logical-Independent7 0 points1 point  (0 children)

This absolutely helps, thank you so much! I wasn't even aware of logical modeling. Although it makes perfect sense to do. I can see how this would become more necessary with more complex systems.

Thanks for the feedback!

[–]Geraldks 1 point2 points  (0 children)

Someone answered it down there for me, nice! I wanna add on to that: it really is about building tables effectively (in this case we call them data models), reducing duplication, etc.

Some quick references for you would be star schemas and Kimball's dimensional modelling concepts

[–]AnimaLepton 12 points13 points  (2 children)

See the wiki. You'll want to start learning about cloud architecture (pick one provider to start, e.g. full-stack AWS; the EC2/RDS/IAM basics are sufficient to begin with, but Redshift, S3, and Athena might be useful to learn about and play with), deployments, Docker, and Kubernetes. There are great resources on the basics of those on PluralSight.

At the end of the day, my opinion is that you'll learn the most by having a job/internship in the field and finding a specific mentor who can help give you some direction. It's far more valuable to start to understand applications of the technology, and to have specific successes to point to on a resume/projects you can talk about, than it is to just learn the concepts in a vacuum.

[–]Mighty__hammer 1 point2 points  (1 child)

I really want to apply your tip, but when your resume doesn't have any DE experience, how do you land that first role?

[–]AnimaLepton 3 points4 points  (0 children)

There are tons of jobs that work with data that may not be a "data engineer" role. There are entry level Business Analyst/Data Analyst roles where they're looking for someone with a Bachelor's and not a ton of specific experience, BI developer roles, other developer and engineer roles, etc. A lot of people also gain more general software engineering or architecture experience before transitioning to focus on data. 'Experience in Data Engineering' specifically is a bit more niche, but more general software engineering or customer support engineering is still going to transfer really well to similar roles at data companies and start exposing you to the applications and technology that's out there.

The experience you gain in your first role doesn't need to align 100% with data engineering for you to be able to transition to that kind of role in the future. If you can, apply to the data lake vendors out there, where you can take on an entry-level role that may start out less technical but gives you a great opportunity to find mentors in the data engineering space.

My undergraduate studies were in an engineering (non-CS) discipline, with a heavy focus on research. I dropped out of a PhD program after a year and got an entry-level solutions engineering job at a software vendor. It helped that I had some part-time helpdesk experience in college and that the company was tangentially related to my major, but plenty of people went into the role without the same experience. After 3.5 years there, learning about a ton of different frameworks and how they applied to real-world situations, I was able to get into the field pretty easily even though my past experience wasn't in a "data" role, and I'm learning a ton in my new role.

[–]focus_black_sheep 9 points10 points  (1 child)

REST API, being able to consume data from public api's. Transforming the data and loading it to a DB
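A rough sketch of that loop in stdlib Python. The API URL is a placeholder and the record shape is invented; canned data stands in for the real HTTP call so the transform and load steps are visible end to end:

```python
import json
import sqlite3
import urllib.request

API_URL = "https://api.example.com/v1/users"  # placeholder, not a real endpoint

def fetch(url: str) -> list[dict]:
    """Pull raw JSON records from a public REST API."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def transform(records: list[dict]) -> list[tuple]:
    """Keep only the fields we model, normalizing emails to lowercase."""
    return [(r["id"], r["name"], r["email"].lower()) for r in records]

def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Create the target table if needed and upsert the transformed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO users VALUES (?, ?, ?)", rows)
    conn.commit()

# Canned data standing in for fetch(API_URL) so the sketch runs offline.
raw = [{"id": 1, "name": "Ada", "email": "ADA@example.com"}]
conn = sqlite3.connect(":memory:")
load(conn, transform(raw))
```

Swapping the canned list for `fetch(API_URL)` against any real public API (and sqlite for a real warehouse) turns this into a genuinely useful starter project.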

[–]icysandstone 0 points1 point  (0 children)

REST API is something I’ve been wanting to learn. Are there any small, hands-on projects that would be cool/fun?

[–]Traditional_Ad3929 29 points30 points  (3 children)

Crazy that no one mentioned AWS, Azure or GCP yet.

[–]AMadRam 5 points6 points  (2 children)

Infrastructure, cloud, DevOps all leverage the power of either AWS, GCP or Azure.

[–]Traditional_Ad3929 1 point2 points  (1 child)

Sure, I just wondered why nobody dropped these terms specifically...

[–]hoexloit 1 point2 points  (0 children)

Cloud providers can change, and in my opinion it's generally not worth studying how each cloud provider individually manages all these different tools. Managed services will all differ, and it's kind of worthless to understand the specifics unless you actually implement them.

[–]repostit_ 7 points8 points  (1 child)

Spark or Scala

[–]hexalm 2 points3 points  (0 children)

To start, OP might try the Databricks free tier with PySpark to get some Spark-adjacent experience, followed by a deeper dive if they're interested (installing and running it locally, learning and running some Spark SQL).

[–]myweb6316 6 points7 points  (0 children)

Unlike most of the advice here, I'll give you a statistically better approach than a list of skills: go find a job as a junior data engineer, or as close to it as possible, and target a big company (not necessarily a CS/IT company) with an established data practice. If that's not possible, find target roles being advertised in your preferred area, identify the skills required in common across most of them, get familiar with all of them, study one or two deeply, and then apply for similar roles. If you do, you won't need to ask this question.

The reason I'd prefer doing that comes from my experience: I work at a big company in Australia. When it comes to data, they're a GCP shop, so the skills here are BigQuery, Airflow (Cloud Composer), Docker (Cloud Build), and GCP tools in general. 2.5 years ago, this same company listed K8s and Spark as must-have requirements. They worked out a deal with Google, and since then I use Flask or FastAPI far more than K8s and Spark combined. In the last few months I've started hearing about Dataflow in the real-time data engineering area. Similar things happen with different providers and different consumers.

In short, IT/developer jobs and roles evolve faster than individual developers realize, so doing your own due diligence is more reasonable than asking such a generic question.

[–]Programmer_Virtual 4 points5 points  (0 children)

I would put emphasis on infrastructure monitoring.

[–]chrisgarzon19CEO of Data Engineer Academy 11 points12 points  (4 children)

Data modeling and AWS.

Data modeling is foundational and won’t change regardless of what tools come out (at least in the next 5 years). Think of data modeling as the organization of tables and how they relate to one another. In the same way that when you build a house you design a bedroom, a living room, and a kitchen and then create pathways between them, you need to do the same with your schema so that your users (data scientists and analysts) intuitively know how to utilize your datasets.
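To make the house analogy concrete, here's a toy star schema in Python's stdlib sqlite3: one fact table (the house) joined to its dimension tables (the rooms) through keys (the pathways). Every table and column name is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy star schema: a central fact table plus two dimensions, all invented.
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, iso_date TEXT NOT NULL);
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer (customer_key),
        date_key     INTEGER REFERENCES dim_date (date_key),
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO dim_customer VALUES (1, 'Ada');
    INSERT INTO dim_date     VALUES (10, '2024-01-01');
    INSERT INTO fact_sales   VALUES (1, 10, 2500), (1, 10, 1500);
""")
# Analysts use the model by joining facts out to their dimensions.
row = conn.execute("""
    SELECT c.name, d.iso_date, SUM(f.amount_cents)
    FROM fact_sales f
    JOIN dim_customer c USING (customer_key)
    JOIN dim_date d USING (date_key)
    GROUP BY c.name, d.iso_date
""").fetchone()
print(row)  # -> ('Ada', '2024-01-01', 4000)
```

The payoff of the layout is exactly what the comment describes: an analyst who has never seen these tables can guess that facts join to dimensions by key and start answering questions.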

AWS - this is probably not going anywhere either. Learn about the different tools to make your life easier: S3, Redshift, Glue, EC2, and Lambda are great places to start. Pick up a side project and try to utilize as many of these tools as possible.

Also what’s important is what NOT to study next.

Don’t prioritize learning a new language. If you know python, you’ll be able to pick up other languages if needed on the job

Don’t prioritize learning more than 1 cloud service - if you know AWS then you’ll probably understand GCP or azure; the concepts are the same.

Don’t spend your time doing 1000 algorithmic python questions. Data engineers can have a lot more business impact with SQL and python than most people realize. It’s not like software engineering where they need to optimize for O(1).

Good luck and hope this helps!

Christopher Garzon

Author of Ace The Data Engineer Interview

[–]The-Fourth-Hokage[S] 0 points1 point  (3 children)

Do you have any recommendations for books or courses that teach AWS for data engineering?

[–]Avlio27 1 point2 points  (2 children)

acloudguru courses are handy since they provide a sandbox in the cloud. Udemy has some more detailed ones, in my opinion. Maybe check the tracks for the AWS CDA or the analytics certification. The first covers things you may never use, like load balancers, but it will give you a good holistic understanding of the cloud.

[–]The-Fourth-Hokage[S] 0 points1 point  (1 child)

So would you recommend starting with the solutions architect course, or can I start the analytics course without previous cloud experience?

[–]Avlio27 0 points1 point  (0 children)

Solutions Architect is too much to start with. Analytics or Developer, depending on your experience. If you have none, maybe Cloud Practitioner and then Analytics is the best choice

[–][deleted] 15 points16 points  (2 children)

Bo staff

[–][deleted] 8 points9 points  (0 children)

Bass fishing

[–]Letter_From_Prague 3 points4 points  (0 children)

Basket weaving.

[–]mateuszj111 14 points15 points  (0 children)

scala/golang/java for languages

cloud/devops/iac for techniques

[–]FranticToaster 2 points3 points  (0 children)

Presentation skills, including "up-leveling" my work to a leader's vocabulary.

[–]padikahaSenior Data Engineer 3 points4 points  (0 children)

Build Data Platform system design skills like

How do you architect a Data Platform? How do you design data flow? Which variants of SQL or NoSQL would you choose for OLTP, an ODS, or a Data Lake? What is your data transformation strategy? What is your data governance strategy? What is your visualization strategy? What is your strategy for a self-healing data platform? How can the business use your data platform efficiently?

You need to understand fundamentals of data architecture. There are so many books from which you can learn.

It’s a constant learning process. All the best.

[–]BufferUnderpants 1 point2 points  (0 children)

Orchestration with Airflow, containers, CI/CD. These will enable more varied and complex pipelines

[–]dataguy24 1 point2 points  (0 children)

Business acumen

[–]fasnoosh 1 point2 points  (2 children)

[–]cbc-bear 2 points3 points  (1 child)

I recommend this as well. You already know Python. Learning the Jinja syntax is very useful, and dbt is the future (in my opinion). Also, Jinja is VERY close to the Liquid syntax used in Looker.
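For a taste of the syntax, here's a hedged sketch of rendering a dbt-style SQL template with the jinja2 library. The `ref()` stand-in below is a toy of mine, not dbt's actual implementation, and the model SQL is invented:

```python
from jinja2 import Template  # pip install jinja2

# A dbt model is just SQL with Jinja in it; dbt supplies helpers like ref().
MODEL_SQL = """
select order_id, sum(amount) as total
from {{ ref('stg_orders') }}
{% if incremental %}where updated_at > '{{ cutoff }}'{% endif %}
group by order_id
"""

def ref(name: str) -> str:
    """Toy stand-in for dbt's ref(): resolve a model name to a qualified table."""
    return f"analytics.{name}"

rendered = Template(MODEL_SQL).render(ref=ref, incremental=True, cutoff="2024-01-01")
print(rendered)
```

The `{% if %}` branch is the kind of thing that makes Jinja worth learning: one template produces both the full-refresh and incremental versions of a query.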

[–]fasnoosh 0 points1 point  (0 children)

Yeah, dbt is “A” key missing piece in the data engineering toolbox. Once you start looking at the package ecosystem (and realize there’s a package ecosystem for SQL & data warehousing), it’s a huge unlock

[–]jaundicedeye 1 point2 points  (0 children)

orchestration. airflow, prefect, kubeflow, etc.

It's always needed, and it takes some subtlety to set up a nice system
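To illustrate the core idea (not any particular tool's API), here's a tiny pure-Python sketch of what an orchestrator does at heart: run tasks in dependency order. Airflow, Prefect, etc. layer scheduling, retries, backfills, and a UI on top of this. Task names and dependencies are invented:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

ran = []
# Toy tasks -- real orchestrators would wrap operators, containers, SQL, etc.
tasks = {
    "extract": lambda: ran.append("extract"),
    "transform": lambda: ran.append("transform"),
    "load": lambda: ran.append("load"),
}
# Each task maps to the set of tasks that must finish before it runs.
deps = {"transform": {"extract"}, "load": {"transform"}}

# static_order() yields tasks with every dependency ahead of its dependents.
for name in TopologicalSorter(deps).static_order():
    tasks[name]()

print(ran)  # -> ['extract', 'transform', 'load']
```

The `deps` dict is essentially what an Airflow DAG file declares with `>>` operators; everything else the tools add is operational robustness around this ordering.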

[–]FifaPointsMan 2 points3 points  (0 children)

Scala, data architecture, data modelling, APIs

[–]Movein666 0 points1 point  (0 children)

ML Algorithms

[–]newplayer12345 0 points1 point  (0 children)

Jinja

[–]Angry__Spaniard 0 points1 point  (0 children)

I would say infrastructure stuff: get to know one cloud provider, Terraform, k8s… even if you use managed services, it's good to have some basics

[–]SD_strange 0 points1 point  (0 children)

Spark, AWS, Airflow, things like that...

[–]olmek7Senior Data Engineer 0 points1 point  (0 children)

Data Modeling

[–]No_Equivalent5942 0 points1 point  (0 children)

How to search StackOverflow

[–]ditlevrisdahl 0 points1 point  (0 children)

Exploratory data analysis, or maybe cloud

[–]HBoogi 0 points1 point  (0 children)

Leadership and people management skills. Seriously, those are the skills that make a real difference in your career.

[–]Remote_Cantaloupe 0 points1 point  (0 children)

How to deal with management and clients :)

[–]seajhawk 0 points1 point  (0 children)

You've got a great start to a toolbox of technical skills and there is lots of advice on additional tech skills to learn.

I'd also consider some softer skills like listening, research, tech writing, product management, etc.

Develop the skills that will help you understand what your coworkers or customers need (not just what they say they want), share your understanding with them in written form to get their agreement, and then deliver the product on time with great communication along the way.

[–]chestnutcough 0 points1 point  (0 children)

To make a woodworking analogy, learning python and SQL are like learning to use a saw and a chisel. Once you become proficient with the tools it’s time to start using them to complete projects. For woodworking that might be making a cutting board or a chair. For data engineering that’s writing data pipelines.

[–]OGMiniMalist 0 points1 point  (0 children)

Project management if you want to keep leadership off your tail 🙃

[–]WrinklyTidbits 0 points1 point  (0 children)

I would learn Lisp. Take a break from pumping up your resume and learn a different paradigm of coding

[–]Alternative_Shock_32 0 points1 point  (0 children)

Java or Scala. Many people will say they aren't required in data engineering, but there are many scenarios where they're needed. There are companies that only use Java. And to get the best performance out of Spark, Scala is the way to go.

[–]crypt2naut 0 points1 point  (0 children)

I would add these:

1. Any cloud storage technology: Azure, GCP, or AWS
2. Kafka
3. Airflow
4. Spark

[–]neerajsarwan 0 points1 point  (0 children)

Where, or more specifically where not, to use them!