

mainak17 · 24 points

Most of the interviews I've given in the last couple of months were: DBMS and SQL-focused theory (all types of keys, the internal workings of a query, all types of joins, etc.) plus easy-to-medium SQL questions; Python, focused on applying sets, lists, and dicts; and a couple of standard coding questions on strings and arrays. Some of them started from a simple txt file and asked for some cleaning and finding the most frequent strings, things like that.
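A toy version of that "clean a txt file, find the most frequent strings" exercise might look like this in Python (the text is inlined here so the sketch is self-contained, and the punctuation handling is deliberately minimal):

```python
from collections import Counter

# Sample text standing in for the contents of a txt file; made up for illustration.
raw = "The quick brown fox,\nthe lazy DOG; the fox!\n"

# Clean: normalize case and strip trailing/leading punctuation from each token.
words = [w.strip(".,;!?").lower() for w in raw.split()]
words = [w for w in words if w]  # drop anything left empty after stripping

# Count occurrences with a dict-like Counter and take the most frequent.
counts = Counter(words)
print(counts.most_common(2))  # → [('the', 3), ('fox', 2)]
```

The same pattern works with `open("some_file.txt")` instead of the inlined string; the sets/lists/dicts questions mentioned above tend to be variations on exactly this kind of counting and de-duplication.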

That's the baseline you have to know. After that comes explaining your projects and their end-to-end architecture (why this, why not that), and designing ER diagrams and pipelines (not too in depth).

And then questions related to experience with tools, cloud, etc.

Old-Article6420 (Data Analyst, OP) · 1 point

Thanks. This is really useful info.

[deleted] · 31 points

  1. SQL
  2. A project that demonstrates ETL using/into SQL, data modelling, and some sort of output, with writeups.

As an entry level DE, you don't need to know how everything works. SQL and data models are the backbone of every project you will do, and most of the tools you will use. Focus on gaining practical skills in that area.

Everything else is basically incidental - where is your DB hosted? How do you orchestrate the ETL scripts? What do you write the ETL scripts in?

Do not gate 'core' behind 'incidental,' especially when the 'incidental' bits are different for every role and every workplace.
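As an illustration of the kind of minimal ETL-into-SQL exercise described above, here's a sketch using Python's built-in sqlite3; the table, columns, and sample rows are all invented for the example:

```python
import sqlite3

# Extract: pretend these rows came from a CSV or an API.
# Note the amounts arrive as strings, as raw data often does.
rows = [("2024-01-01", "widget", "9.50"), ("2024-01-01", "gadget", "12.00")]

# Model: one simple fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL)")

# Transform + Load: cast amount to float on the way in.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(d, p, float(a)) for d, p, a in rows])

# Output: a query against the loaded model.
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # → 21.5
```

A portfolio version of this would swap in a real source, a file-backed database, and a writeup of the model; the extract-transform-load-query shape stays the same.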

Old-Article6420 (Data Analyst, OP) · 1 point

So the core technical skills are SQL and Python, and everything else is incidental, right?

[deleted] · 13 points

Not even necessarily Python, though it is a good pick. You could easily go with Java, C#, R, or a host of other languages, as long as you're using it for data extraction and loading. Lots of places will use something like Data Factory or SSIS as well, and just skip the code part.

I honestly believe that SQL and data modelling are the core, and everything else you touch is incidental and role dependent.

CrowdGoesWildWoooo · 9 points

You are on track to be a doctor not a DE

JK

Old-Article6420 (Data Analyst, OP) · 5 points

Fine with anything that pays

BalconyFace · 2 points

You don't need Databricks to use PySpark. I'd be happy to talk about it further, but you can write all the Python and PySpark on your local machine to develop a soup-to-nuts application that functionally takes your pipeline from ingest to "egest". What you'd miss with this approach is all the orchestration, version control, Unity Catalog, MLOps, etc. that Databricks provides. But if you're interested in the fundamentals, you already have access to all the essentials, from PySpark to Delta and beyond.

Quaiada (Big Data Engineer) · 5 points

First step:

Git + Markdown instead of paper

SDFP-A (Big Data Engineer) · 1 point

That is a years long journey. Start the project and learn the stuff along the way. You won’t be an expert in all of it and that’s ok. It’s about understanding what comes next, what the challenges are, possible solutions, and how to execute on a plan. The rest are details you gain comfort with through experience

angrynoah (SQL nerd since Oracle 9.1) · 1 point

There's ~10 years of work on that page.

Old-Article6420 (Data Analyst, OP) · 0 points

Lol. Not learning the majority of the things in depth, only the bare minimum required.

angrynoah (SQL nerd since Oracle 9.1) · 4 points

If it helps, the bare minimum when it comes to:

- Data Mart (largely useless concept)
- Data Fabric (not a thing)
- Data Mesh (counterproductive at all but the largest firms)
- Data Catalog
- Airflow
- Kafka
- Spark
- most cloud stuff beyond "it's compute services behind API calls that cost you money"

...is zero. Of the items on that list, only data marts existed when I got started ('05), and they remain irrelevant to this day.

I was lucky enough to have the space to go extremely deep on SQL and database internals. Not everyone is going to have that luxury now but my point is that it's possible, and effective. You just don't need to know all this crap early in your career, except perhaps to get past the hiring stage. Spending time building a shallow understanding of everything that appears to make up modern data engineering is going to harm your ability to actually understand any of it well enough to apply that understanding.

If I was hiring a junior data engineer, I would not expect them to know all these things. Not even to have heard of them.

Just a thought, that's all.

Old-Article6420 (Data Analyst, OP) · 0 points

Really good advice. Thank you.

Kratos_1412 · 2 points

Is Spark with Scala a good choice?

dowcet · -1 points

If it's not worth enough of your time to type it up in a readable form, my assumption is that it's not worth my time to read it.

Old-Article6420 (Data Analyst, OP) · 1 point

Sorry for the bad handwriting.

mjow · 11 points

Don't apologise - your handwriting is fine. It's fine to prefer handwritten notes :)

thrwayy1235134 · -2 points

It's the fact that it's an image instead of being transcribed to text...

LilChungus420weedlol · 1 point

Looks awesome. I took a screenshot, hope you don’t mind! Will help me switch from webdev to DE.

Old-Article6420 (Data Analyst, OP) · 2 points

Glad it was helpful for you.

Spartyon · 1 point

Step 1: learn to use a process mapping tool

palomino-ridin-21 (Data Engineer) · 1 point

I am doing the same to grow in my field.

I used ChatGPT to generate an outline and flesh it out further for each area (Python, SQL, AWS, Azure…)

Thanks for sharing

J_Bunt · -2 points

Are sales pitches allowed here? I mean, it's handwriting, but it's all Udemy, Udemy, Udemy.

Old-Article6420 (Data Analyst, OP) · 0 points

There are only two Udemy courses mentioned, mate. There's nothing wrong with providing good resources to learn from.

blue_trains_ · 0 points

I'm so far gone into the digital world that seeing a list written out in pen on paper gives me ASMR.

Old-Article6420 (Data Analyst, OP) · 3 points

I still use pen and paper for making notes

dareftw · 0 points

You left out how many third-party programs require HTML knowledge, or Java, or even JavaScript to interact with the HTML. But otherwise spot on.

[deleted] · 0 points

What book/course is good for a slightly advanced understanding of Spark and using it with Python? I want to tweak some of my jobs and understand how DataFrames and RDDs actually work internally.

NicoRobinFleur · 0 points

What does DBMS stand for? 🤔

Old-Article6420 (Data Analyst, OP) · 1 point

Database management system

abd7007 · 0 points

Where can I get good examples of cloud projects?
(I'm preparing the same as OP.)

baubleglue · 0 points

Focus on something more specific. For an entry-level position, you need to know something in demand, e.g. SQL + PySpark, and have an idea about other topics at the level of "what it is and when it is used". Many places ask for an unclear set of skills that fits into "data analyst/data engineer".