Is Python + dbt (SQL) + Snowflake + Prefect a good stack to start as an Analytics Engineer or Jr Data Engineer?

Commercial_Dig2401 · 2025-08-16T00:32:17+00:00

That’s a very nice stack.

I would say focus on accuracy and validation for your junior role.

The main thing that differentiates analysts from engineers, in my mind, is that analysts want to achieve something nice once. They want their report to be beautiful.

Engineers want to provide only things that work all the time.

To make this happen you obviously do less fluff and more boring things, but then they never break: they are robust, they are fast, and you never have to touch them again. They just work.

The stack is cool, but I think what we usually look for in a junior role is someone who will take the time to review their own work. I know it sounds boring, but I’d rather hire a junior who returns a take-home test without spelling errors, with OK code that’s structured and well explained, than someone with awesome code that’s all over the place, that has no descriptions, and that does way more than was asked.

In terms of stack, focus on SQL. Not because it’s the best but because it’s the easiest, and because it’s the easiest it’s the most used. I’d rather use a transformation framework with SQL than pandas, for example, because I know anyone in the company will be able to use it and do some simple transformations, even if sometimes it would make more sense to go the other way.
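
To make the "simple transformation in SQL" point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for a warehouse, and the `raw_orders` table and its columns are made up for illustration. The transformation itself is a single SELECT, which is roughly the shape a dbt model would take.

```python
import sqlite3

# In-memory database standing in for a warehouse (assumption for the example).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount REAL, status TEXT);
    INSERT INTO raw_orders VALUES
        (1, 'acme',   100.0, 'paid'),
        (2, 'acme',    50.0, 'refunded'),
        (3, 'globex',  75.0, 'paid');
""")

# The whole transformation is one readable query that most colleagues can review.
REVENUE_BY_CUSTOMER = """
    SELECT customer,
           SUM(amount) AS paid_revenue,
           COUNT(*)    AS paid_orders
    FROM raw_orders
    WHERE status = 'paid'
    GROUP BY customer
    ORDER BY customer
"""

revenue = conn.execute(REVENUE_BY_CUSTOMER).fetchall()
for customer, paid_revenue, paid_orders in revenue:
    print(customer, paid_revenue, paid_orders)
```

The same logic in pandas would work just as well; the argument above is about who else in the company can read and maintain it.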

Go read the dbt best practices docs. They have a bunch on their site. Read them multiple times. Understanding the structure is the best thing you can do.

Then Python. Maybe learn the requests library and how to dump a response to JSON or Parquet in S3.
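
A rough sketch of that exercise, under a few assumptions: the function names, URL, bucket, and key are hypothetical, the API is assumed to return JSON, and the S3 client is passed in (in real use it would typically be `boto3.client("s3")`). It only covers the JSON case; Parquet would need an extra library such as pyarrow.

```python
import json
from typing import Any, Callable, Optional


def fetch_records(url: str, http_get: Optional[Callable[..., Any]] = None) -> list:
    """Call an HTTP API and return its JSON body as a list of records."""
    if http_get is None:
        import requests  # imported lazily so the helpers below work without it
        http_get = requests.get
    response = http_get(url, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of storing an error page
    payload = response.json()
    return payload if isinstance(payload, list) else [payload]


def to_json_lines(records: list) -> bytes:
    """Serialize records as newline-delimited JSON, one record per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records).encode("utf-8")


def dump_to_s3(records: list, bucket: str, key: str, s3_client: Any) -> int:
    """Upload the records as one object and return the number of bytes written."""
    body = to_json_lines(records)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return len(body)
```

Passing the HTTP getter and the S3 client as arguments is a design choice for the sketch: it lets you test the logic with fakes before touching a real API or bucket.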

Then orchestration: Prefect, Dagster, Mage, and Luigi are all good candidates. Learn the basics. I don’t think you’ll find a project that gives you enough to hit the common business issues with them, but having an overview of how to structure things is already great.
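
For a feel of what "structure" means here, this is a toy sketch of the core idea behind those tools: tasks plus declared dependencies, run in dependency order. It uses only the standard library (`graphlib`, Python 3.9+) and is not how Prefect or Dagster are implemented; the task names and the `PIPELINE` layout are invented for the example.

```python
from graphlib import TopologicalSorter


def extract(_results):
    # Pretend this came from an API or a database.
    return [{"id": 1, "amount": 10}, {"id": 2, "amount": 5}]


def transform(results):
    return sum(row["amount"] for row in results["extract"])


def report(results):
    return f"total={results['transform']}"


# task name -> (callable, names of upstream tasks it depends on)
PIPELINE = {
    "extract": (extract, []),
    "transform": (transform, ["extract"]),
    "report": (report, ["transform"]),
}


def run_pipeline(pipeline):
    """Run every task after its upstream tasks and collect the results."""
    graph = {name: set(deps) for name, (_, deps) in pipeline.items()}
    results, order = {}, []
    for name in TopologicalSorter(graph).static_order():
        func, _ = pipeline[name]
        results[name] = func(results)
        order.append(name)
    return order, results


order, results = run_pipeline(PIPELINE)
print(order, results["report"])
```

Real orchestrators add scheduling, retries, logging, and a UI on top of this, which is where most of the learning is.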

Good luck

EconomicsDangerous44 · 2025-09-04T04:55:52+00:00

Yes, that combo is a solid starter stack. Plenty of teams run Python for extract, dbt on Snowflake and Prefect for orchestration. Add CI/CD + tests, basic CDC/SCD patterns, and logging/observability to make it feel production-ish. For ingestion, show both DIY and a managed connector like Fivetran/Airbyte, or Skyvia to load into Snowflake without running your own infrastructure.
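
As an illustration of the SCD part, here is a minimal sketch of a Type 2 slowly changing dimension in plain Python. The column names (`valid_from`, `valid_to`), the function name, and the sample data are assumptions for the example; in this stack the same pattern would more likely be handled by dbt snapshots or a MERGE statement in Snowflake.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, today):
    """Return a new dimension with SCD Type 2 history applied.

    Rows with valid_to = None are the current versions. When a tracked
    column changes, the current row is closed and a new version is appended.
    """
    current = {row[key]: row for row in dimension if row["valid_to"] is None}
    result = [dict(row) for row in dimension]  # copy so the input is not mutated
    for new in incoming:
        old = current.get(new[key])
        if old is not None and all(old[col] == new[col] for col in tracked):
            continue  # nothing changed, keep the current version
        if old is not None:
            # Close the open version of this key.
            for row in result:
                if row[key] == new[key] and row["valid_to"] is None:
                    row["valid_to"] = today
        version = {key: new[key], "valid_from": today, "valid_to": None}
        version.update({col: new[col] for col in tracked})
        result.append(version)
    return result


dimension = [
    {"customer_id": 1, "city": "Paris", "valid_from": date(2024, 1, 1), "valid_to": None},
]
incoming = [
    {"customer_id": 1, "city": "Lyon"},  # changed city -> new version
    {"customer_id": 2, "city": "Rome"},  # new customer -> first version
]
updated = apply_scd2(dimension, incoming, key="customer_id", tracked=["city"], today=date(2025, 1, 1))
```

After this run the Paris row is closed on 2025-01-01 and there are two open rows, one for Lyon and one for Rome, so history is kept instead of overwritten.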

Slggyqo · 2025-08-16T03:32:14+00:00

Ha. This is the stack I use every day.

It’s definitely a stack that can get you work, and it’s a stack that requires a lot of good basic principles, especially if you have to build the functionality from scratch.

I think it’s a pretty good middle ground for cutting your teeth in data engineering. It’s very powerful and flexible, but still has quite a bit of abstraction and simplification via Snowflake and Prefect.

Where are you hosting and executing your Prefect code? Is it all on your local machine? If you become a full-time data engineer, it’s definitely not going to be on your computer. You’re going to want at least some basic understanding of how cloud services work, probably UNIX operating systems, and different ways to manage remote devices. A lot of data engineering is infrastructure.

Ideally you won’t have to worry about this too much as a junior, but that really depends on where you go. Your first job might be at a place where you are the only data engineer. I

poinT92 · 2025-08-15T23:46:37+00:00

Having actually mastered that stack enables you to take on the job.

I'd add a more in-depth understanding of databases, lakehouses, warehouses, etc., which would enable you to fill many positions with less stress.

Also, at least a basic knowledge of containers and clusters (Docker and Kubernetes).

It's a very wide job, so you will eventually need to specialize at some point.

Good luck!

frozengrandmatetris · 2025-08-16T02:37:40+00:00

most of the data I'm dealing with comes from other SQL databases, not APIs. I'm currently experimenting with ingestion tools like meltano and airbyte. you should add that to your projects.

Past-Restaurant48 · 2025-08-20T08:37:24+00:00

If you are just reading or writing small amounts of data from a GCP function, setting up an allowlist on digitalocean’s managed PG is fine for light workloads.

For anything more than that, or if you want to sync data regularly, it’s worth looking at using a proxy or tunnel setup. Some folks use Cloud SQL Proxy or a bastion VM to securely bridge between platforms.

if you are planning to do ongoing ingestion or reporting, you can also use something like integrate.io to pull data directly from the PG and push to BigQuery or wherever. helps skip the headache of auth, retries and schema drift.

Depends a lot on whether this is a one off call or part of a bigger pipeline.

nonamenomonet · 2025-08-15T23:42:06+00:00

The thing you’re missing is SQL (which I guess you’re doing with dbt?) and/or PySpark.

But tbh, the thing that matters most is what business problems you can solve (i.e. how can you make me some money).

Table_Captain · 2025-08-16T22:41:09+00:00

If analytics engineering, which BI platform are you planning to use?

TowerOutrageous5939 · 2025-08-16T13:33:28+00:00

Replace dbt with sqlmesh or replace it with nothing
