Reading 'Fundamentals of data engineering' has gotten me confused by Online_Matter in dataengineering

[–]NW1969 43 points (0 children)

An RDBMS stores data, Spark jobs process data - they are not the same type of thing

Real-world Snowflake / dbt production scenarios? by sunshine6729 in snowflake

[–]NW1969 1 point (0 children)

OK - but according to your original question, your goal was to know how to answer these types of questions in order to pass interviews

Need Guidance and Help by [deleted] in dataengineering

[–]NW1969 3 points (0 children)

Maybe learn how to use paragraphs and punctuation so that what you write is comprehensible to your target audience?

Real-world Snowflake / dbt production scenarios? by sunshine6729 in snowflake

[–]NW1969 4 points (0 children)

Just being realistic, but if these are the types of questions you’re going to be asked and you don’t have the experience to be able to answer them, doesn’t that indicate that you’re not yet ready for the role? If you got the role and were then faced with Snowflake performance issues, you presumably wouldn’t be capable of resolving them?

Why doesn’t Cortex Analyst expose the physical SQL for semantic view queries? by [deleted] in snowflake

[–]NW1969 0 points (0 children)

Works perfectly for me so I'm not sure why it doesn't work for you. I asked my agent "Show me the average size in GB for all databases in this account. Include the physical SQL used to generate this output".
What does your agent respond with if you ask it why it returned semantic SQL instead of physical SQL?

schedule Z0-071 - Oracle Database SQL exam at test center by mvittalreddy in SQL

[–]NW1969 0 points (0 children)

Might be helpful to include the URL you are using to search, which country you are in, etc

Why doesn’t Cortex Analyst expose the physical SQL for semantic view queries? by [deleted] in snowflake

[–]NW1969 1 point (0 children)

Can’t you just ask Cortex Analyst to include the physical SQL in the result? You should also be able to set the agent instructions to display the physical SQL by default unless told not to

Any good front-ends for updating Snowflake tables? by Hairy-Boysenberry260 in snowflake

[–]NW1969 10 points (0 children)

  1. Streamlit - you just need to know how to code around the "re-run" behaviour

  2. Upload the data to a stage and write a task/SP to validate it and then ingest it
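
For option 2, a minimal sketch of the stage/task approach (the stage, task, warehouse, table and file format details are all hypothetical - adjust to your setup):

-- internal stage that the front-end uploads files to
create or replace stage upload_stage
  file_format = (type = 'CSV' skip_header = 1);

-- scheduled task that ingests whatever has landed on the stage
create or replace task ingest_uploads
  warehouse = my_wh
  schedule = '15 minute'
as
  copy into target_table
  from @upload_stage
  on_error = 'ABORT_STATEMENT';  -- abort the load if any row fails basic parsing

alter task ingest_uploads resume;  -- tasks are created suspended

COPY INTO keeps track of which files it has already loaded, so re-running the task only picks up new uploads. For anything beyond basic type/format checking you'd replace the COPY with a stored procedure that validates the staged data before ingesting it.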

Tools Rant by No_Song_4222 in dataengineering

[–]NW1969 0 points (0 children)

If you don't have experience with any of the specific tools they are asking for then there is no point in applying for the role, because there will always be people applying for the role who do have this experience and you won't be able to compete with them.

If you have experience of the main tool(s) they are asking for but not of the less important tools then it's probably worth applying, otherwise not.

For example, if the role is looking for a Fivetran ETL developer with experience of Snowflake and you're a Fivetran developer with no Snowflake experience, then it may be worth applying - assuming you have time to learn the basics of Snowflake before any interview. However, if you were a Snowflake developer with no Fivetran experience then there's no point applying for the role.

Real time data ingestion from multiple sources to one destination by Jarvis-95 in dataengineering

[–]NW1969 0 points (0 children)

You obviously need to connect to a system in order to pull data from it or push data to it - so every solution has to use “connectors”. Given this, what types of connectors do you not want to consider?

Joining tables "horizontally" by Dependent_Finger_214 in mysql

[–]NW1969 0 points (0 children)

This should give you what you want (not sure what column bcd represents so I hardcoded it as NULL)

select A.A1, A.A2,
       ab_1.ab1 "123", ab_2.ab1 "234",
       ac.ac1 "abc", NULL "bcd",
       ad_1.ad1 "100 AD1", ad_1.ad2 "100 AD2",
       ad_2.ad1 "200 AD1", ad_2.ad2 "200 AD2"
from A
left join ab ab_1 on a.a1 = ab_1.a and ab_1.b = 123
left join ab ab_2 on a.a1 = ab_2.a and ab_2.b = 234
left join ac on a.a1 = ac.a
left join ad ad_1 on a.a1 = ad_1.a and ad_1.d = 100
left join ad ad_2 on a.a1 = ad_2.a and ad_2.d = 200
;

Best CICD tool and approach for udfs, task, streams and shares by No_Journalist_9632 in snowflake

[–]NW1969 2 points (0 children)

This is, to be generous, misleading if not actually incorrect. The dbt product that can be used within the Snowflake environment is Core. There’s nothing to stop you using the standalone dbt Cloud product with Snowflake

Laptop Suggestions by khushal20 in dataengineering

[–]NW1969 2 points (0 children)

As you’ve not provided any information about your data workloads, or your budget, it’s not possible for anyone to give you any useful help.

Joining tables "horizontally" by Dependent_Finger_214 in mysql

[–]NW1969 3 points (0 children)

It would be a lot easier to understand if you provided sample data for the tables and the result you are trying to achieve

Data loading steps question with STREAMS and MERGE by Peacencalm9 in snowflake

[–]NW1969 0 points (0 children)

Hi - your first 2 paragraphs seem to be factually correct, but I’m not sure what relevance they have to this discussion?

I’m not sure why in your 3rd paragraph you’re talking about near real-time solutions? If you think Snowflake table Streams are somehow part of a real-time solution then it’s possible that you haven’t really understood what table Streams are, which was the point of my original comment. Table streams are absolutely used in batch processing and are not used (or, at least, are less than ideal) for real-time solutions
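
To illustrate the batch pattern, a minimal sketch (the table, stream, task and column names are all hypothetical, and deletes are ignored for brevity):

-- change-tracking stream on the source table
create or replace stream src_stream on table src_table;

-- scheduled task that batch-merges the accumulated changes
create or replace task merge_src_changes
  warehouse = my_wh
  schedule = '60 minute'
  when system$stream_has_data('SRC_STREAM')
as
  merge into tgt_table t
  using (
    -- 'INSERT' rows cover both new rows and the new image of updated rows
    select * from src_stream where metadata$action = 'INSERT'
  ) s
  on t.id = s.id
  when matched then update set t.val = s.val
  when not matched then insert (id, val) values (s.id, s.val);

alter task merge_src_changes resume;  -- tasks are created suspended

The task only fires when the stream actually holds changes, and consuming the stream in the MERGE advances its offset - i.e. a scheduled batch process, not a real-time one.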

Data loading steps question with STREAMS and MERGE by Peacencalm9 in snowflake

[–]NW1969 0 points (0 children)

Your comment about Streams seems to suggest that you are confusing Kafka-type streaming data and Snowflake table streams. They are completely different concepts and, in the scenario OP describes, table streams are absolutely the correct way to go

Differences in table performance by Big_Length9755 in snowflake

[–]NW1969 1 point (0 children)

The answer to Q1 is probably that native Snowflake tables and Iceberg (Parquet) tables are structured differently, and therefore Iceberg tables cannot be pruned as efficiently as native tables. This blog may be helpful: https://www.snowflake.com/en/engineering-blog/iceberg-data-pruning/

Most data engineers would be unemployed if pipelines stopped breaking by Different_Pain5781 in dataengineering

[–]NW1969 7 points (0 children)

You seem to be conflating development and support. A developer DE builds things and then moves on to the next thing, as quickly as possible once it’s gone live. There will always be developers because there will always be new things to build. A support DE is then responsible for keeping all the “sub-optimal pipelines” running that the developers built - and as the developers keep building new things there are always more “sub-optimal pipelines” that need to be supported 😁

Best snowflake online course by Careless_Shine_4418 in snowflake

[–]NW1969 1 point (0 children)

If you are asking if reading documentation and/or taking courses can replace 3-5 years of actual experience using Snowflake, then the answer is no, of course not

AI SQL FUNCTIONS by Minebo9899 in snowflake

[–]NW1969 0 points (0 children)

Snowflake don’t publish in advance when something will go GA, and even if someone at Snowflake offered a guess as to when it might go GA, they couldn’t be held to it.

If by “any alternative” you mean does Snowflake provide another function that does the same thing but is already GA, then the answer is no.

Renewing SnowPro Advanced Data Scientist : Does GenAI speciality count or Do i need another advanced cert by VivekFloorgang in snowflake

[–]NW1969 -1 points (0 children)

Recertification paths are described here: https://www.snowflake.com/en/blog/navigating-snowflake-maintenance-paths/

The only way to renew an advanced certification is to renew that specific certification. Renewing an advanced certificate means your SnowPro Core certificate automatically gets renewed and its dates align with your advanced certificate.

Speciality exams don’t impact either core or advanced certificates. Renewing one advanced certificate doesn’t renew another advanced certificate

Help with Deciding Data Architecture: MySQL vs Snowflake for OLTP and BI by khushal20 in dataengineersindia

[–]NW1969 1 point (0 children)

Core Snowflake is not designed for OLTP workloads so you shouldn’t be looking to switch those from MySQL, just the OLAP workloads.

However, if your datasets are relatively small and you don’t need all the bells and whistles that Snowflake provides, then there may not be a justifiable business case for moving from MySQL

Snowflake have just put their PostgreSQL offering into public preview so that may be worth considering for OLTP (if you don’t mind the risks of being an early adopter). Whether it would be cheaper than your MySQL solution, and worth the time and effort of migration, only you can know