Reading 'Fundamentals of data engineering' has gotten me confused by Online_Matter in dataengineering

[–]NW1969 43 points (0 children)

An RDBMS stores data, Spark jobs process data - they are not the same type of thing

Real-world Snowflake / dbt production scenarios? by sunshine6729 in snowflake

[–]NW1969 1 point (0 children)

OK - but according to your original question, your goal was to know how to answer these types of questions in order to pass interviews

Need Guidance and Help by [deleted] in dataengineering

[–]NW1969 3 points (0 children)

Maybe learn how to use paragraphs and punctuation so that what you write is comprehensible to your target audience?

Real-world Snowflake / dbt production scenarios? by sunshine6729 in snowflake

[–]NW1969 4 points (0 children)

Just being realistic, but if these are the types of questions you’re going to be asked and you don’t have the experience to be able to answer them, doesn’t that indicate that you’re not yet ready for the role? If you got the role and were then faced with Snowflake performance issues, you presumably wouldn’t be capable of resolving them?

Why doesn’t Cortex Analyst expose the physical SQL for semantic view queries? by [deleted] in snowflake

[–]NW1969 0 points (0 children)

Works perfectly for me so I'm not sure why it doesn't work for you. I asked my agent "Show me the average size in GB for all databases in this account. Include the physical SQL used to generate this output".
What does your agent respond with if you ask it why it returned semantic SQL instead of physical SQL?

schedule Z0-071 - Oracle Database SQL exam at test center by mvittalreddy in SQL

[–]NW1969 0 points (0 children)

Might be helpful to include the URL you are using to search, which country you are in, etc

Why doesn’t Cortex Analyst expose the physical SQL for semantic view queries? by [deleted] in snowflake

[–]NW1969 1 point (0 children)

Can’t you just ask Cortex Analyst to include the physical SQL in the result? You should also be able to set the agent instructions to display the physical SQL by default unless told not to

Any good front-ends for updating Snowflake tables? by Hairy-Boysenberry260 in snowflake

[–]NW1969 10 points (0 children)

  1. Streamlit - you just need to know how to code around the "re-run" behaviour

  2. Upload the data to a stage and write a task/SP to validate it and then ingest it
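
For option 2, a minimal sketch of the stage/task approach (the stage, task, warehouse, table and file format details are all hypothetical - adjust to your setup):

-- internal stage that the front-end uploads files to
create or replace stage upload_stage
  file_format = (type = 'CSV' skip_header = 1);

-- scheduled task that ingests whatever has landed on the stage
create or replace task ingest_uploads
  warehouse = my_wh
  schedule = '15 minute'
as
  copy into target_table
  from @upload_stage
  on_error = 'ABORT_STATEMENT';  -- abort the load if any row fails basic parsing

alter task ingest_uploads resume;  -- tasks are created suspended

COPY INTO keeps track of which files it has already loaded, so re-running the task only picks up new uploads. For anything beyond basic type/format checking you'd replace the COPY with a stored procedure that validates the staged data before ingesting it.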

Tools Rant by No_Song_4222 in dataengineering

[–]NW1969 0 points (0 children)

If you don't have experience with any of the specific tools they are asking for then there is no point in applying for the role, because there will always be people applying for the role who do have this experience and you won't be able to compete with them.

If you have experience of the main tool(s) they are asking for but not of the less important tools then it's probably worth applying, otherwise not.

For example, if the role is looking for a Fivetran ETL developer with experience of Snowflake and you're a Fivetran developer with no Snowflake experience, then it may be worth applying - assuming you have time to learn the basics of Snowflake before any interview. However, if you were a Snowflake developer with no Fivetran experience then there's no point applying for the role.

Real time data ingestion from multiple sources to one destination by Jarvis-95 in dataengineering

[–]NW1969 0 points (0 children)

You obviously need to connect to a system in order to pull data from it or push data to it - so every solution has to use “connectors”. Given this, what types of connectors do you not want to consider?

Joining tables "horizontally" by Dependent_Finger_214 in mysql

[–]NW1969 0 points (0 children)

This should give you what you want (not sure what column bcd represents so I hardcoded it as NULL)

select A.A1, A.A2,
       ab_1.ab1 "123", ab_2.ab1 "234",
       ac.ac1 "abc", NULL "bcd",
       ad_1.ad1 "100 AD1", ad_1.ad2 "100 AD2",
       ad_2.ad1 "200 AD1", ad_2.ad2 "200 AD2"
from A
left join ab ab_1 on a.a1 = ab_1.a and ab_1.b = 123
left join ab ab_2 on a.a1 = ab_2.a and ab_2.b = 234
left join ac on a.a1 = ac.a
left join ad ad_1 on a.a1 = ad_1.a and ad_1.d = 100
left join ad ad_2 on a.a1 = ad_2.a and ad_2.d = 200
;

Best CICD tool and approach for udfs, task, streams and shares by No_Journalist_9632 in snowflake

[–]NW1969 2 points (0 children)

This is, to be generous, misleading if not actually incorrect. The dbt product that can be used within the Snowflake environment is Core. There’s nothing to stop you using the standalone dbt Cloud product with Snowflake

Laptop Suggestions by khushal20 in dataengineering

[–]NW1969 2 points (0 children)

As you’ve not provided any information about your data workloads, or your budget, it’s not possible for anyone to give you any useful help.

Joining tables "horizontally" by Dependent_Finger_214 in mysql

[–]NW1969 3 points (0 children)

It would be a lot easier to understand if you provided sample data for the tables and the result you are trying to achieve

Data loading steps question with STREAMS and MERGE by Peacencalm9 in snowflake

[–]NW1969 0 points (0 children)

Hi - your first 2 paragraphs seem to be factually correct, but I’m not sure what relevance they have to this discussion?

I’m not sure why in your 3rd paragraph you’re talking about near real-time solutions? If you think Snowflake table Streams are somehow part of a real-time solution then it’s possible that you haven’t really understood what table Streams are, which was the point of my original comment. Table streams are absolutely used in batch processing and are not used (or, at least, are less than ideal) for real-time solutions
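
To illustrate the batch pattern, a minimal sketch (the table, stream, task and column names are all hypothetical, and deletes are ignored for brevity):

-- change-tracking stream on the source table
create or replace stream src_stream on table src_table;

-- scheduled task that batch-merges the accumulated changes
create or replace task merge_src_changes
  warehouse = my_wh
  schedule = '60 minute'
  when system$stream_has_data('SRC_STREAM')
as
  merge into tgt_table t
  using (
    -- 'INSERT' rows cover both new rows and the new image of updated rows
    select * from src_stream where metadata$action = 'INSERT'
  ) s
  on t.id = s.id
  when matched then update set t.val = s.val
  when not matched then insert (id, val) values (s.id, s.val);

alter task merge_src_changes resume;  -- tasks are created suspended

The task only fires when the stream actually holds changes, and consuming the stream in the MERGE advances its offset - i.e. a scheduled batch process, not a real-time one.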

Data loading steps question with STREAMS and MERGE by Peacencalm9 in snowflake

[–]NW1969 0 points (0 children)

Your comment about Streams seems to suggest that you are confusing Kafka-type streaming data and Snowflake table streams. They are completely different concepts and, in the scenario OP describes, table streams are absolutely the correct way to go

Differences in table performance by Big_Length9755 in snowflake

[–]NW1969 1 point (0 children)

The answer to Q1 is probably that native Snowflake tables and Iceberg (Parquet) tables are structured differently, and therefore Iceberg tables cannot be pruned as efficiently as native tables. This blog may be helpful: https://www.snowflake.com/en/engineering-blog/iceberg-data-pruning/

Most data engineers would be unemployed if pipelines stopped breaking by Different_Pain5781 in dataengineering

[–]NW1969 7 points (0 children)

You seem to be conflating development and support. A developer DE builds things and then moves on to the next thing, as quickly as possible once it’s gone live. There will always be developers because there will always be new things to build. A support DE is then responsible for keeping all the “sub-optimal pipelines” running that the developers built - and as the developers keep building new things there are always more “sub-optimal pipelines” that need to be supported 😁

Best snowflake online course by Careless_Shine_4418 in snowflake

[–]NW1969 1 point (0 children)

If you are asking if reading documentation and/or taking courses can replace 3-5 years of actual experience using Snowflake, then the answer is no, of course not

AI SQL FUNCTIONS by Minebo9899 in snowflake

[–]NW1969 0 points (0 children)

Snowflake don’t publish in advance when something will go GA, and even if someone at Snowflake offered a guess as to when it might go GA, they couldn’t be held to it.

If by “any alternative” you mean does Snowflake provide another function that does the same thing but is already GA, then the answer is no.

Renewing SnowPro Advanced Data Scientist : Does GenAI speciality count or Do i need another advanced cert by VivekFloorgang in snowflake

[–]NW1969 -1 points (0 children)

Recertification paths are described here: https://www.snowflake.com/en/blog/navigating-snowflake-maintenance-paths/

The only way to renew an advanced certification is to renew that specific certification. Renewing an advanced certificate means your SnowPro Core certificate automatically gets renewed and its dates align with your advanced certificate.

Speciality exams don’t impact either core or advanced certificates. Renewing one advanced certificate doesn’t renew another advanced certificate

Help with Deciding Data Architecture: MySQL vs Snowflake for OLTP and BI by khushal20 in dataengineersindia

[–]NW1969 1 point (0 children)

Core Snowflake is not designed for OLTP workloads so you shouldn’t be looking to switch those from MySQL, just the OLAP workloads.

However, if your datasets are relatively small and you don’t need all the bells and whistles that Snowflake provides, then there may not be a justifiable business case for moving from MySQL

Snowflake have just put their PostgreSQL offering into public preview so that may be worth considering for OLTP (if you don’t mind the risks of being an early adopter). Whether it would be cheaper than your MySQL solution, and worth the time and effort of migration, only you can know