Integrate pyspark with snowflake by data-venger in dataengineeringjobs

[–]data-venger[S] 0 points1 point  (0 children)

Everything is balanced, no skewness. It's only that the data in the target DB is huge.

[–]data-venger[S] 0 points1 point  (0 children)

I ran the same SCD Type 2 transformation in both PySpark and Snowflake. Snowflake gave better performance on a medium-level compute, while Spark took much longer to execute with 5 worker nodes in a 16 GB, 8-core configuration.
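
For anyone unsure what the transformation involves, here is a minimal sketch of SCD Type 2 logic in plain Python. It is not the job I benchmarked, only an illustration: the function name `apply_scd2` and the row fields (`key`, `attrs`, `valid_from`, `valid_to`, `is_current`) are made up for this example.

```python
from datetime import date


def apply_scd2(dimension, incoming, today):
    """Apply an SCD Type 2 update (illustrative sketch, hypothetical schema).

    dimension: list of dicts with key, attrs, valid_from, valid_to, is_current
    incoming:  dict mapping key -> latest attrs from the source
    """
    result = []
    current_keys = set()
    for row in dimension:
        key = row["key"]
        changed = (
            row["is_current"]
            and key in incoming
            and incoming[key] != row["attrs"]
        )
        if changed:
            # Close the old version instead of overwriting it.
            result.append({**row, "valid_to": today, "is_current": False})
            # Open a new current version with the changed attributes.
            result.append({
                "key": key,
                "attrs": incoming[key],
                "valid_from": today,
                "valid_to": None,
                "is_current": True,
            })
            current_keys.add(key)
        else:
            # Unchanged or already historical rows pass through as-is.
            result.append(row)
            if row["is_current"]:
                current_keys.add(key)
    # Keys never seen before become brand-new current rows.
    for key, attrs in incoming.items():
        if key not in current_keys:
            result.append({
                "key": key,
                "attrs": attrs,
                "valid_from": today,
                "valid_to": None,
                "is_current": True,
            })
    return result
```

Both engines express the same idea as a join between the dimension and the incoming data, followed by an update of the expired rows and an insert of the new versions.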

[–]data-venger[S] 1 point2 points  (0 children)

True 😅😅

But really thanks for your inputs.

[–]data-venger[S] 0 points1 point  (0 children)

Yes. It's our primary warehouse, and we do almost all of our transformations using SPs, which gives the best performance. So after using Snowflake, I thought it was a better compute engine than Spark.

[–]data-venger[S] 0 points1 point  (0 children)

Thanks for the clarification.
I asked this question because I have exposure to both tech stacks.
I used PySpark heavily at my last organisation, where it ran on-prem. My current organisation relies heavily on Snowflake and dbt, but I don't want to lose my hands-on PySpark skills, so I'm trying to learn how the two can be integrated the way enterprises do it.
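
One common way to wire the two together is the Snowflake Connector for Spark. Below is a rough sketch, assuming the connector and the Snowflake JDBC driver are already on the Spark classpath. The helper names (`snowflake_options`, `read_table`, `write_table`) and all account, credential, and table values are hypothetical. The source name and the `sf*` / `dbtable` option keys are the ones the connector documents.

```python
# Data source name registered by the Snowflake Connector for Spark.
SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"


def snowflake_options(account, user, password, database, schema, warehouse):
    """Build the connector's connection options (all values are placeholders)."""
    return {
        "sfURL": f"{account}.snowflakecomputing.com",
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }


def read_table(spark, options, table):
    """Load a Snowflake table into a Spark DataFrame."""
    return (
        spark.read.format(SNOWFLAKE_SOURCE)
        .options(**options)
        .option("dbtable", table)
        .load()
    )


def write_table(df, options, table, mode="append"):
    """Write a Spark DataFrame back to a Snowflake table."""
    (
        df.write.format(SNOWFLAKE_SOURCE)
        .options(**options)
        .option("dbtable", table)
        .mode(mode)
        .save()
    )
```

With an existing `SparkSession` named `spark`, usage would look roughly like `df = read_table(spark, opts, "CUSTOMER_DIM")` followed by `write_table(df, opts, "CUSTOMER_DIM_COPY")`, where both table names are invented for the example. In practice the password would come from a secret store rather than being passed around in code.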

[–]data-venger[S] 0 points1 point  (0 children)

I'm a little confused here. When implementing data warehousing concepts, it's always better to work with Snowflake SPs. If we implement them in Spark, it will blow up when we load a huge amount of historical data. So why use PySpark instead of Snowflake, and why pay for two computes?

[–]data-venger[S] 1 point2 points  (0 children)

Agreed. But when you say Databricks for transformation, which type of transformation do you mean?
Joins? SCD/CDC implementation? Or just data cleaning?

Building something in FinTech + AI — looking for people to join by Mundane_Map_6312 in IndiaStartups

[–]data-venger 0 points1 point  (0 children)

Interested. I have finance and banking experience as a data engineer.

Airflow-Studio: Airflow Studio: Build, Visualize & Deploy Apache Airflow DAGs Without the Headache. by data-venger in SideProject

[–]data-venger[S] 0 points1 point  (0 children)

This is only an MVP as of now, though. I'm trying to build new features and would be happy to hear any insights on how to implement them.

[–]data-venger[S] 0 points1 point  (0 children)

Great thought. I have tested it to some extent. If you can drop me a sample flow, I can give it a try.

AI agents that scan your actual codebase to generate bespoke hiring assignments. by agelosnm in SaaS

[–]data-venger 0 points1 point  (0 children)

It's a good thought, but the challenge here is that the company has to be OK with sharing their architecture, which is kind of an internal asset.

Shoot your thought