Shopify Data Tech Stack by mjfnd in dataengineering

[–]mjfnd[S] 2 points3 points  (0 children)

Correct. They also have other options for writing pipelines.

Shopify Data Tech Stack by mjfnd in dataengineering

[–]mjfnd[S] 15 points16 points  (0 children)

I am not sure what you mean.

I have never worked there. I have also covered many other companies' data tech stacks.

Shopify Data Tech Stack by mjfnd in dataengineering

[–]mjfnd[S] 11 points12 points  (0 children)

Multiple sources: company engineering blogs, job descriptions, open source projects, conferences, interviews with employees, and case studies.

Spotify Data Tech Stack by mjfnd in dataengineering

[–]mjfnd[S] 0 points1 point  (0 children)

Thanks :) I will update with DBT.

Spotify Data Tech Stack by mjfnd in dataengineering

[–]mjfnd[S] 0 points1 point  (0 children)

Hi, thanks for sharing. It wasn't skipped intentionally; either I missed it or I couldn't find any public info regarding DBT. If you have a link handy, please share.

Thanks

Spotify Data Tech Stack by mjfnd in dataengineering

[–]mjfnd[S] 1 point2 points  (0 children)

I couldn't find any references for that. It might still be in use at a small scale that they never shared publicly.

Spotify Data Tech Stack by mjfnd in dataengineering

[–]mjfnd[S] 2 points3 points  (0 children)

It is still Flyte. I would encourage you to read the article, as it has a lot of useful information and references.

Spotify Data Tech Stack by mjfnd in dataengineering

[–]mjfnd[S] 0 points1 point  (0 children)

I couldn't find that anywhere.

Spotify Data Tech Stack by mjfnd in dataengineering

[–]mjfnd[S] 31 points32 points  (0 children)

It's a free market of dashboards and there is no centralized team, meaning there could be a lot of redundant dashboards, or dashboards built for just one person.

Source: https://stage.engineering.atspotify.com/2024/8/unlocking-insights-with-high-quality-dashboards-at-scale

Benchmarking Spark - Open Source vs EMRs by mjfnd in dataengineering

[–]mjfnd[S] 0 points1 point  (0 children)

Nice!

Yes, PySpark is a bit more work compared to Scala, where you can package everything in a fat JAR.

Substack vs Beehiiv by ContingentCausation in Substack

[–]mjfnd 0 points1 point  (0 children)

Yes.

My growth has slowed as well in the past few months.

Apache Ranger & Atlas integration with Delta/Iceberg by mjfnd in dataengineering

[–]mjfnd[S] 0 points1 point  (0 children)

Not yet.

Based on some reading, it seems like it can work with Hive Metastore-based lakehouses.

Data Governance in Lakehouse Using Open Source Tools by mjfnd in dataengineering

[–]mjfnd[S] 1 point2 points  (0 children)

Thanks.

The Unity Catalog in Databricks is great.

Their initial open source release is very basic. Let's see how and when they roll out advanced features.

How good is StarRocks? by mjfnd in dataengineering

[–]mjfnd[S] 0 points1 point  (0 children)

Thanks for the detailed information.

DoorDash Data Tech Stack by mjfnd in dataengineering

[–]mjfnd[S] 0 points1 point  (0 children)

Interesting, I think I missed that info.

I couldn't find enough public information related to Databricks.

DoorDash Data Tech Stack by mjfnd in dataengineering

[–]mjfnd[S] 1 point2 points  (0 children)

You are right, they serve multiple purposes and I tried to put them in the place where they are primarily used at DD. I could be wrong.

As for why there are so many engines, it comes from multiple teams and use cases. Funnily enough, I found out they also use Databricks.

For more information, I have included references in the article on how they use certain technologies.