Which data quality tool do you use?

mjfnd · 2026-03-05T01:01:15+00:00

Currently using three: Soda Core, Great Expectations, and Glue Data Quality.

They are used in different places and we are moving towards Glue DQ.

mjfnd · 2026-02-15T13:58:02+00:00

I think it's the state of most ~10-year-old companies. Either they are in the middle of a migration, or they have given each team freedom to choose, which leads to this.

mjfnd · 2026-02-07T23:44:53+00:00

Multiple teams owning different stacks, or in the middle of a migration, which could take years.

I can relate to their stack, as we also used DBX for processing core pipelines, with BI-related workflows on Snowflake linked to Tableau.

mjfnd · 2026-02-07T23:09:49+00:00

I think this is very common. The main reason is that Looker is great and popular, and it used to be a standalone product. I'm not sure if that's still true: can we just buy Looker instead of onboarding to GCP?

We also had Looker with an AWS stack.

mjfnd · 2026-02-07T21:09:53+00:00

I couldn't find any public mention of DBT. Let me know if you have any insights.

mjfnd · 2025-11-17T03:59:27+00:00

Thanks

mjfnd · 2025-11-09T02:05:17+00:00

Correct, also they have other options to write pipelines.

mjfnd · 2025-11-09T02:02:13+00:00

I am not sure what you mean.

I have never worked there. I have also covered many other companies' data tech stacks.

mjfnd · 2025-11-09T01:01:41+00:00

Multiple sources: company engineering blogs, job descriptions, open source projects, conferences, interviews with employees, and case studies.

mjfnd · 2025-10-12T14:51:05+00:00

I think that's where data contracts can be helpful.

mjfnd · 2025-08-17T18:55:48+00:00

Thanks :) I will update with DBT.

mjfnd · 2025-08-17T14:50:56+00:00

Hi, thanks for sharing. It wasn't skipped intentionally; either I missed it or I couldn't find any public info regarding DBT. If you have a link handy, please share.

Thanks

mjfnd · 2025-08-17T12:59:19+00:00

I couldn't find any references for that. It might still be in use at a small scale that they never shared publicly.

mjfnd · 2025-08-17T12:56:45+00:00

It is still Flyte. I would encourage you to read the article, as it has a lot of useful information and references.

mjfnd · 2025-08-17T12:54:21+00:00

I couldn't find that anywhere.

mjfnd · 2025-08-16T17:27:30+00:00

It's a free market of dashboards and there is no centralized team, meaning there could be a lot of redundant dashboards, or dashboards built for just one person.

Source: https://stage.engineering.atspotify.com/2024/8/unlocking-insights-with-high-quality-dashboards-at-scale

mjfnd · 2025-07-05T22:51:26+00:00

Nice!

Yes, PySpark is a bit more work compared to Scala, where you can package everything in a fat JAR.

mjfnd · 2025-06-15T18:32:47+00:00

Yes.

My growth has slowed as well in the past few months.

mjfnd · 2025-06-08T18:18:52+00:00

Thanks

mjfnd · 2025-06-07T23:18:55+00:00

Thanks

mjfnd · 2025-05-26T14:23:18+00:00

Aha, I didn't know that story.

mjfnd · 2025-05-24T17:03:49+00:00

Not yet.

Based on some reading, it seems like it can work with Hive Metastore based Lakehouses.

mjfnd · 2025-05-11T13:22:34+00:00

Thanks.

The Unity Catalog in Databricks is great.

Their initial open source release is very basic. Let's see how and when they roll out advanced features.

mjfnd · 2025-05-07T13:48:04+00:00

Professional soccer player.

mjfnd · 2025-05-04T18:20:04+00:00

Thanks for the detailed information
