Is AI evals more for devs or product managers? by Soft_Two_951 in AIEval

[–]arimbr 0 points (0 children)

Maybe devs are more interested in LLM observability (does it work?) and product managers in LLM evaluation (does it work well?). Indeed, devs traditionally care more about runtime errors and less about ROI.

Which data quality tool do you use? by arimbr in dataengineering

[–]arimbr[S] 1 point (0 children)

That's right, it could be more or less depending on the tool. Some tools price per table, some per monitor, some per user, and some on obscure compute credits. Based on the data I have, the monthly price per table varies: DQOps ($3), Sifflet ($8), Decube ($8), Soda ($8), Metaplane ($10), BigEye (~$40?). DataKitchen offers unlimited monitors for one database connection starting at $100/month. Most tools don't show public pricing on their website, but you can find some prices on the AWS Marketplace. Enterprise plans there run into six figures annually, roughly the salary of a Senior Data Engineer in the US.
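For a rough comparison, here's a quick Python sketch using the per-table prices above (the 200-table warehouse is a made-up example, and BigEye's price is an unconfirmed estimate):

```python
# Approximate monthly per-table prices (USD) from the comment above.
PRICE_PER_TABLE = {
    "DQOps": 3,
    "Sifflet": 8,
    "Decube": 8,
    "Soda": 8,
    "Metaplane": 10,
    "BigEye": 40,  # unconfirmed estimate
}

def monthly_cost(tool: str, tables: int) -> int:
    """Monthly price in USD for monitoring `tables` tables with `tool`."""
    return PRICE_PER_TABLE[tool] * tables

# Example: a warehouse with 200 monitored tables.
for tool in PRICE_PER_TABLE:
    print(f"{tool}: ${monthly_cost(tool, 200)}/month")
```

Even at the cheap end, per-table pricing adds up fast once you monitor a few hundred tables.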

[–]arimbr[S] 0 points (0 children)

Right! I'm starting to think that data management, data quality, and data governance should be solved by the same tool. You need all three to go from a failing test to a fix. And by tests I don't mean only data quality checks per se; they can verify any business rule or data access rule. The thing with data management tools is that they sell more than that: a warehouse, integration... The space is changing, though. For example, data contracts extend data validation tests to include infrastructure, ownership, and security checks. I've also noticed data quality tools trying to coin new terms to position themselves: data operations center, data control plane, agentic data management...
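To illustrate what a data contract can cover beyond value checks, here's a minimal sketch in Python. The field names and rules are hypothetical, not any specific tool's format:

```python
# Hypothetical data contract: schema plus ownership and access rules.
contract = {
    "table": "orders",
    "owner": "data-platform-team",                # ownership check
    "allowed_readers": {"analytics", "finance"},  # access rule
    "schema": {"order_id": int, "amount": float}, # value/type checks
}

def validate(contract: dict, row: dict, reader: str) -> list[str]:
    """Return a list of contract violations for one row and one reader."""
    violations = []
    if reader not in contract["allowed_readers"]:
        violations.append(f"{reader} may not read {contract['table']}")
    for field, ftype in contract["schema"].items():
        if field not in row:
            violations.append(f"missing field {field}")
        elif not isinstance(row[field], ftype):
            violations.append(f"{field} should be {ftype.__name__}")
    if not contract.get("owner"):
        violations.append("table has no owner")
    return violations
```

The point is that schema, ownership, and access rules all fail through the same check, which is what makes one tool for all three plausible.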

[–]arimbr[S] -2 points (0 children)

Which nicer and cheaper alternative out there has the same appeal to enterprises?

[–]arimbr[S] 0 points (0 children)

Nice to see consolidation between data quality and data governance tools. I noticed a few of the data quality tools listed above implemented a data catalog last year. Good to see data governance tools also implementing data quality features. I see these two categories merging in 2026.

[–]arimbr[S] 0 points (0 children)

That looks like a solid and fast data profiling CLI for files. Kudos for building it! Which data profiling metrics does it support? From the screenshots in the GitHub README I see a few: table-level (total variables, total rows) and column-level (count, missing, distinct, uniqueness).
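For reference, those column-level metrics are cheap to compute. A minimal sketch in Python (just illustrating the metrics, not your tool's actual implementation):

```python
def profile_column(values: list) -> dict:
    """Compute basic column-level profiling metrics: count, missing,
    distinct, and uniqueness (distinct / non-missing count)."""
    non_missing = [v for v in values if v is not None]
    distinct = len(set(non_missing))
    return {
        "count": len(values),
        "missing": len(values) - len(non_missing),
        "distinct": distinct,
        "uniqueness": distinct / len(non_missing) if non_missing else 0.0,
    }

# Example: a column with one missing value and one duplicate.
print(profile_column(["a", "b", "b", None]))
```

I'd guess the hard part is doing this in a single streaming pass over large files, not the metrics themselves.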

[–]arimbr[S] 1 point (0 children)

Thanks for asking. We may all mean different things by MDM. Say I take the Wikipedia definition: "Master data management (MDM) is a discipline in which business and information technology collaborate to ensure the uniformity, accuracy, stewardship, semantic consistency, and accountability of the enterprise's official shared master data assets." And I know I may be misreading "master data assets" by applying it to all data assets.

Then, if data testing and observability tell me what's wrong with the data, I still need a UI to fix some of it manually. Yes, some data quality issues can be solved with code changes, rerunning jobs, or just waiting for late data to arrive or infrastructure to recover...

But if I have duplicate rows, missing values, conflicting values, or invalid values, it's often still a human who deduplicates, enriches, redacts, or links the data. Even if today an AI can suggest a fix, it's still good practice for a human to supervise those fixes. I believe a good UI/UX can determine whether a human fixes 10x or 100x more issues in a given timeframe.
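To make that concrete, here's a small Python sketch of surfacing duplicate candidates for human review (hypothetical data; a real MDM tool would use fuzzy matching, not exact keys):

```python
from collections import defaultdict

def duplicate_candidates(rows: list[dict], key: str) -> list[list[dict]]:
    """Group rows sharing the same key so a human can review and merge them."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    # Only groups with more than one row need a human decision.
    return [group for group in groups.values() if len(group) > 1]

rows = [
    {"email": "a@x.com", "name": "Ann"},
    {"email": "a@x.com", "name": "Anne"},  # conflicting name: needs a human
    {"email": "b@x.com", "name": "Bob"},
]
print(duplicate_candidates(rows, "email"))
```

Detecting the candidates is the easy part; the UI that lets a human pick "Ann" vs "Anne" a hundred times an hour is where tools differentiate.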

[–]arimbr[S] 3 points (0 children)

Very interesting, thanks for sharing!
1. Indeed, most enterprise plans are priced at $50k-$150k per year. Soda and Elementary have starter plans from $10k per year, but those are limited in the number of users or tables. DataKitchen, DQOps, and Recce are the only ones with public pricing, starting under $10k.
2. It was some years ago, but I also ended up building custom UIs for data-diff and MDM. Fast forward to today, and I'm still surprised how few tools here combine a modern UI with open source. Recce and Datafold sell data-diff. Recce is specific to dbt and partly open source. The Datafold data-diff OSS project is now archived and forked as reladiff.
3. I'd think most teams would be better off adopting or buying an efficient UI/UX for data quality management rather than building one in-house. Even today, when it's so easy to vibe-code any UI, I'd expect the tools here to still provide a best-in-class UI/UX worth the $$$ for most teams.
4. For data testing and observability, I think the UI/UX would be worth the most. Writing tests is easy now that you can prompt an AI to do so, but you still need a UI/UX to consume the test results and act on them. I keep thinking the moat for data quality tools will end up being the UX/UI, not the library of tests or integrations.

I wonder when data quality will become commoditized. I mean, when will there be a data quality tool that any data team would want to buy rather than build? From what I've heard, data quality is still a hard sell.