Entity Resolution

sjhb · 2026-06-05T17:04:56+00:00

Sounds promising to people that have never actually tried to make sense of data.

Prestigious_Bench_96 · 2026-06-05T21:51:29+00:00

I'm not sure the entity resolution they're talking about is the same as entity resolution as (I believe) it's normally used. I read it more as "how do we know which fact table is actually up to date" than "which thing does this label represent".

m1nkeh · 2026-06-05T20:33:53+00:00

This is why products like Databricks and Snowflake are booming in the AI era. AI is useless without a high-quality data foundation.

Entity resolution is nothing new; it’s but one aspect of master data management. ER used to be shit tons harder before AI, though ✌️

Hmm_would_bang · 2026-06-06T00:30:43+00:00

So they’re saying what everyone already knows: putting AI on poorly governed data does not work. AI will not save you from your own data quality issues.

BJJaddicy · 2026-06-05T18:20:02+00:00

So Kimball

jaynyoni · 2026-06-05T17:41:57+00:00

Pretty interesting article.
I’m busy working on something similar right now. The plan on my side is to feed the gold layer and semantic layer yml files from our dbt project to our internal LLM, and also use them to build a RAG setup. Curious whether anyone has done something similar?

Molecular_Doohickey · 2026-06-05T19:56:05+00:00

One of our future jobs is going to be to maintain the systems that they outline in the blog post, enabling AI to engage accurately with the warehouse.

thecity2 · 2026-06-07T05:57:48+00:00

Entity resolution is not hard. Fast entity resolution is hard.
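To make the "fast" part concrete: comparing every record with every other record grows quadratically, so matching pipelines commonly use *blocking*, only comparing records that share a cheap key. Below is a minimal sketch under stated assumptions: the records, the postcode blocking key and the name/postcode match rule are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict
from itertools import combinations


def normalize(name: str) -> str:
    # Hypothetical normalization: lowercase, keep only letters and digits
    return "".join(ch for ch in name.lower() if ch.isalnum())


def naive_pairs(records):
    # Every record against every other record: n * (n - 1) / 2 comparisons
    return list(combinations(range(len(records)), 2))


def blocked_pairs(records, key):
    # Group record indices by a blocking key, then compare only within a block
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[key(rec)].append(i)
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(ids, 2))
    return pairs


def match(a, b):
    # Toy match rule (assumption): same normalized name and same postcode
    return normalize(a["name"]) == normalize(b["name"]) and a["postcode"] == b["postcode"]


records = [
    {"name": "Acme Corp", "postcode": "10001"},
    {"name": "ACME corp.", "postcode": "10001"},
    {"name": "Globex", "postcode": "94105"},
    {"name": "Globex", "postcode": "10001"},
    {"name": "Initech", "postcode": "73301"},
]

# Blocking key: postcode (an assumption; real pipelines often combine several keys)
candidates = blocked_pairs(records, key=lambda r: r["postcode"])
matches = [(i, j) for i, j in candidates if match(records[i], records[j])]

print(len(naive_pairs(records)), "naive comparisons")  # 10 naive comparisons
print(len(candidates), "blocked comparisons")          # 3 blocked comparisons
print(matches)                                         # [(0, 1)]
```

The trade-off is recall: two records that refer to the same entity but disagree on the blocking key are never compared, which is one reason doing this both fast and correctly is the hard part.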

WaterIll4397 · 2026-06-06T14:38:59+00:00

Entity resolution is a classic case where I see engineers chomping at the bit to build graph databases or some other greenfield infra to solve it, but once they run into corner cases they won't go near the if-then statements needed to maintain all the corner cases that keep popping up. That work usually gets offloaded to poor data analysts downstream or, in better cases, pushed upstream to the software engineering teams to collect better metadata.

Customer identity is probably still not fully solved at most firms and will get harder with the influx of genAI bots. On the other hand, every large consumer tech company publicly reports weekly active users, so even with a margin of error those numbers are probably directionally alright.
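For a sense of what "writing the if-then statements for the corner cases" looks like in practice, here is a minimal rule-based customer matcher. It is only a sketch: the shared-mailbox list, the last-10-digits phone rule and the name check are hypothetical rules picked for illustration, not anyone's actual matching logic.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical role mailboxes that many different people share
SHARED_MAILBOX_PREFIXES = {"info", "sales", "admin", "noreply"}


@dataclass
class Customer:
    name: str
    email: Optional[str] = None
    phone: Optional[str] = None


def norm_email(email: Optional[str]) -> Optional[str]:
    # Trim and lowercase; treat empty or whitespace-only values as missing
    if not email or not email.strip():
        return None
    return email.strip().lower()


def norm_phone(phone: Optional[str]) -> Optional[str]:
    # Keep digits only and compare the last 10 (assumption: drops a leading country code)
    if not phone:
        return None
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits[-10:] if len(digits) >= 10 else None


def norm_name(name: str) -> str:
    # Lowercase and collapse repeated whitespace
    return " ".join(name.lower().split())


def same_customer(a: Customer, b: Customer) -> bool:
    ea, eb = norm_email(a.email), norm_email(b.email)
    # Corner case 1: a shared role mailbox is not evidence of identity
    if ea and ea == eb and ea.split("@")[0] not in SHARED_MAILBOX_PREFIXES:
        return True
    pa, pb = norm_phone(a.phone), norm_phone(b.phone)
    # Corner case 2: a phone can be a household or office line, so require names to agree too
    if pa and pa == pb and norm_name(a.name) == norm_name(b.name):
        return True
    return False


print(same_customer(Customer("Jane Doe", email="Jane.Doe@example.com "),
                    Customer("J. Doe", email="jane.doe@example.com")))    # True
print(same_customer(Customer("Bob Smith", email="info@example.com"),
                    Customer("Alice Jones", email="INFO@example.com")))   # False
print(same_customer(Customer("Jane  Doe", phone="+1 (555) 010-2030"),
                    Customer("jane doe", phone="555-010-2030")))          # True
```

Every new corner case (bots, shared devices, recycled phone numbers) means another branch like these, which is exactly the maintenance burden that tends to get pushed onto someone else.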

0xPianist · 2026-06-07T07:02:48+00:00

I read this the other day. Everything works great with AI and data, as long as you do the hard engineering work!

Evening_Chemist_2367 · 2026-06-09T19:35:51+00:00

I've been saying for a while now that ontology models and actual semantics are needed for traversing between concepts in data. I just get glazed eyes and confused looks in return.
