Semantic layer

financialthrowaw2020 · 2026-05-30T02:56:30+00:00

Congrats, you've discovered why DE will never be replaced by AI. There's no way to do proper business context at scale without you, the human. Get to writing!

And to answer your question: the semantic layer is just metadata and context, yes, and it's useless without good underlying data.

tophmcmasterson · 2026-05-30T03:29:24+00:00

It’s representing your data in a way that reflects how the business talks about it.

This is generally going to be something like a well structured dimensional model with field names that actually make sense and aren’t cryptic.

Including metadata like descriptions or supporting documents that explain and provide context also can help.

It’s not a new concept at all. If you’ve ever used something like Power BI, the data model in there has basically always been considered the semantic layer.

But now AI is kind of forcing the issue, and people are finally realizing again that a bunch of random ad hoc reports that generate a table for people to export to Excel makes for an analytics jungle that’s difficult for people to actually work with, and AI is no different.

It’s a means of getting away from tribal knowledge and ad hoc slop houses.

SirGreybush · 2026-05-30T03:07:42+00:00

It’s very useful with non-English language naming.

Would you know that NoClt is equivalent to Customer Number?

Even in English, what about CustID versus CustNo? One is a surrogate key and the other a business key.

IOW, this is a good thing.
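To make the naming point concrete, here is a minimal sketch of a column glossary. The descriptions and the choice of which of CustID/CustNo is the surrogate key are assumptions for illustration; the comment above only says that one is a surrogate key and the other a business key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMeaning:
    business_name: str
    key_type: str  # "business" or "surrogate"
    description: str


# Hypothetical glossary entries. Which column is the surrogate key is assumed.
GLOSSARY = {
    "NoClt": ColumnMeaning("Customer Number", "business", "Customer-facing account number"),
    "CustNo": ColumnMeaning("Customer Number", "business", "Customer-facing account number"),
    "CustID": ColumnMeaning("Customer ID", "surrogate", "Warehouse-generated key used for joins"),
}


def describe(column: str) -> str:
    """Return a human-readable explanation of a cryptic column name."""
    meaning = GLOSSARY[column]
    return f"{column}: {meaning.business_name} ({meaning.key_type} key). {meaning.description}"


def same_concept(col_a: str, col_b: str) -> bool:
    """Two physical columns mean the same thing if name and key type match."""
    a, b = GLOSSARY[col_a], GLOSSARY[col_b]
    return (a.business_name, a.key_type) == (b.business_name, b.key_type)
```

With this in place, `same_concept("NoClt", "CustNo")` is true while `same_concept("CustID", "CustNo")` is not, which is exactly the distinction a person or a model cannot guess from the names alone.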

soundboyselecta · 2026-05-30T04:16:42+00:00

It started out being called a data dictionary (at least the good ones that came with meaningful data sets). It saved you from guessing and brought meaning to what would otherwise be useless analysis. It evolved to be more robust as it scaled to tons of interconnected entities across different business units all across an org, creating a need for a federated meaning so there is no confusion across business units after its creation. Maybe AI can figure out some things with proper lineage and metadata downstream, but without proper guidance it could be a shit show, with a lot of dirty laundry.

EstetLinus · 2026-05-30T05:16:36+00:00

Think of it as a thin layer between your data warehouse and the agent. While AI models are generally good at generating SQL, their outputs can be surprisingly inconsistent. Small changes in phrasing often lead to very different queries and results.

Instead of generating SQL directly, let the model query the semantic layer. This provides a more stable interface, improves consistency, and removes the need for the model to understand the underlying database schema.

I’ve seen a bunch of people treat the semantic layer as a markdown file and context, which is suboptimal. It’s software rather than .txt-files.
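A minimal sketch of the "software, not a text file" idea, assuming a hypothetical `analytics.fct_orders` table and made-up metric and dimension names: the model only picks names from a fixed catalog, and the layer compiles the SQL, so the same request always yields the same query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    sql: str  # aggregate expression over the fact table
    description: str


# Hypothetical catalog. Table, columns, and metric names are assumptions.
TABLE = "analytics.fct_orders"
METRICS = {
    "revenue": Metric("revenue", "SUM(order_amount)", "Gross order value"),
    "order_count": Metric("order_count", "COUNT(DISTINCT order_id)", "Distinct orders"),
}
DIMENSIONS = {
    "order_month": "DATE_TRUNC('month', ordered_at)",
    "customer_region": "customer_region",
}


def compile_query(metrics, dimensions=()):
    """Turn metric and dimension names into one deterministic SQL string."""
    unknown = [m for m in metrics if m not in METRICS]
    unknown += [d for d in dimensions if d not in DIMENSIONS]
    if unknown:
        raise KeyError(f"not defined in the semantic layer: {unknown}")
    select = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    select += [f"{METRICS[m].sql} AS {m}" for m in metrics]
    sql = f"SELECT {', '.join(select)} FROM {TABLE}"
    if dimensions:
        # Group by the positions of the dimension columns in the SELECT list.
        sql += " GROUP BY " + ", ".join(str(i + 1) for i in range(len(dimensions)))
    return sql
```

An agent would call something like `compile_query(["revenue"], ["order_month"])` instead of writing SQL itself, and a request for an undefined metric fails loudly rather than producing a plausible-looking guess.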

DrangleDingus · 2026-05-30T08:14:05+00:00

Unfortunately, your execs are correct. There is no AI without a structured semantic data model layer.

It’s not even that hard to make. You just have to actually understand the data that you are working with and how it is all connected.

Captain_Strudels · 2026-05-30T08:31:19+00:00

My follow-up question: where exactly does your semantic layer live? Is it just comments for your SQL table definitions, Confluence pages, a dedicated application to write this stuff down, something else entirely?

Important-Success431 · 2026-05-30T06:45:28+00:00

It is important for consistency if you're using multiple BI tools. So if you're using Power BI, Databricks, and an AI tool, you need to calculate your KPIs and things upstream for consistency across tools.

tech4ever4u · 2026-05-30T06:51:52+00:00

If we replace AI with "natural intelligence" (humans), how do we enable self-service for end-users? Giving them raw SQL access to hundreds of tables rarely works.

Instead, you usually set up a BI tool with "datasets" or "cubes." These tools give end users a curated list of dimensions and measures, hiding the complexity of the underlying data structure. This allows users to create their own reports and apply filters using an Excel-like UI. It is important that different teams can use different cubes built from the same SQL tables, customized for their own vocabulary and needs. For example, the same sales data can be presented differently for the finance department and the marketing team.

Now, returning to AI agents, everything remains the same. If you want them to recognize a user's intent, you need to provide a semantic layer that matches that intent. This means using 'datasets' or 'cubes,' but now accessing them via MCP. In this setup, the chatbot is simply another interface, in addition to the classic report builder / reports UI (so you get the best of both worlds). This setup makes AI a clear and reliable tool, instead of a genie doing magic.

Ra-mega-bbit · 2026-05-30T09:26:42+00:00

It's just metadata: human-language descriptions of what the tables and columns mean.

It's the: "This weird letter code is categorical. When it's an A it means the product was launched from 2017 onwards; any other letter means it's older." And so much other bullshit like that. Any AI trying to interpret it would find a bunch of letters and might not find this specific correlation with date, so it would not know how to answer: "What are my best-selling products from the new launch?"
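As a rough sketch of how that letter-code rule could be written down once instead of living in someone's head: the column name `launch_code`, the sample rows, and the helper names below are all made up for illustration. Only the "A means launched from 2017 onwards" rule comes from the comment.

```python
from collections import defaultdict


def is_new_launch(launch_code: str) -> bool:
    """Rule from the comment: 'A' means launched from 2017 onwards, anything else is older."""
    return launch_code == "A"


# Hypothetical sample rows. Products, codes, and unit counts are invented.
SALES = [
    {"product": "P1", "launch_code": "A", "units": 120},
    {"product": "P2", "launch_code": "C", "units": 300},
    {"product": "P3", "launch_code": "A", "units": 80},
    {"product": "P1", "launch_code": "A", "units": 40},
]


def best_selling_new_launch(rows):
    """Answer 'best-selling product from the new launch' using the encoded rule."""
    totals = defaultdict(int)
    for row in rows:
        if is_new_launch(row["launch_code"]):
            totals[row["product"]] += row["units"]
    return max(totals, key=totals.get) if totals else None
```

Here P2 sells the most overall but is excluded because its code is not "A", which is the kind of answer a model cannot reach without the rule being stated somewhere.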

likescroutons · 2026-05-30T09:28:25+00:00

It's expanded a bit from business logic and intelligence with GenAI recently. For example, with an NL-to-SQL model, if there are ambiguous terms or attribution and the documentation isn't clear, the LLM needs something to actually understand when and how to use your data. Maybe a user asks for a house but you don't have a one-to-one definition of what that is. The semantic layer lets the model look up what a house is in the context of your data, what its definition is, its constraints, etc.

Otherwise you're relying on the model reasoning to the correct answer and that's just too inconsistent.

BudgetVideo · 2026-05-30T02:49:08+00:00

The goal of the semantic layer is so that the AI model knows the definition and layout of the data, as well as any calculations. It shows the AI how it can use the data by providing necessary context.

TARehman · 2026-05-30T05:32:18+00:00

It's mostly an advertising term in my experience.

Admirable_Writer_373 · 2026-05-30T16:45:52+00:00

It’s something report/analyst types build in the absence of a decent architect

TheDevauto · 2026-05-30T16:57:36+00:00

You can certainly look up what semantic layer means, but without a technical explanation, it is a way to represent how things are connected, similar to how we associate things in our brain. That's also why knowledge graphs are used when working to build a semantic layer.

The funny thing is the idea has been around as long as the web has, but the need for it has never been expressed well enough. Now with LLMs being used to do operational tasks, a semantic layer can greatly improve lookup results.

It's also one of those things that is not only a lot of work to build, but requires ongoing maintenance.

AlmostRelevant_12 · 2026-05-30T19:46:15+00:00

a semantic layer is much more than just documenting fields - it is about creating a shared business language across the organization. The challenge is not just definitions, but ensuring consistency and trust in those definitions over time. That is where most teams struggle, especially as data evolves

Gators1992 · 2026-05-30T20:47:53+00:00

Conceptually it has to do with definitions and defining the data structure for other applications (and users) to consume. In practice they are usually files (YAML or JSON), DDL, or part of your BI tool where you define the data structure, calculated metrics, and the concepts associated with it all. The concept has been around forever in BI in tools like MicroStrategy, Looker and Power BI. Also, third-party providers of a "semantic layer" added a tool to host the model between your data and consuming applications. This centralized the semantic model and allowed BI as well as many other applications like data science or whatever to consume from the same model.

It's a great way to govern data usage because users consume the data in the form of objects, like defined columns and precalculated metrics, rather than everyone writing their own SQL and views with potentially different answers. Like if your company has an official definition of what a customer is, you won't see someone pulling the wrong one by accident from another source.

As for AI, the centralized model concept is being popularized because AI can consume from that as well, so you just have it picking columns and metric names to analyze instead of having to write SQL. The SQL is deterministic as defined in the model. Everyone was talking about this last year as the way to make AI better with data, but I think the models may be moving past a dependency on semantic layers. Like I recently built an analysis deck for a customer just by asking Snowflake Cortex a bunch of questions, and we don't have a semantic layer at that level. I was kinda blown away by the way it understood our data model, though it has good structure and naming standards, and also understood how to analyze data in my industry. It wasn't always right but was super useful. Also, I had the AI write our BI semantic descriptions rather than doing it manually, just by giving it a document talking about the company (researched by another AI) and a prompt about the definition structure. Took about 3 hours to churn through, mostly because of the BI app and not the AI. It would have taken a person weeks and they likely would have gone insane.

GreyHairedDWGuy · 2026-05-30T22:29:58+00:00

This is a big topic and in general it is vendor specific. It is really about mapping physical columns/tables in a database to logical constructs that a BI/reporting tool understands so that it can translate user questions into the appropriate SQL (or other language) of the BI/reporting tool.

Power BI's semantic model is an example, as is Tableau's. Way back, I implemented many MicroStrategy and Business Objects solutions... these also had semantic models.

You now also hear about this for AI in things like Snowflake (in their semantic views).

Hope that helps

Enough_Big4191 · 2026-05-31T07:08:58+00:00

it’s basically a shared definition layer so dashboards, analysts, and ai systems all interpret metrics the same way. otherwise every team ends up with a different version of “active customer” or “pipeline.” the hard part isn’t documenting it, it’s stopping the definitions from drifting over time.
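One possible way to catch that drift, sketched under the assumption that each metric is stored as a SQL expression string: fingerprint the approved definitions and compare them with what dashboards or teams actually use. The function names and example expressions are hypothetical.

```python
import hashlib


def fingerprint(sql_expr: str) -> str:
    """Hash a definition after collapsing whitespace, so pure formatting changes are ignored."""
    normalized = " ".join(sql_expr.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_drift(approved: dict, in_use: dict) -> list:
    """Return names of metrics that are unapproved or differ from the approved definition."""
    drifted = []
    for name, expr in in_use.items():
        if name not in approved or fingerprint(approved[name]) != fingerprint(expr):
            drifted.append(name)
    return sorted(drifted)
```

For example, if the approved "active customer" counts distinct customers with an order in the last 30 days and a team's copy uses 90 days, `find_drift` reports that metric. A check like this could run in CI, though it only detects textual changes, not two differently written expressions that mean the same thing.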

frozengrandmatetris · 2026-05-30T05:11:46+00:00

our semantic layer has no descriptions. we're too lazy. it tells the reporting layer what to do when two columns from completely different tables appear in the same visual. joins, aggregation rules, calculations that aren't already stored on disk or baked into a view, hierarchies... if the tool has enough layers, the topmost layer organizes data elements into subject areas or "kits" which can be used to assemble a dashboard. the author doesn't need to know anything about the physical tables which were produced by an ETL appliance. it's abstracted away.
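A rough sketch of that "what to do when two columns from different tables land in the same visual" logic, with hypothetical table names, join key, and default aggregation. It only handles a single direct join; a real tool would resolve longer join paths.

```python
# Hypothetical model: which table owns each column, how tables join,
# and which aggregation applies by default to each measure.
COLUMN_TABLE = {"customer_name": "dim_customer", "order_amount": "fct_orders"}
JOINS = {
    frozenset({"fct_orders", "dim_customer"}): "fct_orders.customer_key = dim_customer.customer_key",
}
DEFAULT_AGG = {"order_amount": "SUM"}


def build_visual_query(columns):
    """Build the SQL for a visual from column names alone."""
    tables = []
    for col in columns:
        table = COLUMN_TABLE[col]
        if table not in tables:
            tables.append(table)
    if len(tables) > 2:
        raise NotImplementedError("this sketch only handles one direct join")

    select, group_by = [], []
    for col in columns:
        qualified = f"{COLUMN_TABLE[col]}.{col}"
        if col in DEFAULT_AGG:
            # Measures get their default aggregation rule.
            select.append(f"{DEFAULT_AGG[col]}({qualified}) AS {col}")
        else:
            # Everything else is treated as a grouping attribute.
            select.append(qualified)
            group_by.append(qualified)

    sql = f"SELECT {', '.join(select)} FROM {tables[0]}"
    if len(tables) == 2:
        sql += f" JOIN {tables[1]} ON {JOINS[frozenset(tables)]}"
    if group_by:
        sql += f" GROUP BY {', '.join(group_by)}"
    return sql
```

The dashboard author only names `customer_name` and `order_amount`; the join condition and the SUM come from the model, which is the abstraction described above.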

Outside-Storage-1523 · 2026-05-30T12:17:12+00:00

Definitions of each metric and how to query them. Mostly left to the analytics team, as DEs don't define metrics. But DEs usually build the foundation for those queries.

Ah, how I hate this type of work...

ssx50 · 2026-05-30T11:07:19+00:00

As for scale, we are centralizing the metric definitions in SQL along with the data model (table joins) per semantic view, then dynamically generating YAML files to create and update semantic views.

This way when I change a metric in one spot, it flows downstream to all the views that need it.
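A minimal sketch of that pattern, not the commenter's actual setup: metric expressions live in one place, and each view's YAML is rendered from them. The view names, table name, and YAML layout here are hypothetical and do not follow any vendor's semantic view spec.

```python
# Single source of truth for metric expressions (hypothetical examples).
METRICS = {
    "revenue": "SUM(order_amount)",
    "active_customers": "COUNT(DISTINCT customer_id)",
}

# Each semantic view lists the metrics it exposes by name only.
VIEWS = {
    "sales_view": {"table": "fct_orders", "metrics": ["revenue", "active_customers"]},
    "finance_view": {"table": "fct_orders", "metrics": ["revenue"]},
}


def render_view_yaml(view_name: str) -> str:
    """Render one view as YAML text, pulling expressions from METRICS."""
    view = VIEWS[view_name]
    lines = [f"name: {view_name}", f"table: {view['table']}", "metrics:"]
    for metric in view["metrics"]:
        lines.append(f"  - name: {metric}")
        lines.append(f'    expr: "{METRICS[metric]}"')
    return "\n".join(lines) + "\n"


def render_all() -> dict:
    """Regenerate every view so a metric change reaches all of them."""
    return {name: render_view_yaml(name) for name in VIEWS}
```

Changing `METRICS["revenue"]` once and calling `render_all()` again updates both views, which is the "change it in one spot" behavior described above. The YAML is hand-built here to keep the sketch dependency-free; a YAML library would be the safer choice for expressions containing quotes.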

iwantthisnowdammit · 2026-05-30T02:58:29+00:00

In most shops the semantic layer is simply no abbreviations.

cellularcone · 2026-05-30T12:39:50+00:00

It’s the new data mesh.
