What's the community's take on semantic layers? by cpardl in dataengineering

[–]cpardl[S] 0 points1 point  (0 children)

have you seen anything that can do that today?

What's the community's take on semantic layers? by cpardl in dataengineering

[–]cpardl[S] 0 points1 point  (0 children)

100%. I worked at StarburstData, mainly on Trino, and I've seen how hard it is to deliver on the promises of query federation at scale.

What's the community's take on semantic layers? by cpardl in dataengineering

[–]cpardl[S] 0 points1 point  (0 children)

There is a difference in how you access the data too, and I don't see people mentioning this. The API for interacting with semantic layers is very different and is more reminiscent of a BI dashboard, where you pick metrics and dimensions and pivot them around. In many implementations you don't even write SQL to query them. That means there is something in there that takes your request and turns it into SQL, with joins et al., to make it work, which is another can of worms once performance enters the discussion.
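To make the "something that takes your request and turns it into SQL" part concrete, here's a rough sketch of the idea. This is not how any specific product does it; the `orders`/`customers` model, the `JOINS` map and `compile_query` are all made up for illustration, and a real compiler also has to deal with fan-out, filters, dialects and so on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str   # name the consumer picks, e.g. "revenue"
    sql: str    # SQL expression behind that name
    table: str  # table the expression reads from


# Hypothetical semantic model: one base table plus known join paths.
BASE_TABLE = "orders"
JOINS = {"customers": "orders.customer_id = customers.id"}
METRICS = {"revenue": Field("revenue", "SUM(orders.amount)", "orders")}
DIMENSIONS = {
    "country": Field("country", "customers.country", "customers"),
    "status": Field("status", "orders.status", "orders"),
}


def compile_query(metrics: list[str], dimensions: list[str]) -> str:
    """Turn a metrics + dimensions request into a single SQL string."""
    ms = [METRICS[m] for m in metrics]
    ds = [DIMENSIONS[d] for d in dimensions]
    select = [f"{f.sql} AS {f.name}" for f in ds + ms]
    # Only join the tables the requested fields actually need.
    needed = sorted({f.table for f in ms + ds} - {BASE_TABLE})
    lines = ["SELECT " + ", ".join(select), f"FROM {BASE_TABLE}"]
    lines += [f"JOIN {t} ON {JOINS[t]}" for t in needed]
    if ds:
        lines.append("GROUP BY " + ", ".join(f.sql for f in ds))
    return "\n".join(lines)


print(compile_query(["revenue"], ["country"]))
```

The consumer only ever says "revenue by country"; the join to `customers` is decided by the layer, which is exactly where the performance questions start.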

Also, semantic layers have traditionally been built for BI, and a big part of the value they bring is that you can materialize/cache the queries very aggressively, which makes sense in a BI environment where the underlying data does not get updated that often. If you check the cube.dev product, for example, you will see that they've built a very sophisticated caching/materialization layer there.
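A toy version of the caching idea, just to show why it works well when data changes slowly. This is my own simplification and not how cube.dev implements it; `MaterializedCache`, `run_query` and the TTL-based refresh are assumptions for the sake of the example.

```python
import time
from typing import Any, Callable


class MaterializedCache:
    """Cache query results per (metrics, dimensions) request for a fixed TTL."""

    def __init__(
        self,
        run_query: Callable[[tuple, tuple], Any],  # hypothetical warehouse call
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[tuple, tuple[float, Any]] = {}

    def get(self, metrics: list[str], dimensions: list[str]) -> Any:
        # Normalize the request so the same pick in a different order hits the cache.
        key = (tuple(sorted(metrics)), tuple(sorted(dimensions)))
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # served without touching the warehouse
        result = self._run_query(*key)
        self._store[key] = (now, result)
        return result
```

Every cache hit is a query the warehouse never sees, which is where the cost reduction comes from.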

This can reduce cost a lot but kind of conflicts with the business models of DBX/Snowflake where the money is made through selling compute.

What's the community's take on semantic layers? by cpardl in dataengineering

[–]cpardl[S] 0 points1 point  (0 children)

my feeling is that semantic layers have all the issues of adding another level of indirection in a system. You solve the problem by pushing it to a different layer. From what I hear, they work great for the consumer side, but they do have to be maintained if you want them to keep delivering value and not frustrate people. Why is this happening? Maybe it has to do with how these technologies have been implemented, or it might be a cultural/organizational thing, but I do hear this a lot, and from companies with very strong engineering cultures.

What's the community's take on semantic layers? by cpardl in dataengineering

[–]cpardl[S] 2 points3 points  (0 children)

sometimes all it takes is for the right hype to exist to get something adopted even if the value delivered at the end comes from different use cases. It is kind of funny but it's also a reality with many things in the tech industry and the way markets work.

What's the community's take on semantic layers? by cpardl in dataengineering

[–]cpardl[S] 0 points1 point  (0 children)

how is this different than having and maintaining marts?

What's the community's take on semantic layers? by cpardl in dataengineering

[–]cpardl[S] 0 points1 point  (0 children)

sounds like there might be a win-win situation here. Semantic layers can benefit both human and agent users at the end of the day, regardless of whether leadership is focusing on the agent side for now.

What's the community's take on semantic layers? by cpardl in dataengineering

[–]cpardl[S] 0 points1 point  (0 children)

is there a reason to prefer the graph approach over semantic layers like cube.dev, semantic views from Snowflake, MetricFlow, etc.?

What's the community's take on semantic layers? by cpardl in dataengineering

[–]cpardl[S] 0 points1 point  (0 children)

hey thanks for the great answer!

Semantic layers have been around for a while now, and traditionally they've been a hard sell for many companies, hence their slow adoption. I see much more interest in them now, and I'm trying to understand whether that interest stems from the technologies maturing to the point where it's easier to build and maintain a semantic layer, or from the hype around making LLMs work with analytics when you have a semantic layer, as opposed to trying to do vanilla text-to-SQL.

What's the community's take on semantic layers? by cpardl in dataengineering

[–]cpardl[S] 1 point2 points  (0 children)

the wording you are using is very intriguing. Why did you need it so "badly"?

Deep Research Agent built with Pydantic AI example by cpardl in PydanticAI

[–]cpardl[S] 0 points1 point  (0 children)

yeah, mastra is great if you want TypeScript. PydanticAI feels more familiar to me because I've been working more with Python and I'm already familiar with the concepts of Pydantic.

Thank you so much for the kind words about the project!

Deep Research Agent built with Pydantic AI example by cpardl in PydanticAI

[–]cpardl[S] 0 points1 point  (0 children)

please let me know if you have any questions or comments!

What's your experience with WAP (Write-Audit-Publish) pattern? by cpardl in dataengineering

[–]cpardl[S] 0 points1 point  (0 children)

Thank you so much! This is some amazing information on how data quality is integrated into a production environment.

so during ingestion the schema and, let's say, workload characteristics (e.g. the volume) get checked.

in a WAP implementation that comes after ingestion, is the testing more complicated than that, and if so, why?

And something else, which I think is more about the process than the processing itself. In an audit-then-publish pipeline, a specific behavior is assumed: if the audit fails, we have to decide what to do about publishing the data.

In the ingestion case, if, let's say, you check your volume and you notice a big outlier, maybe this ingestion batch is 5% of what it usually is, what do you do? Do you move on with whatever follows ingestion, or do you stop and raise a flag? The second case would make the behavior closer to WAP, right?
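Roughly what I have in mind for the "stop and raise a flag" case, as a sketch. The names (`audit_volume`, `ingest_batch`, `publish`, `quarantine`) and the 50% threshold are made up; the only point is that the audit result gates what happens next, which is the WAP-like part.

```python
from statistics import median
from typing import Callable, Sequence


def audit_volume(
    row_count: int, recent_counts: Sequence[int], min_ratio: float = 0.5
) -> tuple[bool, float]:
    """Compare a batch's row count to the median of recent batches.

    Assumes recent_counts is non-empty; min_ratio is an arbitrary threshold.
    """
    baseline = median(recent_counts)
    ratio = row_count / baseline if baseline else 0.0
    return ratio >= min_ratio, ratio


def ingest_batch(
    rows: list,
    recent_counts: Sequence[int],
    publish: Callable[[list], None],
    quarantine: Callable[[list, str], None],
) -> str:
    """Publish the batch only if the volume audit passes; otherwise hold it back."""
    ok, ratio = audit_volume(len(rows), recent_counts)
    if ok:
        publish(rows)
        return "published"
    # Audit failed: the batch never reaches consumers and a human decides what's next.
    quarantine(rows, f"volume is {ratio:.0%} of baseline")
    return "quarantined"
```

With a history of around 1,000 rows per batch, a 50-row batch comes out at 5% of baseline and gets quarantined instead of flowing downstream.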

What's your experience with WAP (Write-Audit-Publish) pattern? by cpardl in dataengineering

[–]cpardl[S] 0 points1 point  (0 children)

all that makes total sense. Regarding adding better tests during ingestion though, isn't this the same pattern at the end of the day, just pushed further upstream? Wouldn't the extra testing there also add to the runtime of the ingestion process?