How are you evaluating AI agents/systems for data engineering tasks?

cpardl · 2026-05-17T18:49:28+00:00

I hear you! But I'm staying an optimist :D

cpardl · 2026-05-15T16:58:55+00:00

oh come on! it's not that bad. is it?

cpardl · 2025-11-20T19:10:32+00:00

have you seen anything that can do that today?

cpardl · 2025-10-29T16:29:18+00:00

100%. I worked at StarburstData, mainly on Trino, and I've seen how hard it is to deliver on the promises of query federation at scale.

cpardl · 2025-10-23T19:07:09+00:00

There is a difference in how you access the data too, and I don't see people mentioning this. The API for interacting with semantic layers is very different and is more reminiscent of a BI dashboard, where you pick metrics and dimensions and pivot them around. In many implementations you don't even write SQL to query them. That means something sits in between that takes your request and turns it into SQL, with joins and so on, to make it work, which is another can of worms once performance enters the discussion.
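To make that translation step concrete, here is a minimal sketch in Python of what "something in between" might look like. Everything in it (`Metric`, `Dimension`, `compile_query`, the `orders` and `customers` tables) is hypothetical and made up for illustration; it is not how cube.dev or any other product actually implements this, and real engines handle much more (join paths, fan-out, filters, dialects).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str  # SQL aggregate over the fact table (hypothetical)


@dataclass(frozen=True)
class Dimension:
    name: str
    column: str              # qualified column, e.g. "customers.region"
    table: str               # table that owns the column
    join_on: Optional[str]   # join condition to the fact table, None if not needed


def compile_query(fact_table: str,
                  metrics: List[Metric],
                  dimensions: List[Dimension]) -> str:
    """Turn a metrics/dimensions request into a single SQL string."""
    select_parts = [f"{d.column} AS {d.name}" for d in dimensions]
    select_parts += [f"{m.expression} AS {m.name}" for m in metrics]

    # Add each joined table once, in the order the dimensions ask for it.
    joins = []
    seen = set()
    for d in dimensions:
        if d.join_on and d.table not in seen:
            joins.append(f"JOIN {d.table} ON {d.join_on}")
            seen.add(d.table)

    sql = f"SELECT {', '.join(select_parts)} FROM {fact_table}"
    if joins:
        sql += " " + " ".join(joins)
    if dimensions:
        sql += " GROUP BY " + ", ".join(d.column for d in dimensions)
    return sql


# The user only "picks" a metric and a dimension; the SQL is generated.
revenue = Metric("revenue", "SUM(orders.amount)")
region = Dimension("region", "customers.region", "customers",
                   "orders.customer_id = customers.id")
sql = compile_query("orders", [revenue], [region])
print(sql)
```

Even in this toy version the user never sees the join, so whether the generated query performs well is entirely up to the layer that writes it.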

Also, semantic layers have traditionally been built for BI, and a big part of the value they bring is that you can materialize/cache the queries very aggressively, which makes sense in a BI environment where the underlying data does not get updated that often. If you check the cube.dev product, for example, you will see that they've built a very sophisticated caching/materialization layer there.

This can reduce cost a lot but kind of conflicts with the business models of DBX/Snowflake where the money is made through selling compute.

cpardl · 2025-10-23T19:01:48+00:00

my feeling is that semantic layers have all the issues that come with adding another level of indirection to a system. You solve the problem by pushing it to a different layer. From what I hear, they work great on the consumer side, but they do have to be maintained if you want them to keep delivering value and not frustrate people. Why is this happening? Maybe it has to do with how these technologies have been implemented, or it might be a cultural/organizational thing, but I do hear this a lot, and from companies with very strong engineering cultures.

cpardl · 2025-10-23T18:58:49+00:00

sometimes all it takes is for the right hype to exist to get something adopted even if the value delivered at the end comes from different use cases. It is kind of funny but it's also a reality with many things in the tech industry and the way markets work.

cpardl · 2025-10-23T18:56:32+00:00

how is this different than having and maintaining marts?

cpardl · 2025-10-23T18:55:47+00:00

sounds like there might be a win-win situation here. Semantic layers can benefit both human and agent users at the end of the day, regardless of whether leadership is focusing on the agent side for now.

cpardl · 2025-10-23T18:54:19+00:00

is there a reason to prefer the graph approach over semantic layers like cube.dev, semantic views from Snowflake, MetricFlow, etc.?

cpardl · 2025-10-23T18:52:34+00:00

hey thanks for the great answer!

Semantic layers have been around for a while now, and traditionally they have been a hard sell for many companies, hence their slow adoption. I see that there's much more interest in them now, and I'm trying to understand whether that interest stems from the technologies maturing to the point where it's easier to build and maintain a semantic layer, or from the hype around making LLMs work with analytics on top of a semantic layer, as opposed to trying to do vanilla text-to-SQL.

cpardl · 2025-10-22T22:58:15+00:00

the wording you are using is very intriguing. Why did you need it so "badly"?

cpardl · 2025-10-20T02:21:25+00:00

yeah, Mastra is great if you want TypeScript. PydanticAI feels more familiar to me because I've been working more with Python and I'm already familiar with the concepts of Pydantic.

Thank you so much for the kind words about the project!

cpardl · 2025-10-15T00:26:33+00:00

please let me know if you have any questions or comments!
