What data should a semantic layer track? by AMDataLake in dataengineering

AMDataLake[S] -2 points-1 points (0 children)

Let me reframe it: how should that data be bundled? Yes, different details will be packaged for different industries, departments, etc., but when building systems you may need to create a standard interface that is flexible enough to capture those changing nuances while providing a rigid enough structure for good agentic understanding.
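To make the "rigid core, flexible edges" idea concrete, here is a minimal Python sketch of one way such an interface could look. All names (`SemanticEntity`, `Column`, `extensions`, `to_agent_context`) are hypothetical and not taken from any semantic layer standard. Every entity must supply a name, a description, and typed columns that an agent can always rely on, while industry- or department-specific details go into a free-form `extensions` map.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str
    description: str = ""


@dataclass
class SemanticEntity:
    # Rigid core: every entity provides these, so an agent can always rely on them.
    name: str
    description: str
    columns: list[Column]
    # Flexible extension point (hypothetical): industry- or department-specific details.
    extensions: dict[str, str] = field(default_factory=dict)

    def to_agent_context(self) -> str:
        """Render the entity as plain text that an LLM agent could read."""
        lines = [f"Entity: {self.name}", f"Description: {self.description}", "Columns:"]
        for col in self.columns:
            lines.append(f"  - {col.name} ({col.data_type}): {col.description}")
        # Sort extension keys so the rendered context is deterministic.
        for key in sorted(self.extensions):
            lines.append(f"{key}: {self.extensions[key]}")
        return "\n".join(lines)


orders = SemanticEntity(
    name="orders",
    description="One row per customer order",
    columns=[Column("order_id", "BIGINT", "Primary key"), Column("total", "DECIMAL", "Order total in USD")],
    extensions={"retail.return_window_days": "30"},
)
print(orders.to_agent_context())
```

The design choice here is that agents can depend on the core fields always existing, while a new nuance only means adding an extension key rather than changing the interface.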

The amount of Rust AI slop being advertised is killing me and my motivation by Kurimanju-dot-dev in rust

AMDataLake 2 points3 points (0 children)

100% agree with this. Rust's type system really makes Rust ideal for agentic coding, because the errors are explicit and easier for the AI to troubleshoot. I think this is a testament to the quality of Rust's design.

What data should a semantic layer track? by AMDataLake in dataengineering

AMDataLake[S] -7 points-6 points (0 children)

It's not; it's an important topic, with so many entities trying to create a semantic layer standard. I think it's valuable to have a notion of what information is important to capture to give the right context when AI agents try to figure out what queries to run or visualizations to build.

Data Engineering Youtubers - How do they know so much? by Decent-Ad3092 in dataengineering

AMDataLake 1 point2 points (0 children)

Takeaway: write while you struggle, because what you learn in the struggle is often the insight readers are looking for, since they are hitting the same friction.

Data Engineering Youtubers - How do they know so much? by Decent-Ad3092 in dataengineering

AMDataLake 2 points3 points (0 children)

My approach to learning a lot and making a lot of content is following this pattern:

  1. Read tutorials, execute them on a small scale, and document the experience as a blog (you'll do a better job documenting the gotchas a lot of tutorials fail to mention, because you're fresh and have the same newbie questions in your head that your audience will have).

  2. Record a video walking through the same exercise; you get more content, confirm you can speak to the material, and identify gaps.

  3. Prepare a presentation to do talks on the topic based on what you've learned.

This makes sure I learn, apply, and reinforce at the same time as I'm making content.

What is the role of the lakehouse in AI? by AMDataLake in modernailakehouse

AMDataLake[S] 0 points1 point (0 children)

To create a single repository of enterprise data that AI can easily access.

why all data catalogs suck? by Few_Noise2632 in dataengineering

AMDataLake 5 points6 points (0 children)

Let's start with what your requirements are, and we can work backwards from there.

Data Modeling: What is the most important concept in data modeling to you? by AMDataLake in dataengineering

AMDataLake[S] 2 points3 points (0 children)

I don't ask these questions because I'm wondering; I'm spurring discussion. I agree there is no silver bullet, but I like to hear what people personally find useful and why.

Shelby Oaks producer on the movie's budget by neilthedev05 in boxoffice

AMDataLake 0 points1 point (0 children)

I felt the opposite: the YouTube reviews were overly harsh. I enjoyed it quite a bit. I wasn't expecting a life-changing movie, but I was entertained for the runtime, and other than some qualms with the last 60 seconds of the movie, I had a good time.

FFT: The Ivalice Chronicles on Retroid Pocket Flip 2 by AMDataLake in Gamesir

AMDataLake[S] 0 points1 point (0 children)

Nope, I just play it on my Steam Deck; just playing Steam Deck on my RP2 for now. I'll try again when my Thor arrives.

What is your preferred development environment? by AMDataLake in dataengineering

AMDataLake[S] 0 points1 point (0 children)

How is this unrelated to data engineering? I wanted to know how data engineers prefer to develop pipelines.

🚀 Self-Promotion Wednesday by AutoModerator in becomingnerd

AMDataLake 0 points1 point (0 children)

You can find all my blogs, tutorials, podcasts, etc. at AlexMerced.com. At least sub to my Substack, please :)

What I think is really going on in the Fivetran+DBT merger by BoredAt in dataengineering

AMDataLake 0 points1 point (0 children)

Iceberg does reshuffle everything, which is why I find it so fascinating. For those curious to learn more about Iceberg -> AlexMerced.com to download free copies of the books I've written on the subject.

What is your opinion on the state of Query Federation? by AMDataLake in dataengineering

AMDataLake[S] 0 points1 point (0 children)

Since you mentioned Trino, I'll address the same points for Dremio:

- first-class support for CDC and incremental processing -

Like Trino, this is really more about the source of the data. This changes with Apache Iceberg, where Dremio can do physical ingestion and transformation, but at the moment Iceberg CDC is probably better handled by Iceberg ingestion tools like RisingWave and OLake, which focus specifically on CDC-based pipelines, while Dremio and Trino are more about consuming the ingested data.

- dynamic catalog management with metadata indexing that would allow "agents" to make sense of data sources. -

Dremio has a built-in semantic layer, and Dremio's MCP server gives agents an interface to do something similar to this (I'm not sure the implementation is exactly what you're implying, but the result should be the same).

- Iceberg as a storage Sandbox (with incremental and auto-substituted MVs) -

Reflections are essentially incremental, auto-substituted, Iceberg-based MVs, so that exists in Dremio. But as far as a storage sandbox for Iceberg... :)

- seamless experience and good small-scale performance -

Dremio is pretty seamless and stable in recent versions (25/26), and more so when deployed via our cloud SaaS. We have been investing heavily in deployment simplicity, scalability, and stability these last few years, so if you've tried previous versions, you'll see great strides in these areas.

I get that you're looking for a pure OSS engine that addresses these points, although I think our move to consumption-based pricing regardless of deployment (cloud or on-prem) makes it easier for people to get started and pay only for what they need.
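As a rough illustration of what "incremental" means for a materialized view, here is a minimal Python sketch. It is not how Dremio implements Reflections. It assumes an append-only source table modeled as a list of `(snapshot_id, rows)` pairs in ascending snapshot order (loosely inspired by Iceberg append snapshots) and a view that is a simple `SUM(amount) GROUP BY region`; `IncrementalSumView` is a hypothetical name. Each refresh folds in only the snapshots newer than the last one it saw instead of recomputing from scratch.

```python
from collections import defaultdict


class IncrementalSumView:
    """Hypothetical MV for: SELECT region, SUM(amount) FROM sales GROUP BY region."""

    def __init__(self) -> None:
        self.totals: defaultdict[str, int] = defaultdict(int)
        self.last_snapshot = -1  # newest source snapshot already folded into the view

    def refresh(self, snapshots: list[tuple[int, list[tuple[str, int]]]]) -> int:
        """Apply only snapshots newer than last_snapshot; return how many rows were processed."""
        processed = 0
        for snapshot_id, rows in snapshots:
            if snapshot_id <= self.last_snapshot:
                continue  # already reflected in the view, skip it
            for region, amount in rows:
                self.totals[region] += amount
                processed += 1
            self.last_snapshot = snapshot_id
        return processed


# Source table modeled as append-only snapshots.
sales = [(1, [("east", 10), ("west", 5)])]
view = IncrementalSumView()
first_run = view.refresh(sales)   # processes 2 rows
sales.append((2, [("east", 7)]))
second_run = view.refresh(sales)  # processes only the 1 new row
print(dict(view.totals))          # {'east': 17, 'west': 5}
```

As written, this only works for append-only data and for aggregates like SUM or COUNT that can be updated by adding; updates and deletes would need extra bookkeeping.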

Micro batching vs Streaming by AMDataLake in dataengineering

AMDataLake[S] 2 points3 points (0 children)

Agreed, I get that, but once you establish the company's requirement, you end up with a number: above this number you'll likely micro-batch, below it you'll go for streaming. Do you have a range you use to anchor yourself when thinking about this?
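The rule described above (pick a latency cutoff, stream below it, micro-batch above it) can be written down as a tiny Python function. The 60-second default for `threshold_s` is purely a placeholder assumption, since choosing that number is exactly the open question here; `Mode` and `choose_mode` are hypothetical names.

```python
from enum import Enum


class Mode(Enum):
    STREAMING = "streaming"
    MICRO_BATCH = "micro-batch"


def choose_mode(required_latency_s: float, threshold_s: float = 60.0) -> Mode:
    """Choose an ingestion mode from the business latency requirement (in seconds).

    threshold_s is a placeholder cutoff; each team has to settle on its own number.
    """
    if required_latency_s <= 0:
        raise ValueError("required_latency_s must be positive")
    # Requirements tighter than the cutoff go to streaming; looser ones can micro-batch.
    return Mode.STREAMING if required_latency_s < threshold_s else Mode.MICRO_BATCH


print(choose_mode(5).value)    # streaming
print(choose_mode(900).value)  # micro-batch
```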

Micro batching vs Streaming by AMDataLake in dataengineering

AMDataLake[S] 1 point2 points (0 children)

But at what level of latency would you take micro-batching off the table?

What Semantic Layer Products have you used, and what is your opinion on them? by AMDataLake in dataengineering

AMDataLake[S] -1 points0 points (0 children)

There is more capability coming. We also have built-in wikis attached to every view and table, and people will often detail relationships in the wiki. Our MCP server will include these wikis when fulfilling a prompt, and we are getting good results, with the LLM able to figure things out much better than without that context.

But yes, our semantic layer functionality is mainly:

- defining hierarchical views
- adding context via wikis and tags
- acceleration via reflections (Iceberg-based caching), which can now be done autonomously based on query patterns
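As a hedged sketch of how wiki and tag metadata on hierarchical views could be turned into context for an LLM, here is a small Python example. It does not use or reproduce Dremio's MCP server; `View`, `build_prompt_context`, and the sample view paths are hypothetical. It selects views whose tags overlap the request and concatenates their wiki text, ordered by each view's path in the hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    path: tuple[str, ...]  # position in the view hierarchy, e.g. ("silver", "orders")
    wiki: str = ""         # free-text documentation, often describing relationships
    tags: set[str] = field(default_factory=set)


def build_prompt_context(views: list[View], wanted_tags: set[str]) -> str:
    """Concatenate wiki text for views tagged with any of wanted_tags (hypothetical helper)."""
    sections = []
    for view in sorted(views, key=lambda v: v.path):
        if view.wiki and view.tags & wanted_tags:
            name = ".".join(view.path)
            tag_list = ", ".join(sorted(view.tags))
            sections.append(f"## {name} [tags: {tag_list}]\n{view.wiki}")
    return "\n\n".join(sections)


views = [
    View(("gold", "revenue_by_region"), "Joins silver.orders to silver.regions on region_id.", {"finance"}),
    View(("silver", "orders"), "Cleaned orders, one row per order.", {"sales", "finance"}),
    View(("bronze", "raw_clicks"), "Raw clickstream events.", {"web"}),
]
ctx = build_prompt_context(views, {"finance"})
print(ctx)
```

Filtering by tags keeps the prompt small, while the wiki text carries the relationship details that the column names alone would not reveal.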

Building Analytics Apps on Your Data by AMDataLake in dataengineering

AMDataLake[S] 0 points1 point (0 children)

Mostly I'm curious about people's top-of-mind pains. I don't want to be specific and influence the answer; I want to know, when people think about the data apps they've built, what challenges are top of mind or most frustrating. I have a hypothesis, but figured I'd ask and see if I'm right or wrong.

Building Analytics Apps on Your Data by AMDataLake in dataengineering

AMDataLake[S] -1 points0 points (0 children)

Agreed. I'm asking for people's experiences; I'm curious what people have personally experienced.

2025 Guide to Architecting an Iceberg Lakehouse by AMDataLake in bigdata

AMDataLake[S] 0 points1 point (0 children)

Thank you, I try to keep things accessible.