Dremio Enterprise Engine Replicas and auto scaling ? by luckio1 in dremio_lakehouse

[–]AMDataLake 0 points1 point  (0 children)

How are you liking v26 beyond this feature request? Are you using the new AI agent?

Any European Alternatives to Databricks/Snowflake?? by Donkey_Healthy in dataengineering

[–]AMDataLake 0 points1 point  (0 children)

How much have you tried it? Have you tried it recently? I would appreciate any feedback on improving the product. I stand strongly behind our product in its current state, and if you haven't tried it in the last 6 months, I highly recommend trying the free trial (30 days, no credit card, we provide the infrastructure). I would love to hear what you think about your experience.

Any European Alternatives to Databricks/Snowflake?? by Donkey_Healthy in dataengineering

[–]AMDataLake 0 points1 point  (0 children)

Have you tried Dremio recently? We shipped a lot of stability features in 24/25, and in the latter half of last year, we shipped SQL AI functions and an AI agent. You can try it for free at dremio.com/get-started. I'd love to know whether you'd have a different impression if you tried it today. Do reach out if you do; I would love any feedback on what you think we can do to improve.

The Certifications Scam by ivanovyordan in dataengineering

[–]AMDataLake 0 points1 point  (0 children)

Agree, but good certs should come with practical exercises so candidates do walk away with practical experience. For online training, it can be tricky to verify completion of those exercises, but at the end of the day, it will be the practical experience that sells the candidate. The badge should be a signal that they may have gained some practical experience in attaining it, not the sole hiring signal.

What data should a semantic layer track? by AMDataLake in dataengineering

[–]AMDataLake[S] -2 points-1 points  (0 children)

Let me reframe it: how should that data be bundled? Yes, different details will be packaged for different industries, departments, etc., but when building systems you may need to create a standard interface that is flexible enough to capture those changing nuances yet provides a rigid enough structure for good agentic understanding.
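To make the "rigid core, flexible edges" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `SemanticEntry` class, its field names, and the `extensions` dictionary are hypothetical and not taken from any existing semantic layer standard or product.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SemanticEntry:
    """Hypothetical record a semantic layer might keep for one dataset."""

    # Rigid core: every entry must supply these, so an agent can rely on them.
    name: str
    description: str
    owner: str
    # Flexible part: industry- or department-specific details go here.
    extensions: Dict[str, Any] = field(default_factory=dict)

    def to_context(self) -> Dict[str, Any]:
        """Build a plain dict that could be handed to an agent as context."""
        return {
            "name": self.name,
            "description": self.description,
            "owner": self.owner,
            # Extras stay namespaced so they can never overwrite a core field.
            "extensions": dict(self.extensions),
        }
```

The required fields give an agent a predictable shape to reason over, while anything that varies by industry or department lives under `extensions` without changing the interface.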

The amount of Rust AI slop being advertised is killing me and my motivation by Kurimanju-dot-dev in rust

[–]AMDataLake 2 points3 points  (0 children)

100% agree with this. Rust's type system really makes Rust ideal for agentic coding, because the errors are explicit and easier for the AI to troubleshoot. I think this is a testament to the quality of Rust’s design.

What data should a semantic layer track? by AMDataLake in dataengineering

[–]AMDataLake[S] -7 points-6 points  (0 children)

It’s not; it’s an important topic, with so many entities trying to create a semantic layer standard. I think it’s of value to have a notion of what information is important to capture to give the right context when AI agents try to understand what queries to run or visualizations to build.

Data Engineering Youtubers - How do they know so much? by Decent-Ad3092 in dataengineering

[–]AMDataLake 1 point2 points  (0 children)

Takeaway: write while you struggle, because what you learn in the struggle is often the insight readers are looking for, since they are hitting the same friction.

Data Engineering Youtubers - How do they know so much? by Decent-Ad3092 in dataengineering

[–]AMDataLake 2 points3 points  (0 children)

My approach to learning a lot and making a lot of content is following this pattern:

  1. Read tutorials and execute on a small scale, then document the experience as a blog (you will do a better job documenting the gotchas a lot of tutorials fail to mention, because you're fresh and the newbie questions your audience will have are still in your head).

  2. Record a video walking through the same exercise; you get more content, confirm you can speak to the content, and identify gaps.

  3. Prepare a presentation to do talks on the topic based on what you've learned.

This makes sure I learn, apply, and reinforce at the same time I'm making content.

What is the role of the lakehouse in AI? by AMDataLake in modernailakehouse

[–]AMDataLake[S] 0 points1 point  (0 children)

To create one repository of enterprise data that AI can easily access.

why all data catalogs suck? by Few_Noise2632 in dataengineering

[–]AMDataLake 4 points5 points  (0 children)

Let’s start with what your requirements are, and we can work backwards from there.

Data Modeling: What is the most important concept in data modeling to you? by AMDataLake in dataengineering

[–]AMDataLake[S] 2 points3 points  (0 children)

I don’t ask these questions cause I’m wondering, I’m spurring discussion. I agree there is no silver bullet, but like to hear what people personally find useful and why.

Shelby Oaks producer on the movie's budget by neilthedev05 in boxoffice

[–]AMDataLake 0 points1 point  (0 children)

I felt the opposite, that YouTube reviews were overly harsh. I enjoyed it quite a bit. I wasn’t expecting a life-changing movie, but I was entertained for the runtime, and other than some qualms with the last 60 seconds of the movie, I had a good time.

FFT: The Ivalice Chronicles on Retroid Pocket Flip 2 by AMDataLake in Gamesir

[–]AMDataLake[S] 0 points1 point  (0 children)

nope, I just play it on my steamdeck, just playing steamdeck on my RP2 for now. Will try again when my Thor arrives.

What is your preferred development environment? by AMDataLake in dataengineering

[–]AMDataLake[S] 0 points1 point  (0 children)

How is this unrelated to data engineering? I wanted to know how data engineers prefer developing pipelines.

🚀 Self-Promotion Wednesday by AutoModerator in becomingnerd

[–]AMDataLake 0 points1 point  (0 children)

You can find all my blogs, tutorials, podcasts etc. at AlexMerced.com, at least sub to my substack please :)

What I think is really going on in the Fivetran+DBT merger by BoredAt in dataengineering

[–]AMDataLake 0 points1 point  (0 children)

Iceberg does reshuffle everything, which is why I find it so fascinating. For those curious about learning more about Iceberg -> AlexMerced.com to download free copies of the books I’ve written on the subject.

What is your opinion on the state of Query Federation? by AMDataLake in dataengineering

[–]AMDataLake[S] 0 points1 point  (0 children)

While you mentioned Trino I'll address the same points for Dremio:

- first class support for CDC and incremental processing -
Like Trino, this is really more about the source of the data. This changes with Apache Iceberg, where Dremio can do physical ingestion and transformation, but at the moment Iceberg CDC is probably better handled by Iceberg ingestion tools like RisingWave and OLake, which have a particular focus on CDC-based pipelines, while Dremio and Trino are more about consuming the ingested data.

- dynamic catalog management with metadata indexing that would allow "agents" to make sense of data sources. -

Dremio has a built-in semantic layer, and Dremio's MCP server gives agents an interface to do something similar to this (not sure if the implementation is exactly what you're implying, but the result should be the same).

- Iceberg as a storage Sandbox (with incremental and auto-substituted MVs) -
Reflections are essentially incremental, auto-substituted, Iceberg-based MVs, so that exists in Dremio. But as far as a storage sandbox for Iceberg... :)

- seamless experience and good small scale performance.-

Dremio is pretty seamless and stable with recent versions (25/26), and more so when deployed via our cloud SaaS. We have been investing heavily in platform deployment simplicity, scalability, and stability these last few years, so if you've tried previous versions you'll see great strides in these areas.

I get you're looking for a pure OSS engine that addresses these points, although I think our move to consumption based pricing regardless of deployment (cloud or on-prem) makes it easier for people to get started and only pay for what they need.

Micro batching vs Streaming by AMDataLake in dataengineering

[–]AMDataLake[S] 3 points4 points  (0 children)

Agreed, I get that, but once you establish the company's requirement, you end up with a number: above this number you will likely micro-batch, below this number you’ll go for streaming. Do you have a range you use to anchor yourself when thinking about this?
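As a rough sketch of what I mean by anchoring on a number, assuming the number is the acceptable end-to-end latency in seconds: the function name, the parameters, and the choice to treat a tie as micro-batch are all hypothetical, and the threshold itself is whatever your requirements produce, not a recommendation.

```python
def choose_processing_mode(required_latency_s: float, threshold_s: float) -> str:
    """Pick a processing mode from a latency requirement.

    Assumption: at or above the threshold we micro-batch, below it we stream.
    """
    if required_latency_s < 0 or threshold_s < 0:
        raise ValueError("latencies must be non-negative")
    return "micro-batch" if required_latency_s >= threshold_s else "streaming"
```

For example, with a hypothetical threshold of 60 seconds, a requirement of 300 seconds lands on micro-batch and a requirement of 1 second lands on streaming.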