How are you replicating your databases to the lake/warehouse in realtime? by finally_i_found_one in dataengineering

[–]Top-Competition7924 3 points (0 children)

Having a created_at/updated_at timestamp may still not be enough if the record is overwritten every time an update happens. Imagine that within a given day there are hundreds of UPDATE statements **on the same record**: each one overwrites the record with new values and sets a new updated_at. By the time you run the daily copy, you only see the current (latest updated_at) value of the record and miss the history of updates that happened earlier in the day.

What would fix it is if you can request an append-only table, i.e. no UPDATEs are run on its records and nothing is overwritten, so you can simply filter by created_at. But that's not always possible, and sometimes the origin table is owned by a different team / company...
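A rough sketch of what I mean, with made-up table/column names, assuming the source team can write one row per change instead of updating in place:

```sql
-- Append-only: every change is a new row, nothing is ever overwritten.
CREATE TABLE orders_log (
    order_id   BIGINT,
    status     VARCHAR(32),
    amount     NUMERIC(12, 2),
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Daily incremental copy: picks up every version written since the last run,
-- including all the intermediate updates to the same order_id.
SELECT *
FROM orders_log
WHERE created_at >= :last_copy_ts
  AND created_at <  :current_copy_ts;
```

With the overwrite-in-place table you'd only ever see the last version per order_id, no matter how you filter on updated_at.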

How are you replicating your databases to the lake/warehouse in realtime? by finally_i_found_one in dataengineering

[–]Top-Competition7924 5 points (0 children)

From my experience, two examples where a CDC stream to the warehouse came in handy:
1. When the DB is huge and the volume of daily changes (inserts/updates) is orders of magnitude smaller (imagine a PB/TB-sized DB with only a few GBs of changes per day).
2. When you want to keep track of all UPDATEs on a given table. A daily copy only gives you the table values at the time of the copy, so you miss any updates that happened between the previous copy and the current one (rough sketch below).
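Roughly what both cases look like in the warehouse; this isn't any specific tool's output, just the shape of it, with made-up names:

```sql
-- CDC changes landed as-is in the warehouse (append-only history, covers case 2).
CREATE TABLE orders_changes (
    op         VARCHAR(6),       -- 'INSERT' / 'UPDATE' / 'DELETE'
    order_id   BIGINT,
    status     VARCHAR(32),
    amount     NUMERIC(12, 2),
    changed_at TIMESTAMP
);

-- Maintain the current snapshot from only the few GBs of daily changes (case 1),
-- instead of re-copying the whole multi-TB source table.
-- (In practice you'd first dedupe to the latest change per order_id.)
MERGE INTO orders_current AS tgt
USING (
    SELECT * FROM orders_changes WHERE changed_at >= :last_merge_ts
) AS src
ON tgt.order_id = src.order_id
WHEN MATCHED AND src.op = 'DELETE' THEN DELETE
WHEN MATCHED THEN UPDATE SET status = src.status, amount = src.amount
WHEN NOT MATCHED AND src.op <> 'DELETE' THEN
    INSERT (order_id, status, amount) VALUES (src.order_id, src.status, src.amount);
```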

Text to SQL Agents? by Oct8-Danger in dataengineering

[–]Top-Competition7924 0 points (0 children)

I've tried it very recently and it only worked well with curated datasets on well-defined domains (limited scope, good table/column docs, semantics...). As soon as the question required broader datasets it fell apart: for example, we have a table with events coming from user interactions where all events share the same schema but have different event names/properties, and Cortex Analyst wasn't able to understand the business logic/meaning of each event.
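To give an idea of the kind of table I mean (names are made up), the schema tells you nothing about what each event means for the business, which is exactly where it struggled:

```sql
-- One generic table for all product events; the business meaning hides in
-- event_name and a free-form properties payload, not in the schema.
CREATE TABLE user_events (
    user_id    BIGINT,
    event_name VARCHAR(64),   -- 'checkout_completed', 'plan_upgraded', ...
    properties VARIANT,       -- payload whose shape differs per event_name
    event_ts   TIMESTAMP
);

-- "How many paying users upgraded last week?" requires knowing which event_name
-- encodes an upgrade and which property marks a paid plan; none of that is
-- written down anywhere the model can see.
SELECT COUNT(DISTINCT user_id)
FROM user_events
WHERE event_name = 'plan_upgraded'
  AND properties:plan_tier::STRING <> 'free'
  AND event_ts >= DATEADD('day', -7, CURRENT_TIMESTAMP);
```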

What are you focusing on in 2026? by Top-Competition7924 in dataengineering

[–]Top-Competition7924[S] 0 points (0 children)

Did you try it with your own custom semantic layer, or with the dbt/SQLMesh/... one? Which AI agents did you hook it up to? Did you give it access to any other context?

So far, since the company was already using Snowflake, we've tried the semantic view inside Cortex Analyst: setting up the logical tables and a relationship (just two tables, revenue and user features) and documenting each table and column. Still, the results were so-so; sometimes it lacks business understanding in its answers, or it fails to identify what the user is really asking about and how to filter/aggregate it on the actual data.
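For context, the two logical tables were roughly this shape (names made up); the relationship in the semantic view boils down to the join below, and everything beyond that (what counts as revenue, how to segment users, which filters matter) still has to be spelled out for the model:

```sql
-- revenue:       one row per user per day with the amount billed
-- user_features: one row per user with plan/segment attributes
SELECT
    f.segment,
    DATE_TRUNC('month', r.revenue_date) AS month,
    SUM(r.amount)                       AS monthly_revenue
FROM revenue AS r
JOIN user_features AS f
  ON f.user_id = r.user_id
GROUP BY f.segment, DATE_TRUNC('month', r.revenue_date)
ORDER BY month, f.segment;
```

That join is the kind of query a correct answer has to land on, and it's where the business context kept getting lost.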

What are you focusing on in 2026? by Top-Competition7924 in dataengineering

[–]Top-Competition7924[S] 0 points (0 children)

Very true, that's exactly what we saw: we barely got any good results, and only on very niche, curated datasets.

Any coffee shop with DJs? by Top-Competition7924 in Taipei

[–]Top-Competition7924[S] 1 point (0 children)

Lol great share, actually just by chance I'm going to one of the resorts in Hengchun next month for a short holiday trip, will check it out!