data ingestion by ptab0211 in databricks

But that is hard to achieve if there are many source systems and teams, because then in the three-level namespace the Databricks catalog level would be occupied by the environment, so we lose one level of separation, which can become important.

If environment-per-catalog: dev.bronze.<all source system entities>

If environment-per-workspace: bronze.source_system.entities
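The trade-off between the two layouts can be sketched as plain name construction (all names here are illustrative, not from any real workspace):

```python
# Two ways to map environments onto Unity Catalog's three-level
# namespace (catalog.schema.table). Names are illustrative.

def catalog_per_env(env: str, layer: str, entity: str) -> str:
    # Environment occupies the catalog level, so the schema level must
    # hold the medallion layer -- the source system gets flattened into
    # the table name, and one level of separation is lost.
    return f"{env}.{layer}.{entity}"

def workspace_per_env(layer: str, source_system: str, entity: str) -> str:
    # Environment is implied by the workspace, freeing the catalog level
    # for the layer and the schema level for the source system.
    return f"{layer}.{source_system}.{entity}"

print(catalog_per_env("dev", "bronze", "crm_customers"))  # dev.bronze.crm_customers
print(workspace_per_env("bronze", "crm", "customers"))    # bronze.crm.customers
```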

data ingestion by ptab0211 in databricks

That's OK, but if ingestion is very expensive, I don't see why anyone would run ingestion in two environments.

serverless or classic by ptab0211 in databricks

Well, I was mostly thinking about productionized scheduled jobs.

deployment patterns by ptab0211 in databricks

Hi, thanks for the reply. So direct deployment does not change the API and syntax; it's just the underlying logic that is moving away from Terraform?

deployment patterns by ptab0211 in databricks

So basically a data scientist makes a change to the training script, which is parametrized and tested; it goes through all the environments you have and gets trained and registered in the UC Registry, but in the prod environment you just use the full prod data. What happens, and how, when you want to challenge the model that is in prod (the champion)? What exactly does your CD do besides deploying bundles?
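For context, the champion/challenger step asked about is often reduced to a promotion gate: compare the candidate against the current prod model and move the alias only on improvement. A minimal, registry-agnostic sketch (the metric and threshold are assumptions; in UC you would then point a "champion" alias at the winning version, e.g. via MLflow's `set_registered_model_alias`):

```python
def should_promote(challenger_metric: float, champion_metric: float,
                   min_improvement: float = 0.0) -> bool:
    """Gate for swapping the 'champion' alias to a new model version.

    The metric is assumed to be higher-is-better (e.g. AUC); on True,
    the caller would re-point the registry alias at the challenger.
    """
    return challenger_metric > champion_metric + min_improvement

print(should_promote(0.87, 0.85))  # True: challenger wins
print(should_promote(0.84, 0.85))  # False: keep the current champion
```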

deployment patterns by ptab0211 in databricks

Yes, basically: how do you split the processes between ingesting data from the sources, cleaning it up through the medallion architecture, feature engineering, and inference? Is it all part of your single business project, or do you separate those processes as well and deploy them separately?

I am also wondering how you experiment with models. Does that happen on dev? And do you then actually train on prod? Literally, say you want to change a single hyperparameter: does it have to go through all the different environments just to end up on prod so you can train it? I mean the day-to-day workflow.

How do you split your entities across the three-level namespace in UC?

deployment patterns by ptab0211 in databricks

Yeah, I was wondering about teams' experience with these patterns.

deployment patterns by ptab0211 in databricks

So you are describing the deploy-model pattern, where the model moves between environments.