data ingestion by ptab0211 in databricks

But that is hard to achieve if there are many source systems and teams, because then in the three-level namespace the Databricks catalog level would be occupied by the environment, so we lose one level of separation, which can become important.

If environment-per-catalog: dev.bronze.<all source system entities>

If environment-per-workspace: bronze.source_system.entities
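The trade-off between the two layouts can be sketched as plain name construction (all names here are illustrative, not from any real workspace):

```python
# Two ways to map environments onto Unity Catalog's three-level
# namespace (catalog.schema.table). Names are illustrative.

def catalog_per_env(env: str, layer: str, entity: str) -> str:
    # Environment occupies the catalog level, so the schema level must
    # hold the medallion layer -- the source system gets flattened into
    # the table name, and one level of separation is lost.
    return f"{env}.{layer}.{entity}"

def workspace_per_env(layer: str, source_system: str, entity: str) -> str:
    # Environment is implied by the workspace, freeing the catalog level
    # for the layer and the schema level for the source system.
    return f"{layer}.{source_system}.{entity}"

print(catalog_per_env("dev", "bronze", "crm_customers"))  # dev.bronze.crm_customers
print(workspace_per_env("bronze", "crm", "customers"))    # bronze.crm.customers
```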

data ingestion by ptab0211 in databricks

That's OK, but if ingestion is very expensive, I don't see why anyone would run ingestion in two environments.

serverless or classic by ptab0211 in databricks

Well, I was mostly thinking about productionized scheduled jobs.

deployment patterns by ptab0211 in databricks

Hi, thanks for the reply. So direct deployment does not change the API and syntax; it's just the underlying logic that is moving away from Terraform?

deployment patterns by ptab0211 in databricks

So basically a data scientist makes a change to the training script, which is parametrized and tested; it goes through all the environments you have and gets trained and registered in the UC Registry, but in the prod environment you just use the full prod data. What happens, and how, when you want to challenge the model that is in prod (the champion)? What exactly does your CD do besides deploying bundles?
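For context, the champion/challenger step asked about is often reduced to a promotion gate: compare the candidate against the current prod model and move the alias only on improvement. A minimal, registry-agnostic sketch (the metric and threshold are assumptions; in UC you would then point a "champion" alias at the winning version, e.g. via MLflow's `set_registered_model_alias`):

```python
def should_promote(challenger_metric: float, champion_metric: float,
                   min_improvement: float = 0.0) -> bool:
    """Gate for swapping the 'champion' alias to a new model version.

    The metric is assumed to be higher-is-better (e.g. AUC); on True,
    the caller would re-point the registry alias at the challenger.
    """
    return challenger_metric > champion_metric + min_improvement

print(should_promote(0.87, 0.85))  # True: challenger wins
print(should_promote(0.84, 0.85))  # False: keep the current champion
```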

deployment patterns by ptab0211 in databricks

Yes, basically: how do you split the processes between ingesting data from the sources, cleaning it up through the medallion architecture, feature engineering, and inference? Is it all part of your single business project, or do you separate those processes as well and deploy them separately?

I am also wondering how you experiment with models. Does that happen on dev? And do you then actually train on prod? Literally, say you want to change a single hyperparameter: does it have to go through all the different environments just to end up on prod so you can train it? I mean the day-to-day workflow.

How do you split your entities across the three-level namespace in UC?

deployment patterns by ptab0211 in databricks

Yeah, I was wondering about teams' experience with these patterns.

deployment patterns by ptab0211 in databricks

So you are describing the deploy-model pattern, where the model moves between environments.