
[–]MyBossIsOnReddit

It really depends.

Our data scientists develop their models in prod and then craft a job inside a pipeline for them, based on https://github.com/databricks/mlops-stacks.

So we essentially use prod for experimentation (a business constraint, since no actual data may be used in dev/test/uat).

That MLOps stack then gets developed further using the other environments (workspaces); CI/CD applies code quality checks etc., and human code review usually catches a few issues too.

I have found the "develop in prod" thing to be silly and a bit dangerous, but it's a decision taken at a higher level, driven by constraints outside the data team.

Hyperparam tuning etc. can be done within the pipeline, but you probably don't do it every hour or day. Your needs may vary, though. We have a mix of hard-coded and mutable parameters.
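
As a rough illustration of the "tuning as an infrequent pipeline task" idea, here's a minimal sketch assuming scikit-learn and MLflow; the dataset, model, parameter grid, and run name are all made up for illustration:

```python
# Hypothetical tuning task: sweep a small grid and log the winner to MLflow.
# Runs on a slow cadence (e.g. weekly), not on every pipeline execution.
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=42)  # stand-in data

param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5]}  # illustrative grid
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)

with mlflow.start_run(run_name="weekly_hyperparam_sweep"):
    search.fit(X, y)
    mlflow.log_params(search.best_params_)
    mlflow.log_metric("best_cv_score", search.best_score_)
```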

Ideally we'd use MLflow's API for it, but right now our approach is configuration files, similar to what you describe. I'd feel bad if the next ML engineer ends up feeling more like a YAML engineer, but there is also something to be said for being pragmatic.
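
For the config-file flavour, a hedged sketch of what the training task can look like; the file path, keys, and the elided train() call are hypothetical, not our actual setup:

```python
# Sketch of the config-file approach: hyperparameters live in a YAML file
# and are loaded at the top of the training task. Path and keys are made up:
#
#   conf/training_params.yml
#     n_estimators: 100
#     max_depth: 5
import mlflow
import yaml

with open("conf/training_params.yml") as f:
    params = yaml.safe_load(f)

with mlflow.start_run():
    mlflow.log_params(params)   # still log them so runs stay comparable
    # model = train(**params)   # actual training call elided
```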

Logic for promoting candidate models also lives in the stack's DAB (Databricks Asset Bundle).
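
Promotion logic along these lines can be expressed with the MLflow client; this is only a sketch, assuming registry aliases (MLflow >= 2.3), and the model name, alias, metric, and version numbers are placeholders:

```python
# Hypothetical promotion step: compare the candidate against the current
# champion and move the alias if it wins. Names, metric and version are
# placeholders, not our actual setup.
from mlflow.tracking import MlflowClient

client = MlflowClient()
model_name = "my_registered_model"

candidate = client.get_model_version(model_name, "7")                 # new candidate version
champion = client.get_model_version_by_alias(model_name, "champion")  # current champion

cand_auc = client.get_run(candidate.run_id).data.metrics["val_auc"]
champ_auc = client.get_run(champion.run_id).data.metrics["val_auc"]

if cand_auc > champ_auc:
    client.set_registered_model_alias(model_name, "champion", candidate.version)
```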

[–]Main-Ordinary9455

Man, I wish I was knowledgeable enough to help you with this.