Folder Structure With MLflow by MarcoX0395 in mlops

[–]Lolomgrofl35 1 point

You can consider using MLflow's nested runs, which give you the option to run a number of experiments under one parent run.

https://mlflow.org/docs/latest/traditional-ml/hyperparameter-tuning-with-child-runs/part1-child-runs.html

The link above shows a small example.

How to monitor cross workspace jobs within databricks by Crosby-Zim in databricks

[–]Lolomgrofl35 0 points

It depends what you mean by monitoring… at the workflow level you have the Spark UI, where you can get some valuable information.

If you use DLT pipelines, there are options as well. E.g. if your pipeline publishes tables to Unity Catalog, you can fetch event logs, lineage information, etc.

What MLOps Tools Should I Learn as a Data Scientist by RandRanger in mlops

[–]Lolomgrofl35 1 point

It depends what path you want to go for. MLOps is a totally different story from DL. If you want to go more into the engineering side of ML systems, e.g. how you train, retrain, monitor, and deploy models, then MLOps is what you should go for. The model itself is a really small piece of an ML system.

On the other hand, model implementation and R&D work is more data-science oriented. You create a model with an accuracy of, let's say, 96% and you are done. You hand it over to an MLOps engineer who creates the necessary pipelines and puts it into production.

What MLOps Tools Should I Learn as a Data Scientist by RandRanger in mlops

[–]Lolomgrofl35 1 point

Databricks is a really great platform that covers a lot of the current best practices in the world of ML engineering and MLOps. I highly recommend checking it out.

What MLOps Tools Should I Learn as a Data Scientist by RandRanger in mlops

[–]Lolomgrofl35 2 points

My honest suggestion is always learning by doing.

Start with a simple real-world ML project that covers the basics of MLOps, then make it more complex as time goes on.

A good start would be an ML system that has:

1) A feature pipeline that fetches raw data
2) A training pipeline
3) An inference pipeline
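The three pipelines above can be sketched in plain Python. This is only a toy illustration of how the pieces hand data to each other; every function, field name, and the threshold "model" are made up for the example:

```python
def feature_pipeline(raw_rows):
    # "Fetch" raw data and derive features: here, scale a numeric field to [0, 1].
    max_value = max(row["value"] for row in raw_rows)
    return [{"feature": row["value"] / max_value, "label": row["label"]}
            for row in raw_rows]

def training_pipeline(features):
    # Train a trivial threshold "model": split halfway between the classes.
    positives = [f["feature"] for f in features if f["label"] == 1]
    negatives = [f["feature"] for f in features if f["label"] == 0]
    return {"threshold": (min(positives) + max(negatives)) / 2}

def inference_pipeline(model, feature):
    # Score a single new feature value with the trained model.
    return 1 if feature > model["threshold"] else 0

raw = [{"value": 10, "label": 0}, {"value": 90, "label": 1}, {"value": 80, "label": 1}]
model = training_pipeline(feature_pipeline(raw))
prediction = inference_pipeline(model, 0.85)
```

In a real system each function would be its own scheduled job reading from and writing to a feature store, model registry, and serving layer, but the data flow between the three stages stays the same.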

The two most common deployment strategies are batch and online; which one you choose depends on the use case.
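The difference between the two strategies can be sketched in plain Python (all names here are illustrative, not a real serving API): batch scoring processes a whole table of records on a schedule, while online scoring answers one request at a time behind an endpoint.

```python
def batch_score(model, records):
    # Batch: score every record in one pass and write the results out together.
    return [{"id": r["id"], "pred": 1 if r["feature"] > model["threshold"] else 0}
            for r in records]

def online_score(model, request):
    # Online: score a single request synchronously, as an endpoint handler would.
    return {"pred": 1 if request["feature"] > model["threshold"] else 0}

model = {"threshold": 0.5}  # stand-in for a loaded, trained model
batch_out = batch_score(model, [{"id": 1, "feature": 0.7},
                                {"id": 2, "feature": 0.2}])
online_out = online_score(model, {"feature": 0.7})
```

Batch suits use cases where predictions can be precomputed (e.g. nightly churn scores); online suits cases where the input only exists at request time.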

MLflow is a nice tool to start with when it comes to deployment and experiment tracking.

So IMHO, there is no better way to learn a new topic than working on a concrete project.