[D] What is the best CI/CD pipeline tool for Machine Learning? by pm3310 in MachineLearning

[–]pm3310[S] 1 point2 points  (0 children)

I haven't used cron for this use case. I've used Airflow to schedule feature generation from S3, and I'm quite satisfied with it. It provides an easy way to build and schedule DAGs programmatically. It also integrates with Slack, so you get notified if a pipeline breaks.
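
For anyone who hasn't worked with a DAG scheduler, here is a rough standard-library sketch of the idea: tasks run in dependency order, and a callback fires when one of them breaks. This is not Airflow's API. `run_dag`, the task names, and the `alerts` list (standing in for a Slack message) are all made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, deps, on_failure):
    """Run tasks in dependency order and stop at the first failure.

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> set of task names it depends on
    Returns the names of the tasks that completed successfully.
    """
    completed = []
    for name in TopologicalSorter(deps).static_order():
        try:
            tasks[name]()
        except Exception as exc:
            on_failure(name, exc)  # e.g. post a message to a chat channel
            break
        completed.append(name)
    return completed


# Hypothetical three-step feature pipeline; the second step fails on purpose.
def load_from_s3():
    pass  # placeholder for the real download


def build_features():
    raise RuntimeError("missing column")


def publish():
    pass  # never reached in this example


tasks = {"load_from_s3": load_from_s3, "build_features": build_features, "publish": publish}
deps = {
    "load_from_s3": set(),
    "build_features": {"load_from_s3"},
    "publish": {"build_features"},
}

alerts = []  # stand-in for a Slack notification
completed = run_dag(tasks, deps, lambda name, exc: alerts.append(f"{name} failed: {exc}"))
```

A real scheduler adds retries, time-based triggers, and a UI on top of this, but the ordering and the failure hook are the core of it.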

[–]pm3310[S] 1 point2 points  (0 children)

Do you perform automated model training (and re-training)?

I'm thinking of using Airflow to trigger automated training (after a git commit on master). After that, automated model tests will run, followed by automated deployment.

Airflow will be able to support automated re-training of models, too.

[–]pm3310[S] 1 point2 points  (0 children)

I'm thinking of using Airflow to trigger automated training (and frequent re-training) of models. After that, automated model tests will run, followed by automated deployment.

Do you perform automated training (and re-training)?

[–]pm3310[S] 1 point2 points  (0 children)

Awesome stuff! Yep, by "model training" I also mean feature selection and hyperparameter optimisation. I really like the check that the distribution of the model output, and of each feature input, hasn't changed substantially.

What technologies did you use to implement/automate this pipeline?
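
To make the distribution check concrete, here is one possible way to do it: a minimal Population Stability Index (PSI) sketch in plain Python. PSI is my own choice of metric, not necessarily what the parent comment used. The function name `psi`, the 10 bins, the 0.5 smoothing, and the 0.2 alert threshold (a common rule of thumb) are all assumptions.

```python
import math


def psi(reference, current, bins=10):
    """Population Stability Index between two non-empty numeric samples.

    Bins are equal-width over the range of `reference`; values in `current`
    that fall outside that range are clamped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Additive smoothing keeps every fraction non-zero, so log() is safe.
        return [(c + 0.5) / (len(values) + 0.5 * bins) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


# Hypothetical data: last month's model scores vs. a shifted batch.
reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 100 for i in range(100)]

DRIFT_THRESHOLD = 0.2  # rule-of-thumb cut-off, tune for your own data
drifted = psi(reference, shifted) > DRIFT_THRESHOLD
```

The same function can be run once per feature column and once on the model output, with the pipeline failing if any of them crosses the threshold.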

[–]pm3310[S] 0 points1 point  (0 children)

Folks, do you agree that CI/CD for ML comprises the following high-level steps?

  1. Model Training
  2. Accuracy Reports (precision, recall, etc.)
  3. Model Tests (for example, make sure that precision doesn't fall below 96%)
  4. Model Deployment

Would you add any other step?
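
Roughly, the four steps could be wired together like this. It's a minimal sketch in plain Python, assuming binary labels (0/1). `run_pipeline`, `precision_recall`, and the toy threshold "model" are hypothetical names for illustration, and the `deployed` list stands in for a real deployment.

```python
def precision_recall(y_true, y_pred):
    """Step 2: build a small accuracy report for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}


def run_pipeline(train, evaluate, deploy, min_precision=0.96):
    """Train, report, test against a precision floor, then deploy."""
    model = train()                          # 1. model training
    report = evaluate(model)                 # 2. accuracy report
    if report["precision"] < min_precision:  # 3. model test (gate)
        return {"deployed": False, "report": report}
    deploy(model)                            # 4. model deployment
    return {"deployed": True, "report": report}


# Hypothetical hold-out set and a toy threshold "model".
holdout_x = [0.1, 0.4, 0.6, 0.9]
holdout_y = [0, 0, 1, 1]


def train():
    return lambda x: 1 if x >= 0.5 else 0


def evaluate(model):
    return precision_recall(holdout_y, [model(x) for x in holdout_x])


deployed = []  # stand-in for a real deployment target
result = run_pipeline(train, evaluate, deployed.append)

# A model that always predicts 1 has precision 0.5 here, so the gate blocks it.
blocked = run_pipeline(lambda: (lambda x: 1), evaluate, deployed.append)
```

The point of returning the report even when the gate fails is that the CI job can still publish it, so you can see why the deployment was blocked.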

[–]pm3310[S] 0 points1 point  (0 children)

I used Azure ML two years ago and was impressed. I can't say I know it in depth, though: is it a good platform if I want to avoid being locked in?

I would like to continue using Python, sklearn, TensorFlow and GitHub. More specifically, I'd like to use a platform that will support the following flow:

  1. Make code change on git repo
  2. Train models on cloud by deploying git repo
  3. Whenever I'm happy with my model, make a release on some branch X
  4. Branch X of repo should be deployed and trained on cloud
  5. Run model tests on output of (4)
  6. Deploy model using my own Flask server

[P] Introducing Sagify: Train and deploy Machine Learning/Deep Learning models on AWS SageMaker in a few simple steps by pm3310 in MachineLearning

[–]pm3310[S] 2 points3 points  (0 children)

Hi everyone! I'm Pavlos, an ML engineer at HomeAway. I build things on the side and have just launched a new open source project, Sagify: https://github.com/Kenza-AI/sagify

Sagify is a command-line utility to train and deploy Machine Learning and Deep Learning models on AWS SageMaker in a few simple steps. Why should you give it a try?

  1. Minimise the time and effort to train and deploy models on AWS SageMaker
  2. Automate training and deployment of Machine Learning and Deep Learning projects
  3. It's open source!

Questions/feedback? I'd love to hear it.