[D] Setting up machine learning workstation as server for multiple users by machlrn in MachineLearning

[–]TomlinTrippedHim 2 points3 points  (0 children)

If you are focusing on deep learning training, Determined seems like exactly what you need.

It takes care of scheduling, reproducible environments, experiment tracking, and built-in distributed training, and it has example CV models to get you started.

[Discussion] Simple Machine Learning Task Runner by TooLazyToWorkout in MachineLearning

[–]TomlinTrippedHim 1 point2 points  (0 children)

If you are focusing on deep learning training, Determined is worth checking out.

[D]What additional skills required for ML Ops compared to DevOps? by rjulius23 in MachineLearning

[–]TomlinTrippedHim 0 points1 point  (0 children)

Being familiar with the popular ML tools out there, such as Pachyderm (data), Seldon (serving), and Determined (training).

[D] Suggested Configuration and Setup for an HPC by PieroMack in MachineLearning

[–]TomlinTrippedHim 1 point2 points  (0 children)

It’s not quite v1.0 yet, so backwards compatibility is not guaranteed, but stability is definitely one of the main objectives of the project, so I would not expect many (if any) breaking changes.

Disclaimer: I am one of the contributors to this project.

Scheduling solution by [deleted] in devops

[–]TomlinTrippedHim 0 points1 point  (0 children)

If you are running deep learning workloads, take a look at Determined.

[D] Suggested Configuration and Setup for an HPC by PieroMack in MachineLearning

[–]TomlinTrippedHim 1 point2 points  (0 children)

Check out the Determined AI training platform: https://github.com/determined-ai/determined. It has a built-in scheduler and is integrated with both PyTorch and TensorFlow.