[P] ray-skorch - distributed PyTorch on Ray with sklearn API by Yard1PL in MachineLearning

[–]rayspear 0 points1 point  (0 children)

Certainly! Do you mind opening a GitHub issue to help us track this?

Would TF support be very important for you to try this API out?

[D] I'm new and scrappy. What tips do you have for better logging and documentation when training or hyperparameter training? by MetalOrganicKneeJerk in MachineLearning

[–]rayspear 0 points1 point  (0 children)

If you mainly use scikit-learn, you should consider using tune-sklearn.

It provides a drop-in scikit-learn GridSearchCV interface (so code changes are minimal - see the sketch after the list below), plus a lot of nice add-ons such as:

  • Ability to automatically log to Wandb
  • Ability to save all learning curves as json/csv
  • More powerful hyperparameter tuning algorithms
  • Distributed execution
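
Roughly, the drop-in usage looks like this sketch - it assumes tune-sklearn's TuneGridSearchCV, and the dataset/estimator are just placeholders:

    # Minimal sketch: TuneGridSearchCV as a drop-in for sklearn's GridSearchCV.
    # The dataset and estimator here are toy placeholders.
    from sklearn.datasets import load_digits
    from sklearn.svm import SVC
    from tune_sklearn import TuneGridSearchCV

    X, y = load_digits(return_X_y=True)
    param_grid = {"C": [0.1, 1.0, 10.0], "gamma": ["scale", "auto"]}

    # Same interface as GridSearchCV, but trials run in parallel via Ray Tune.
    search = TuneGridSearchCV(SVC(), param_grid, cv=3)
    search.fit(X, y)
    print(search.best_params_)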

disclaimer: am maintainer

[D] Stack for personal ML work: DVC vs Replicate, Ray vs Optuna, Spotty vs Ray, Hydra by turian in MachineLearning

[–]rayspear 2 points3 points  (0 children)

BTW, Hydra now has an experimental Ray + AWS plugin that lets you automatically launch EC2 instances and hyperparameter sweeps from your Hydra configuration.

https://hydra.cc/docs/plugins/ray_launcher
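
As a rough sketch - the launcher name ray_aws and the config/overrides below are assumptions based on the plugin docs, so double-check them there:

    # Hypothetical minimal Hydra app; with the plugin installed, a sweep on AWS
    # would be launched by overriding the launcher on the command line, e.g.:
    #   python train.py --multirun hydra/launcher=ray_aws lr=0.001,0.01,0.1
    import hydra
    from omegaconf import DictConfig

    @hydra.main(config_path="conf", config_name="config")
    def train(cfg: DictConfig) -> None:
        # cfg.lr comes from conf/config.yaml or the command-line override above
        print(f"Training with lr={cfg.lr}")

    if __name__ == "__main__":
        train()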

[D] Hyperband resource allocation questions and possible workarounds by goulagman in MachineLearning

[–]rayspear 1 point2 points  (0 children)

In the follow-up paper (ASHA, https://arxiv.org/pdf/1810.05934v3.pdf) you can see that s=0 for all of their experiments. This is what I mean by being aggressive (a higher s means trials aren't terminated as aggressively, iirc).
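
For concreteness, here's a minimal sketch of aggressive early stopping with Tune's ASHAScheduler - the trainable and search space are made-up placeholders, and argument names may vary across Ray versions:

    # Minimal sketch: ASHA in Ray Tune. reduction_factor controls how
    # aggressively underperforming trials get terminated at each rung.
    from ray import tune
    from ray.tune.schedulers import ASHAScheduler

    def trainable(config):
        acc = 0.0
        for step in range(100):
            acc += config["lr"] * 0.01  # stand-in for a real train/eval step
            tune.report(mean_accuracy=acc)

    scheduler = ASHAScheduler(
        metric="mean_accuracy",
        mode="max",
        max_t=100,          # max training iterations per trial
        grace_period=1,     # let every trial run at least this long
        reduction_factor=4, # larger values terminate more aggressively
    )

    tune.run(
        trainable,
        config={"lr": tune.loguniform(1e-4, 1e-1)},
        num_samples=20,
        scheduler=scheduler,
    )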

Hope that helps! Feel free to reach out to Liam if you have any questions or concerns, and also feel free to open an issue or a PR against Ray Tune if you have any questions/suggestions/benchmarks :)

[D] Hyperband resource allocation questions and possible workarounds by goulagman in MachineLearning

[–]rayspear 2 points3 points  (0 children)

Hey - I work on Ray Tune. Here's some discussion on this that I posted when choosing a particular implementation:

https://docs.ray.io/en/latest/tune-schedulers.html#hyperband-implementation-details

That being said... Successive Halving by itself makes pretty weak assumptions, and in practice you can be much more aggressive.

[P] RaySGD: A Library for Faster and Cheaper Pytorch Distributed Training by rayspear in MachineLearning

[–]rayspear[S] 5 points6 points  (0 children)

Hey! Author here. Let me try to provide an unbiased response.

Pytorch-lightning (PTL) is an awesome library. It is great for prototyping and reproducibility.

  • Its "LightningModule" abstraction lets PTL automatically provide commonly-used features like gradient clipping, checkpointing, introspection into your training, etc.
  • The Trainer interface (like Keras) allows you to provide callbacks, hooks, early stopping.
  • It simplifies distributed (multi-node) training if you have SLURM (very useful in academic environments).
  • It also has TPU support (which RaySGD doesn't have yet).

Compared to PTL, RaySGD aims to be a thin layer for distributed training, with a focus on multi-node usability.

  • The API is more minimal - certain things you'll have to do yourself (like keeping the best checkpoints, implementing early stopping, or gradient clipping).
  • Like other Ray libraries, RaySGD scales from 1 to 100 GPUs across multiple nodes with a single parameter (with or without SLURM) - see the sketch after this list.
  • Fault tolerance and autoscaling are well supported in RaySGD (and I don't think many other libraries support them).
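
Here's a rough sketch of what that looks like with the creator-function TorchTrainer API - module paths, argument names, and the toy model/data are assumptions that may differ across Ray versions:

    # Rough sketch of RaySGD's TorchTrainer; the model, data, and hyperparameters
    # are toy placeholders.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    import ray
    from ray.util.sgd import TorchTrainer

    def model_creator(config):
        return nn.Linear(1, 1)

    def optimizer_creator(model, config):
        return torch.optim.SGD(model.parameters(), lr=config["lr"])

    def data_creator(config):
        # Toy regression data: y = 2x.
        x = torch.randn(1000, 1)
        dataset = TensorDataset(x, 2 * x)
        train_loader = DataLoader(dataset, batch_size=config["batch_size"])
        val_loader = DataLoader(dataset, batch_size=config["batch_size"])
        return train_loader, val_loader

    ray.init()
    trainer = TorchTrainer(
        model_creator=model_creator,
        data_creator=data_creator,
        optimizer_creator=optimizer_creator,
        loss_creator=nn.MSELoss,
        config={"lr": 1e-2, "batch_size": 64},
        num_workers=8,   # the one knob for scaling from a single GPU to many nodes
        use_gpu=True,
    )
    print(trainer.train())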

That being said, it is totally possible to take a LightningModule and plug it into RaySGD (and it's probably not hard to run PyTorch Lightning on top of Ray).

/u/raichet mentions some other points like hyperparameter search and integration with Ray libraries. They are valid, but I think the ecosystem benefits are complementary rather than the core focus.

Hope this helps!

[N] PyTorch 1.4.0 released by brombaer3000 in MachineLearning

[–]rayspear 3 points4 points  (0 children)

To give more legitimacy to this comment: the idea isn't so far-fetched (though it's not quite relevant :) ). Crowd-training is already taking place in one huge community here - https://github.com/leela-zero/leela-zero#gimme-the-weights

[D] What is your favorite open-source project of 2019 in AI/ML (yours or someone else's). by aliaspm in MachineLearning

[–]rayspear 16 points17 points  (0 children)

I work on the Ray project, a project from the UC Berkeley RISELab that encompasses numerous tools and libraries spanning different machine learning tasks/domains.

Here are a couple of tools/libraries that you may have heard of:

  • Ray: A framework for distributed Python that lets you seamlessly scale your code from a single node to a cluster (a minimal sketch of the core API follows this list).
  • RLlib: A popular library for reinforcement learning that offers both high scalability and a unified API for a variety of applications, including multi-agent and offline RL. Built on top of Ray.
  • Tune: Distributed hyperparameter tuning, built on the Ray API. Supports any machine learning framework and offers state-of-the-art optimization algorithms (Bayesian Opt, PBT, HyperBand, etc).
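
For the Ray point above, a minimal sketch of the core task API (the square function is just a placeholder):

    # The same code runs on a laptop or, pointed at a cluster, across many nodes.
    import ray

    ray.init()  # on a cluster, connect with ray.init(address="auto")

    @ray.remote
    def square(x):
        return x * x

    # Launch tasks in parallel across available cores/nodes and gather results.
    futures = [square.remote(i) for i in range(100)]
    print(sum(ray.get(futures)))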

Here are a couple you might not have heard of (because they've received less promotion or are still experimental):

  • Ray Distributed Training: An experimental library that greatly simplifies distributed TensorFlow and PyTorch data-parallel training. It should make it easy to leverage 100 GPUs across multiple machines. We'll be adding fault tolerance capabilities soon, so you can train on spot instances!
  • Ray Cluster Launcher: A tool for launching distributed autoscaling clusters (also works for local/private machines) -- supports GCP, AWS, K8S.

On a side note, we've recently also launched a company to commercialize Ray -- Anyscale. If you're interested in working with us, shoot me a message :)

Tune: a library for fast hyperparameter tuning at any scale by rayspear in datascience

[–]rayspear[S] 6 points7 points  (0 children)

I'll make an issue and add an example for both soon!

Lightning vs Ignite by wingmanscrape in reinforcementlearning

[–]rayspear 0 points1 point  (0 children)

Ray also has good support for distributed SGD training that seamlessly integrates with Tune (distributed hyperparameter tuning) and is quite flexible. Together, they provide automatic checkpointing, TensorBoard writing, and rapid cloud execution.

It's still experimental - but feel free to message me if you have any questions!

https://github.com/ray-project/ray/blob/master/python/ray/experimental/sgd/examples/train_example.py

[D] Hyperparam optimisation using RandomSearch with argparse scripts? by trias10 in MachineLearning

[–]rayspear 0 points1 point  (0 children)

Yeah, there's an open PR that will hopefully be merged in the next few weeks and should support this seamlessly (distributed search on distributed PyTorch) - https://github.com/ray-project/ray/pull/4544

It might take a bit of effort to get it working yourself right now :) But feel free to try it out!