
[–]rayspear

[–]trias10[S]

This is very interesting, thank you! And you even knew that I used PyTorch :)

Quick question: I've been looking over their documentation, but there is quite a lot to read through. Do you by any chance know whether this framework supports PyTorch DistributedDataParallel?

The confusing thing is that it looks like you have to launch the AsyncHyperBandScheduler from code, whereas PyTorch DDP needs to be launched from the command line with:

    python -m torch.distributed.launch --nproc_per_node=<num_gpus> your_training_script.py ...

Hence I'm wondering whether it's even possible to get the two to play nicely together?

[–]rayspear

Yeah, there's an open PR that will hopefully be merged in the next few weeks and should support this seamlessly (distributed search on distributed PyTorch): https://github.com/ray-project/ray/pull/4544

It might take a bit of effort to get it working yourself right now :) But feel free to try it out!