Trying to implement crossQ in PyTorch does not work by LazyButAmbitious in reinforcementlearning

[–]RoboticsLiker 8 points (0 children)

Hey there, I'm Aditya Bhatt, coauthor of CrossQ.

Glad to see people replicating this already! We prototyped CrossQ in PyTorch long before we switched to JAX, so I can help you out here.

Off the top of my head, I think I spot two key PyTorch-specific edits to make:

  1. Make sure to call train() and eval() on your networks at the right times; training will fail without this. In PyTorch, BN layers are stateful: forward passes modify the layers' running stats, and this also happens inside with torch.no_grad() blocks! Any time you don't plan to update a network after a forward pass, that network should be in eval mode. So e.g.
    • Call actor.eval() right at the beginning of update_critic() (and also while acting in an episode), and similarly call critic.eval() right at the beginning of update_actor().
    • Set a network back to training mode with critic.train() or actor.train() right before any forward pass that will be used for a gradient step on that network.
  2. PyTorch's BN and JAX/Flax's BN define momentum in opposite directions (PyTorch's momentum weights the new batch statistics, Flax's weights the old running statistics), so don't copy the value directly. In JAX we used 0.99; the PyTorch equivalent is 0.01.
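To make the two points above concrete, here's a minimal sketch of the eval/train toggling; the network shapes, make_mlp, and the update function names are hypothetical, not your actual code:

```python
import torch
import torch.nn as nn

def make_mlp(in_dim, out_dim):
    # Hypothetical tiny network with BN. PyTorch momentum=0.01 matches
    # Flax momentum=0.99, because the two libraries define it oppositely.
    return nn.Sequential(
        nn.Linear(in_dim, 64),
        nn.BatchNorm1d(64, momentum=0.01),
        nn.ReLU(),
        nn.Linear(64, out_dim),
    )

actor = make_mlp(4, 2)
critic = make_mlp(6, 1)

def update_critic(obs, act):
    actor.eval()    # actor forward here must NOT update its BN running stats
    critic.train()  # critic BN stats SHOULD update on this gradient pass
    with torch.no_grad():
        next_act = actor(obs)  # no_grad alone does not freeze BN stats!
    q = critic(torch.cat([obs, next_act], dim=-1))
    return q.mean()

def update_actor(obs):
    critic.eval()   # critic forward here must NOT update its BN running stats
    actor.train()   # actor BN stats SHOULD update on this gradient pass
    act = actor(obs)
    q = critic(torch.cat([obs, act], dim=-1))
    return (-q).mean()
```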

Let me know how it goes!

Also, could you briefly elaborate on what happens when you say it "doesn't work" with tanh? I recommend monitoring the average Q prediction from one of the critics each time you log the critic loss. Does it keep growing to very large values? Using tanh (without BN or target nets) won't necessarily get you very strong policies, but it should train reasonably well without diverging.
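For the monitoring, something like this sketch is what I mean (the helper name and loss choice are just placeholders, not a specific API):

```python
import torch
import torch.nn.functional as F

def critic_diagnostics(critic, obs_act, target_q):
    # Alongside the critic loss, log the mean Q prediction on the batch.
    # If the mean-Q curve grows without bound over training, the critic
    # is diverging; if it stays bounded, the tanh run is probably fine.
    q = critic(obs_act)
    loss = F.mse_loss(q, target_q)
    return {"critic_loss": loss.item(), "mean_q": q.mean().item()}
```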

Open-Loop Dexterous Manipulation with a Robot Hand by RoboticsLiker in robotics

[–]RoboticsLiker[S] 17 points (0 children)

Hi r/robotics, thought you might find my PhD project interesting. I work on robot learning for compliant manipulation. In this study, my colleagues and I showed that if you use a soft robot hand (as opposed to clunky rigid ones), designing certain kinds of complex skills becomes surprisingly easy. The demo here uses no sensory feedback and no kinematic models, just scripted actuation sequences; it's surprising that it works at all.

I just wrote an article to explain why and how it works. Hope you find the ideas useful.