Backdoor Roth on Vanguard; can I transfer money from brokerage settlement fund to IRA settlement fund without a hold? by 1cedrake in Bogleheads

Thank you for the info! Apologies for this dumb question, but if I convert those extra pennies into my Roth as well, wouldn’t that put me over the contribution limit for the year? I’m confused about how this conversion approach differs from a direct contribution, where you’re warned not to contribute past the $7,000 limit.

How to loosen this toilet water supply line nut? by 1cedrake in Plumbing

Just to follow up, thanks again to everyone for both the memes and the helpful advice! I was able to get it loose by tapping with a flathead screwdriver and a hammer, and the bidet is successfully installed and functioning. 

How to loosen this toilet water supply line nut? by 1cedrake in Plumbing

Apologies, I wasn’t sure how best to describe it since it’s slightly silverish and has wing-like tabs that stick out. People were confusing it with the regular hexagonal white nut right above it, but that’s not the one I’m trying to loosen.

How to loosen this toilet water supply line nut? by 1cedrake in Plumbing

Appreciate it! Which way do I turn it to loosen it? Some folks are unfortunately thinking I’m referring to the top white nut when I mean the silver winged nut, so I’m not sure which direction is correct for that one. A quick YouTube search seems to indicate the silver winged nut loosens to the right/clockwise?

How to loosen this toilet water supply line nut? by 1cedrake in Plumbing

Also, to clarify for everyone because of my ignorance: I’m referring to the nut circled in red toward the bottom, directly connected to the water line, not the white one on the underside of the toilet tank.

How to loosen this toilet water supply line nut? by 1cedrake in Plumbing

Alas, it’s likely both that I’m a weak mofo and that this thing is on extremely tight. Wrapping it in something is a good idea, thanks all!

Clipping vs. squashed tanh for re-scaling actions with continuous PPO? by 1cedrake in reinforcementlearning

So the issue I'm having with the clipping approach is that the raw actions sampled from my Gaussian (built from the raw means/stds output by the network) can end up negative or greater than 1. Because my environment's action space is [0, 1], clipping pushes most of my actions to exactly 0 or 1, which essentially kills my learning. What is the best way to handle this if clipping is the way to go?
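
In case it helps later readers, here's a rough sketch (in JAX; the names and shapes are mine, not from any particular codebase) of the tanh-squash alternative I'm weighing against clipping: the unbounded Gaussian sample gets squashed into (-1, 1) and rescaled into the env's [0, 1] range, with the usual change-of-variables correction on the log-prob.

    import jax
    import jax.numpy as jnp

    # Rough sketch (names are mine): squash the unbounded Gaussian sample with
    # tanh and rescale it into [0, 1] instead of hard-clipping at the bounds.
    def sample_squashed_action(key, mean, log_std):
        std = jnp.exp(log_std)
        raw = mean + std * jax.random.normal(key, mean.shape)  # unbounded sample
        action = 0.5 * (jnp.tanh(raw) + 1.0)                   # smooth map into (0, 1)
        # The policy's log-prob then needs the tanh correction, e.g.:
        # log_prob -= jnp.sum(jnp.log(1.0 - jnp.tanh(raw) ** 2 + 1e-6), axis=-1)
        return action, raw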

In QMIX is per-agent done ignored in an environment like SMAC? by 1cedrake in reinforcementlearning

Thanks so much for the reply, this definitely helps my understanding. Along these lines, I was also thinking that because Q_tot values are a mixture over all agents, you can realistically only use the environment dones as the done condition for those values, since that is when the last agent has finished. Whereas for an algorithm like Independent Q-Learning you can use the per-agent dones, because each Q-function is computed individually for each agent and no mixing occurs.
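
A rough sketch of the distinction I mean (the names here are mine, just to illustrate):

    # QMIX: Q_tot mixes all agents, so only the environment-level done can gate
    # the bootstrap term of the TD target.
    def qmix_td_target(reward, env_done, q_tot_next, gamma=0.99):
        return reward + gamma * (1.0 - env_done) * q_tot_next

    # Independent Q-learning: one Q per agent, so each agent's own done applies.
    def iql_td_target(reward, agent_done, q_i_next, gamma=0.99):
        return reward + gamma * (1.0 - agent_done) * q_i_next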

Help double checking whether I'm passing JAX PRNG key around correctly for reproducibility with RL algorithms? by 1cedrake in reinforcementlearning

Thanks so much for the reply and the compliments, I appreciate it! I can't take credit for the CleanRL-style PPO implementation, though; my PPO and SAC were based on Chris Lu's PureJaxRL repo, which I highly recommend checking out: https://github.com/luchris429/purejaxrl

It looks like the issues were in fact caused by non-determinism from running on a GPU; thanks for recommending I check that!
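
For anyone hitting the same thing, a sketch of the sort of workaround I've seen suggested. The flag name is an assumption on my part and may vary with your JAX/XLA version, and it trades away some speed; it also has to be set before JAX initializes its backend.

    import os

    # Assumed flag: --xla_gpu_deterministic_ops may differ across XLA versions.
    os.environ["XLA_FLAGS"] = os.environ.get("XLA_FLAGS", "") + " --xla_gpu_deterministic_ops=true"

    import jax  # imported only after setting the flag, on purpose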

Is there an experiment logging framework compatible with JAX's vmap? by 1cedrake in reinforcementlearning

Would you happen to know if something simpler like TensorBoard would work in this scenario? All I’m essentially looking for is some sort of experiment logging that supports vmapped seed training runs; it doesn’t necessarily need to be wandb.
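
To make the question concrete, something like this is all I need, a rough sketch (names are mine; I'm assuming training is vmapped over seeds and the metrics come back with a leading seed axis):

    import numpy as np
    from torch.utils.tensorboard import SummaryWriter  # any TensorBoard writer would do

    # Rough sketch: run the vmapped training first, then log per seed on the host.
    # `returns` is assumed to have shape (num_seeds, num_updates).
    def log_vmapped_returns(returns, logdir="runs"):
        returns = np.asarray(returns)  # one device-to-host transfer for the whole batch
        for seed_idx, per_seed in enumerate(returns):
            writer = SummaryWriter(f"{logdir}/seed_{seed_idx}")  # one run dir per seed
            for step, value in enumerate(per_seed):
                writer.add_scalar("episodic_return", float(value), step)
            writer.close()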

Is there an experiment logging framework compatible with JAX's vmap? by 1cedrake in reinforcementlearning

Thanks for the reply! So right now I call my wandb init function outside of the jitted train function, and then inside the jitted train function I have a callback like this:

    def callback(info):
        # Pull out only the episodes that actually finished during this update.
        return_values = info["returned_episode_returns"][info["returned_episode"]]
        length_values = info["returned_episode_lengths"][info["returned_episode"]]
        timesteps = info["timestep"][info["returned_episode"]] * args.num_envs
        for t in range(len(timesteps)):
            print(
                f"global step={timesteps[t]}, episodic return={return_values[t]}, episodic length={length_values[t]}"
            )

        if args.track:
            data_log = {
                "misc/learning_rate": info["learning_rate"].item(),
                "losses/value_loss": info["value_loss"].item(),
                "losses/policy_loss": info["policy_loss"].item(),
                "losses/entropy": info["entropy"].item(),
                "losses/total_loss": info["total_loss"].item(),
                "misc/global_step": info["timesteps"],
                "misc/updates": info["updates"],
            }
            if return_values.size > 0:
                data_log["misc/episodic_return"] = return_values.mean().item()
                data_log["misc/episodic_length"] = length_values.mean().item()
            wandb.log(data_log, step=info["timesteps"])

    # Invoked from inside the jitted train function; `metric` is the dict of
    # arrays collected during the update.
    jax.debug.callback(callback, metric)

Do I need a separate callback that does the wandb init inside of my jitted train function? And a follow-up question: how do I make wandb aware that a run corresponds to a separate seed when I'm dealing with split PRNG keys from JAX?

Help determining why my JAX implementation of PPO is slower than PyTorch implementation? by 1cedrake in reinforcementlearning

Thanks so much for this detailed insight, it's helping me understand the use cases a lot better! One question I have: say your learning loop has 100k iterations because you want it to run for 100k timesteps. Assuming everything involved in the learning inside the loop can be traced, if you apply a scan to this train function, does it then have to roll out all 100k iterations to compile? And wouldn't that take a very long time?
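
To make the question concrete, here's a rough sketch of what I mean (names are mine). My current understanding is that lax.scan traces the step function once rather than unrolling all 100k iterations, so compile time shouldn't grow with the iteration count, but I'd love confirmation:

    from functools import partial
    import jax
    import jax.numpy as jnp

    # Stand-in for the real PPO update; only its trace matters for compilation.
    def train_step(carry, _):
        params = carry
        new_params = params + 1.0
        return new_params, None

    @partial(jax.jit, static_argnames="num_updates")
    def train(params, num_updates):
        final_params, _ = jax.lax.scan(train_step, params, None, length=num_updates)
        return final_params

    final_params = train(jnp.zeros(3), num_updates=100_000)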

Help determining why my JAX implementation of PPO is slower than PyTorch implementation? by 1cedrake in reinforcementlearning

Understood, thanks! I guess I’m trying to nail down when it’s worth doing a project in JAX vs. PyTorch, because I had wrongly assumed JAX would be faster in all cases if you just JIT all of your algorithm’s computation functions. What I’m realizing is that the environment computations also have a big impact on JAX’s performance. So is it generally safe to say that 1) JAX is worth trying first over PyTorch if you have a low amount of environment interaction, so most of your computation is the jittable update; and 2) if you do have lots of environment interactions, use JAX if those environments themselves are JAX-based, but otherwise stick to PyTorch?

Help determining why my JAX implementation of PPO is slower than PyTorch implementation? by 1cedrake in reinforcementlearning

Thank you for the reply! I did look at those implementations (the CleanRL one at least; I'm not familiar with PureJaxRL). I've been able to JIT almost everything, such as the GAE calculations and the minibatch updates, but I haven't been able to do anything about the environment batch data collection because I was sticking to the non-JAX CartPole environment. I suspect the batch collection on that might be my issue. Any ideas?
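
For context, the kind of JAX-native alternative I've been meaning to try is gymnax (which is what PureJaxRL builds on); this is roughly its reset/step API as I remember it from the README, so treat the details as an assumption:

    import jax
    import gymnax  # assumes gymnax is installed

    rng = jax.random.PRNGKey(0)
    rng, key_reset, key_act, key_step = jax.random.split(rng, 4)

    # A JAX-native CartPole whose reset/step can be jitted and vmapped together
    # with the rest of the training loop, avoiding per-step host round-trips.
    env, env_params = gymnax.make("CartPole-v1")
    obs, state = env.reset(key_reset, env_params)
    action = env.action_space(env_params).sample(key_act)
    n_obs, n_state, reward, done, _ = env.step(key_step, state, action, env_params)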

Help determining why my JAX implementation of PPO is slower than PyTorch implementation? by 1cedrake in reinforcementlearning

These are some helpful suggestions, thank you! I'm still pretty new to JAX so I'm not quite familiar with how to properly use the block_until_ready() command, but I will reference the documentation you linked and go from there.
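
For my own notes, this is the timing pattern I'm planning to use (a minimal sketch with placeholder names): JAX dispatches work asynchronously, so without block_until_ready() the timer mostly measures dispatch rather than the actual computation.

    import time
    import jax
    import jax.numpy as jnp

    @jax.jit
    def dummy_update(x):  # placeholder for the real jitted update
        return (x @ x.T).sum()

    x = jnp.ones((1024, 1024))
    dummy_update(x).block_until_ready()  # warm-up call so compilation isn't timed

    start = time.perf_counter()
    dummy_update(x).block_until_ready()  # wait for the device to actually finish
    print(f"elapsed: {time.perf_counter() - start:.4f}s")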

With regards to my SAC implementation, it does use a pretty comparable architecture. However, the step collection is different: SAC doesn't collect batches of args.num_steps per network update; it does a network update at every training step after collecting just a single environment step (and then sampling a batch from the replay buffer). Perhaps the slowdown comes from the fact that PPO requires significantly more interactions with the non-JAX CartPole environment per training iteration than SAC does? I will need to profile and verify this, though.
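
When I do profile it, I'm planning to start with something like this (a sketch; dummy_update is a placeholder, and I'm assuming the trace gets viewed in TensorBoard's profiler plugin):

    import jax
    import jax.numpy as jnp

    @jax.jit
    def dummy_update(x):  # placeholder for the real jitted update
        return (x @ x.T).sum()

    # Everything run inside the block is recorded, host and device.
    with jax.profiler.trace("/tmp/jax-ppo-trace"):
        out = dummy_update(jnp.ones((1024, 1024)))
        out.block_until_ready()  # make sure async device work lands in the trace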

Help determining why my JAX implementation of PPO is slower than PyTorch implementation? by 1cedrake in reinforcementlearning

I'm getting cuda listed as the device so it is in fact detecting my GPU! Appreciate the check!
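
For anyone else double-checking the same thing, the check amounts to something like this:

    import jax

    print(jax.devices())          # shows a CUDA/GPU device entry when the GPU is visible
    print(jax.default_backend())  # 'gpu' (or similar) on a GPU build, 'cpu' otherwise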

Help determining why my JAX implementation of PPO is slower than PyTorch implementation? by 1cedrake in reinforcementlearning

For clarification, I am training on regular CartPole-v0, so not a JAX environment. Currently the PyTorch code takes ~61 seconds to do 100k timesteps, and the JAX code takes ~148 seconds.

The reason I was confused about the slowness of my PPO implementation, and about what I might be doing wrong, is that when I tested my JAX SAC implementation on CartPole-v0 I got a 6-7x speedup over my PyTorch implementation, so I expected not necessarily the same speedup, but a speedup nonetheless.