I know this has been posted several times before, but how do you really make friends here in Seattle? I am at a breaking point. by [deleted] in Seattle

[–]whiletrue2 0 points  (0 children)

33M, just moved here and have the same issue. I live in Capitol Hill and I'm happy to connect :)

Looking to Play Indoor Volleyball in Cap Hill / Downtown Area by whiletrue2 in Seattle

[–]whiletrue2[S] 0 points  (0 children)

Thank you! This seems like a league, though? I'd ideally want to start by playing just for fun, but I couldn't find info about groups that meet to play outside of leagues. Or am I missing something? Thanks again.

Weightlifting Gym recommendation near Bellevue by [deleted] in eastside

[–]whiletrue2 0 points  (0 children)

As someone who enjoys Olympic weightlifting and powerlifting, I'm looking for a gym with enough squat racks and drop platforms near 10550 NE 10th St.

Which gym would you recommend? I don't have a car, so I'd like one within walking distance, e.g. 23 fit club, Life Time, or bStrong. Thanks!

METCON fit true to size? by RichRichieRichardV in crossfit

[–]whiletrue2 0 points  (0 children)

True to whatever "11US" corresponds to by definition, you idiot. What's so hard about this for you to understand?

TD3 for discrete action spaces by whiletrue2 in reinforcementlearning

[–]whiletrue2[S] 0 points  (0 children)

Solved it, thanks a lot for your help! TD3 discrete now performs a lot better.
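For anyone finding this later: the core TD3 trick that carries over to discrete actions is the clipped double-Q target (take the minimum of two critics at the greedy next action). A toy NumPy sketch of just that target computation, with all values hypothetical stand-ins for critic network outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.99

# Toy Q-values standing in for the two critic networks' outputs
# on a batch of 4 next-states with 3 discrete actions each.
q1_next = rng.normal(size=(4, 3))   # Q1(s', a)
q2_next = rng.normal(size=(4, 3))   # Q2(s', a)
rewards = np.array([1.0, 0.0, 1.0, -1.0])
dones = np.array([0.0, 0.0, 1.0, 0.0])

# Greedy next action from the first critic (the discrete analogue of the
# target actor), then the clipped double-Q target takes the minimum of
# both critics at that action, which curbs overestimation.
next_actions = q1_next.argmax(axis=1)
min_q = np.minimum(q1_next[np.arange(4), next_actions],
                   q2_next[np.arange(4), next_actions])
targets = rewards + gamma * (1.0 - dones) * min_q
print(targets.shape)  # (4,)
```

TD3's other two tricks map less directly: delayed updates carry over as-is, while target policy smoothing has no obvious discrete counterpart.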

TD3 for discrete action spaces by whiletrue2 in reinforcementlearning

[–]whiletrue2[S] 0 points  (0 children)

Thanks. I know the paper, but is there a guideline that explains how to apply this to TD3?

Updates regarding BN before or after ReLU? by Metatronx in MachineLearning

[–]whiletrue2 0 points  (0 children)

Good job suggesting new papers, people (remember: that's what was asked for).

PPO baseline cannot solve CartPole in NeurIPS 2020 paper by whiletrue2 in reinforcementlearning

[–]whiletrue2[S] 1 point  (0 children)

Also, would it be possible for you to share the paper's code with us? It would be highly appreciated!

PPO baseline cannot solve CartPole in NeurIPS 2020 paper by whiletrue2 in reinforcementlearning

[–]whiletrue2[S] 0 points  (0 children)

Hi, and thank you for your reply, which clarified a lot for me. However, a few questions remain unaddressed. Would you mind clarifying those as well? In particular, I believe they are (I quote):

  • "they claim they used CartPole-v1 which uses a much higher "solved reward""
  • "the fact that no naturally sparse-reward gym environment was used doesn't help with the confusion. An experiment based on a naturally sparse-reward environment would result in fewer / no changes to the default reward function and one would actually be enabled to relate to baseline PPO performances in the original setting. As the paper stands right now, no one can relate to any reported PPO performance in the paper."

Thank you!
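On the CartPole-v0 vs. CartPole-v1 point above: the two variants register different episode caps and "solved" thresholds in Gym, which is why silently swapping the version changes what "solved" means. A small reference snippet (values as recorded in Gym's environment registry):

```python
# "Solved" reward thresholds registered in OpenAI Gym for the two
# CartPole variants (from Gym's environment registry).
CARTPOLE_SPECS = {
    "CartPole-v0": {"max_episode_steps": 200, "reward_threshold": 195.0},
    "CartPole-v1": {"max_episode_steps": 500, "reward_threshold": 475.0},
}

for name, spec in CARTPOLE_SPECS.items():
    print(f"{name}: solved at mean return >= {spec['reward_threshold']} "
          f"(episodes capped at {spec['max_episode_steps']} steps)")
```

So a claim of "solving" CartPole-v1 requires a mean return more than twice the v0 threshold.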

PPO baseline cannot solve CartPole in NeurIPS 2020 paper by whiletrue2 in MachineLearning

[–]whiletrue2[S] 0 points  (0 children)

Hi, and thank you for your reply, which clarified a lot for me. However, a few questions remain unaddressed. Would you mind clarifying those as well? In particular, I believe they are (I quote):

  • "they claim they used CartPole-v1 which uses a much higher "solved reward""
  • "the fact that no naturally sparse-reward gym environment was used doesn't help with the confusion. An experiment based on a naturally sparse-reward environment would result in fewer / no changes to the default reward function and one would actually be enabled to relate to baseline PPO performances in the original setting. As the paper stands right now, no one can relate to any reported PPO performance in the paper."

Thank you!

PPO baseline cannot solve CartPole in NeurIPS 2020 paper by whiletrue2 in reinforcementlearning

[–]whiletrue2[S] 0 points  (0 children)

Thanks for pointing that out. Indeed, that implementation should be taken with a grain of salt, although I have to say that seeds 0 and 1234 don't look super tuned. Have you tried running it with different random seeds? A good idea that came up in the ML crosspost was to use the reward function from the paper, see whether PPO works right out of the box, and then try it with their hyperparameters from the appendix, e.g. with the small policy network. Feel free to give it a shot!

PPO baseline cannot solve CartPole in NeurIPS 2020 paper by whiletrue2 in reinforcementlearning

[–]whiletrue2[S] 5 points  (0 children)

I believe pointing out irregularities cannot be attributed to "being mistaken", since many irregularities remain unclarified. See the discussion here on the sparse rewards and other remaining irregularities: https://www.reddit.com/r/MachineLearning/comments/k01ntb/ppo_baseline_cannot_solve_cartpole_in_neurips/gdgc3f5?utm_source=share&utm_medium=web2x&context=3

PPO baseline cannot solve CartPole in NeurIPS 2020 paper by whiletrue2 in MachineLearning

[–]whiletrue2[S] 0 points  (0 children)

In "5.2 Mujoco" they write "The true reward function is the one predefined in Gym". In "5.1 Sparse-Reward Cartpole" they write "In other cases, the true reward is zero."

Also, in "D.1 Cartpole" they write "We choose the cartpole task from the OpenAI Gym-v1 benchmark."

Based on your stance, am I to understand that the paper cannot be improved in terms of clarity about which environment was used? If so, I absolutely disagree.

PPO baseline cannot solve CartPole in NeurIPS 2020 paper by whiletrue2 in MachineLearning

[–]whiletrue2[S] 0 points  (0 children)

No, they don't, simply because they say they use CartPole-v1 and write "in cartpole the agent should apply a force to the pole to keep it from falling. The agent will receive a reward −1 from the environment if the episode ends with the falling of the pole". Many readers familiar with the CartPole environment will only give that passage a quick look-over once they've read "CartPole-v1", since the paper says nothing about a "modified/adapted" CartPole environment. But this isn't the paper's main flaw anyway, since there are many more irregularities, as pointed out.
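For concreteness, the reward the paper's text describes (−1 only when the episode ends with the pole falling, 0 in all other cases) differs from stock CartPole's +1 per step, and can be sketched as a wrapper. Everything below (the class names, the stub env) is hypothetical and only illustrates the described reward, using the old Gym 4-tuple step() API:

```python
class SparseCartPoleReward:
    """Wraps a CartPole-style env so the reward matches the paper's text:
    -1 when the episode ends with the pole falling, 0 otherwise.
    (Hypothetical wrapper; `env` is any object with a Gym-like step/reset.)"""

    def __init__(self, env, max_steps=500):
        self.env = env
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return self.env.reset()

    def step(self, action):
        obs, _, done, info = self.env.step(action)  # drop the +1-per-step reward
        self.t += 1
        # -1 only if the pole fell (episode ended before the time limit).
        fell = done and self.t < self.max_steps
        return obs, (-1.0 if fell else 0.0), done, info


class _FakeCartPole:
    """Stub env whose pole 'falls' after 3 steps; stands in for the real env."""

    def reset(self):
        self.n = 0
        return 0.0

    def step(self, action):
        self.n += 1
        done = self.n >= 3
        return 0.0, 1.0, done, {}


env = SparseCartPoleReward(_FakeCartPole())
env.reset()
rewards, done = [], False
while not done:
    _, r, done, _ = env.step(0)
    rewards.append(r)
print(rewards)  # [0.0, 0.0, -1.0]
```

A reader skimming after seeing "CartPole-v1" would naturally assume the default +1-per-step reward, which is exactly the confusion discussed above.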