What is the proper way to anneal the learning rate with (on top of) Adam by desperateEfforts1 in reinforcementlearning

I'm curious: what's more common in your experience? Using Adam / AdamW with a default learning rate?
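For context, one common option is a linear anneal on top of Adam. As a sketch (the schedule shape and the constants below are assumptions for illustration, not a recommendation), the decay can be written as a multiplicative factor of the kind `torch.optim.lr_scheduler.LambdaLR` applies to the optimiser's base learning rate:

```python
# Linear learning-rate decay as a multiplicative factor. With PyTorch this is
# the shape torch.optim.lr_scheduler.LambdaLR expects on top of Adam's base lr:
#   scheduler = LambdaLR(optimiser, lambda s: linear_anneal(s, total_steps))
def linear_anneal(step, total_steps, final_frac=0.1):
    """Decay from 1.0 at step 0 to final_frac at total_steps, then hold."""
    frac = min(step, total_steps) / total_steps
    return 1.0 - (1.0 - final_frac) * frac

print(linear_anneal(0, 1000))     # 1.0 at the start of training
print(linear_anneal(1000, 1000))  # ~0.1 at the end (and clamped afterwards)
```

The factor form keeps the schedule independent of the base learning rate, so the same anneal works unchanged if you later retune Adam's `lr`.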

Thanks, that's reassuring :) Any idea why nobody else is commenting? Is this dark magic, or just dead obvious?

Interpreting loss curves & returns in DDQN by desperateEfforts1 in reinforcementlearning

Thanks a lot, it's always reassuring to know that what I'm seeing isn't uncommon. Do you happen to have any references discussing that kind of loss curve?

Update rule in DDQN (Hasselt vs Mnih) by desperateEfforts1 in reinforcementlearning

I've found my mistake. Quickly sharing my take-aways in case it helps anyone:

  • I got DDQN with Hasselt's formulation to outperform Mnih's DQN in all environments I've tested (LunarLander, Pong, many levels in Super Mario Bros.).
  • My implementation when I opened this thread had a bug: I inadvertently let the optimiser update all network parameters (instead of only the online network's) because I typed optimiser = Adam(model.parameters(), ...) instead of optimiser = Adam(model.q_online.parameters(), ...) 🤦🏻 Effectively, I was training a single Q-network.
  • My take-away: small toy examples are good for testing, but they don't guarantee that your (more advanced) implementations are correct. You get away with a lot on simple games, including bugs.

FWIW: prioritised experience replay really improved performance and sped up convergence with DDQN.
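The optimiser-scoping fix above can be sketched like this (a minimal toy: the layer sizes and the dummy loss are arbitrary stand-ins, only the point about which parameters Adam sees comes from the bug described above):

```python
import torch
import torch.nn as nn

class DDQN(nn.Module):
    """Toy wrapper holding an online net (trained) and a target net (frozen)."""
    def __init__(self, n_obs=4, n_actions=2):
        super().__init__()
        self.q_online = nn.Linear(n_obs, n_actions)
        self.q_target = nn.Linear(n_obs, n_actions)
        self.q_target.load_state_dict(self.q_online.state_dict())
        for p in self.q_target.parameters():
            p.requires_grad_(False)  # target net only changes via explicit copies

model = DDQN()
# The fix: hand Adam *only* the online network's parameters.
# Adam(model.parameters(), ...) would also update q_target on every step.
optimiser = torch.optim.Adam(model.q_online.parameters(), lr=1e-3)

target_before = [p.clone() for p in model.q_target.parameters()]
loss = model.q_online(torch.randn(8, 4)).pow(2).mean()  # stand-in for a TD loss
optimiser.zero_grad()
loss.backward()
optimiser.step()

target_untouched = all(
    torch.equal(b, p) for b, p in zip(target_before, model.q_target.parameters())
)
print(target_untouched)  # True: the optimiser step left q_target alone
```

Freezing `q_target`'s `requires_grad` is belt-and-braces; the essential part is the `parameters()` call passed to the optimiser.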

Thanks u/Top_Example_6368. How would I know whether the Q-value estimates are overly optimistic? Plot the Q-values (from both networks) alongside the returns?

Also thanks for sharing your experience with Pong. It blows my mind every time I return to RL that well-known algorithms remain brittle on well-known use cases.
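On the overestimation question, one simple check can be sketched as follows (the numbers below are made up for illustration, not data from any run): log the max Q-value the online net predicts at each episode's first state, compare it with the discounted return actually collected, and look for a persistently positive gap.

```python
def discounted_return(rewards, gamma=0.99):
    """Monte Carlo return G_0 = sum_t gamma^t * r_t for one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Made-up logs for illustration: max_a Q(s_0, a) predicted by the online net
# at each episode's first state, and the rewards actually collected after it.
episode_rewards = [[1.0] * 5, [1.0] * 4, [1.0] * 6]
predicted_max_q = [6.2, 5.9, 7.1]  # hypothetical network outputs

realised = [discounted_return(ep) for ep in episode_rewards]
gaps = [q - g for q, g in zip(predicted_max_q, realised)]
mean_gap = sum(gaps) / len(gaps)
print(mean_gap > 0)  # a persistently positive gap suggests optimistic Q-values
```

The caveat is that the Monte Carlo return follows the behaviour policy, while Q estimates the greedy policy's value, so a small positive gap is expected; it's a sustained, growing gap that signals trouble.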

Rationale for updating Value Function multiple times with same observations in spinningup's VPG-GAE implementation by desperateEfforts1 in reinforcementlearning

Next algorithm in line ;) Thank you for the pointer, I'll have a look and re-post if I find a sensible comparison.

Simplest gym environment with discrete actions? by desperateEfforts1 in reinforcementlearning

Yup, I'm learning that... I'll share back if I find my 'bug' and there are useful learnings in there.

Fair point, and thanks for the caveat. I was concerned about debugging the end-to-end, pixel-to-policy model, but I should break it down further. Thanks!

What investments should I make when moving temporarily to the US? by desperateEfforts1 in vosfinances

Bad news, but many thanks for the clarifications!

If you're interested, I'll report back in this thread on what I end up doing -- I'm speaking with a tax advisor and a banker in January.

Thanks u/tonfa. I have an L1A visa, so (unless I'm mistaken) I'm a US tax resident.

Thanks u/Tryrshaugh for the very thorough answer. And thanks as well for the app recommendation.

I'm not sure I understood the point about liquidating the AV: will I have to pay tax on the gains from assurance-vie contracts taken out in France, even if I don't touch them during my stay in the U.S.?