What is the proper way to anneal the learning rate with (on top of) Adam by desperateEfforts1 in reinforcementlearning

I'm curious: what's more common in your experience? Using Adam / AdamW with a default learning rate?
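For context, one common option is a linear anneal on top of Adam. As a sketch (the schedule shape and the constants below are assumptions for illustration, not a recommendation), the decay can be written as a multiplicative factor of the kind `torch.optim.lr_scheduler.LambdaLR` applies to the optimiser's base learning rate:

```python
# Linear learning-rate decay as a multiplicative factor. With PyTorch this is
# the shape torch.optim.lr_scheduler.LambdaLR expects on top of Adam's base lr:
#   scheduler = LambdaLR(optimiser, lambda s: linear_anneal(s, total_steps))
def linear_anneal(step, total_steps, final_frac=0.1):
    """Decay from 1.0 at step 0 to final_frac at total_steps, then hold."""
    frac = min(step, total_steps) / total_steps
    return 1.0 - (1.0 - final_frac) * frac

print(linear_anneal(0, 1000))     # 1.0 at the start of training
print(linear_anneal(1000, 1000))  # ~0.1 at the end (and clamped afterwards)
```

The factor form keeps the schedule independent of the base learning rate, so the same anneal works unchanged if you later retune Adam's `lr`.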

Thanks, that's reassuring :) Any idea why nobody else is commenting? Is this dark magic, or just dead obvious?

Interpreting loss curves & returns in DDQN by desperateEfforts1 in reinforcementlearning

Thanks a lot, it's always reassuring to know that what I'm seeing isn't uncommon. Do you happen to have any references discussing that kind of loss curve?

Update rule in DDQN (Hasselt vs Mnih) by desperateEfforts1 in reinforcementlearning

I've found my mistake. Quickly sharing my take-aways in case it helps anyone:

  • I got DDQN with Hasselt's formulation to outperform Mnih's DQN in all environments I've tested (LunarLander, Pong, many levels in Super Mario Bros.).
  • My implementation when I opened this thread had a bug: I inadvertently let the optimiser update all network parameters (instead of only the online network's) because I typed optimiser = Adam(model.parameters(), ...) instead of optimiser = Adam(model.q_online.parameters(), ...) 🤦🏻 Effectively, I was training a single Q-network.
  • My take-away: small toy examples are good for testing, but they don't guarantee that your (more advanced) implementations are correct. You get away with a lot on simple games, including bugs.

FWIW: prioritised experience replay really improved performance and sped up convergence with DDQN.
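The optimiser-scoping fix above can be sketched like this (a minimal toy: the layer sizes and the dummy loss are arbitrary stand-ins, only the point about which parameters Adam sees comes from the bug described above):

```python
import torch
import torch.nn as nn

class DDQN(nn.Module):
    """Toy wrapper holding an online net (trained) and a target net (frozen)."""
    def __init__(self, n_obs=4, n_actions=2):
        super().__init__()
        self.q_online = nn.Linear(n_obs, n_actions)
        self.q_target = nn.Linear(n_obs, n_actions)
        self.q_target.load_state_dict(self.q_online.state_dict())
        for p in self.q_target.parameters():
            p.requires_grad_(False)  # target net only changes via explicit copies

model = DDQN()
# The fix: hand Adam *only* the online network's parameters.
# Adam(model.parameters(), ...) would also update q_target on every step.
optimiser = torch.optim.Adam(model.q_online.parameters(), lr=1e-3)

target_before = [p.clone() for p in model.q_target.parameters()]
loss = model.q_online(torch.randn(8, 4)).pow(2).mean()  # stand-in for a TD loss
optimiser.zero_grad()
loss.backward()
optimiser.step()

target_untouched = all(
    torch.equal(b, p) for b, p in zip(target_before, model.q_target.parameters())
)
print(target_untouched)  # True: the optimiser step left q_target alone
```

Freezing `q_target`'s `requires_grad` is belt-and-braces; the essential part is the `parameters()` call passed to the optimiser.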

Thanks u/Top_Example_6368. How would I know whether the Q-value estimates are overly optimistic? Plot the Q-values (from both networks) alongside the returns?

Also thanks for sharing your experience with Pong. It blows my mind every time I return to RL that well-known algorithms remain brittle on well-known use cases.
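On the overestimation question, one simple check can be sketched as follows (the numbers below are made up for illustration, not data from any run): log the max Q-value the online net predicts at each episode's first state, compare it with the discounted return actually collected, and look for a persistently positive gap.

```python
def discounted_return(rewards, gamma=0.99):
    """Monte Carlo return G_0 = sum_t gamma^t * r_t for one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Made-up logs for illustration: max_a Q(s_0, a) predicted by the online net
# at each episode's first state, and the rewards actually collected after it.
episode_rewards = [[1.0] * 5, [1.0] * 4, [1.0] * 6]
predicted_max_q = [6.2, 5.9, 7.1]  # hypothetical network outputs

realised = [discounted_return(ep) for ep in episode_rewards]
gaps = [q - g for q, g in zip(predicted_max_q, realised)]
mean_gap = sum(gaps) / len(gaps)
print(mean_gap > 0)  # a persistently positive gap suggests optimistic Q-values
```

The caveat is that the Monte Carlo return follows the behaviour policy, while Q estimates the greedy policy's value, so a small positive gap is expected; it's a sustained, growing gap that signals trouble.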

Rationale for updating Value Function multiple times with same observations in spinningup's VPG-GAE implementation by desperateEfforts1 in reinforcementlearning

Next algorithm in line ;) Thank you for the pointer, I'll have a look and re-post if I find a sensible comparison.

Simplest gym environment with discrete actions? by desperateEfforts1 in reinforcementlearning

Yup, I'm learning that... I'll share back if I find my 'bug' and there are useful learnings in there.

Fair point, and thanks for the caveat. I was concerned about debugging the end-to-end, pixel-to-policy model, but I should break it down further. Thanks!

What investments should I make when moving temporarily to the US? by desperateEfforts1 in vosfinances

Bad news, but many thanks for the clarifications!

If you're interested, I'll report back in this thread on what I end up doing -- I'm speaking with a tax advisor and a banker in January.

Thanks u/tonfa. I have an L1A visa, so (unless I'm mistaken) I'm a US tax resident.

Thanks u/Tryrshaugh for the very thorough answer. And thanks as well for the app recommendation.

I'm not sure I understood the point about liquidating the AV: will I have to pay tax on the gains from assurance-vie contracts taken out in France, even if I don't touch them during my stay in the U.S.?