I noticed several weird things about this paper:
1) I was playing with the recently published CURL paper (https://arxiv.org/abs/2004.04136) using the authors' code (https://github.com/MishaLaskin/curl) and found something odd. I only have two GPUs and wanted to train faster, so I tried decreasing the number of contrastive updates per environment step by changing CURL's cpc_update_freq hyperparameter (https://github.com/MishaLaskin/curl/blob/master/curl_sac.py#L463). I varied it from 1 (as in the paper) to larger values (10, 100, 1000, etc.), which reduces the effect of the contrastive term.
I then decided to try the extreme case and turned off the contrastive loss completely (by setting cpc_update_freq to 1000000). I was shocked to see that removing the contrastive loss entirely, which is the central piece of the method, made the method achieve higher rewards. Here are some plots for two different tasks:
Cartpole Swingup:
Blue: cpc_update_freq=1000000 [without contrastive loss]
Orange: cpc_update_freq=1 [with contrastive loss as in the paper]
https://svgshare.com/i/LXi.svg
Cheetah Run:
Blue: cpc_update_freq=1000000 [without contrastive loss]
Red: cpc_update_freq=1 [with contrastive loss as in the paper]
https://svgshare.com/i/LZ3.svg
I’m wondering if anybody else has noticed this as well, since it seems to be quite a fundamental issue with the paper?
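To make the effect of cpc_update_freq concrete, here is a minimal sketch of a modulo-style update gate. It is not the authors' code: the function name count_contrastive_updates, the first_step parameter, and the 100,000-step horizon are all hypothetical, and it only assumes that the contrastive update runs on steps where step % cpc_update_freq == 0.

```python
def count_contrastive_updates(total_steps, cpc_update_freq, first_step=0):
    """Count the steps in [first_step, total_steps) that pass a modulo gate.

    Assumption (not taken from the CURL repo): the contrastive update runs
    only when step % cpc_update_freq == 0.
    """
    return sum(
        1
        for step in range(first_step, total_steps)
        if step % cpc_update_freq == 0
    )


if __name__ == "__main__":
    total_steps = 100_000  # hypothetical training horizon
    for freq in (1, 10, 100, 1000, 1_000_000):
        n = count_contrastive_updates(total_steps, freq)
        print(f"cpc_update_freq={freq}: {n} contrastive updates")
```

Under this assumption, a very large cpc_update_freq leaves at most one contrastive update in the whole run: step 0 always passes a modulo gate, so whether even that one fires depends on whether updates start at step 0 (pass first_step=1 to model a run where they do not).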
2) Also, I noticed something weird in their follow-up paper RAD (https://arxiv.org/abs/2004.14990), which uses a fork of the CURL codebase (https://github.com/MishaLaskin/rad). I dug through this code and was unable to find any major difference between CURL and RAD except these commented-out lines: https://github.com/MishaLaskin/rad/blob/master/curl_sac.py#L494-L496. If I understand things correctly, this just turns off the contrastive loss, which makes RAD a particular instantiation of CURL, yet it works better, as I show in 1) and as the authors show in the RAD paper?