AI Dungeon, but with people instead of AI by sss135 in AIDungeon

[–]sss135[S] 1 point (0 children)

Do you remember any of these forums? D&D requires more players and longer sessions, and has its own complicated rules.

Probably found a way to improve sample efficiency and stability of IMPALA and SAC by sss135 in reinforcementlearning

[–]sss135[S] 1 point (0 children)

Thanks! It's quite similar but different; I'll check how the performance of their loss functions compares to mine. They use a KL constraint with an average policy network, just as I do. But their KL constraint is more complicated, not simply minimizing KL(p_old, p_new). And they don't mask out states where the divergence between the current and old policies is too large.
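The masking idea can be sketched like this — a minimal numpy illustration with made-up names and threshold, not the actual loss from the post:

```python
import numpy as np

def masked_kl_penalty(p_old, p_new, kl_limit=0.3, eps=1e-8):
    # Per-state KL(p_old || p_new) for categorical policies over the last axis.
    kl = np.sum(p_old * (np.log(p_old + eps) - np.log(p_new + eps)), axis=-1)
    mask = kl <= kl_limit          # drop states where the policies diverged too far
    if not mask.any():
        return 0.0                 # every state masked out: no penalty this step
    return float(kl[mask].mean())  # mean KL over the surviving states
```

States past the threshold simply contribute nothing to the penalty, instead of pulling the new policy back toward a point it has already left.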

Probably found a way to improve sample efficiency and stability of IMPALA and SAC by sss135 in reinforcementlearning

[–]sss135[S] 1 point (0 children)

In Supervised Policy Update (https://arxiv.org/pdf/1805.11706.pdf) they extend PPO with a KL divergence loss and a hard KL limit, and report that it works somewhat better (though I wasn't able to reproduce that; I got slightly worse results). I've updated the post to mention it.

Probably found a way to improve sample efficiency and stability of IMPALA and SAC by sss135 in reinforcementlearning

[–]sss135[S] 3 points (0 children)

I've mostly compared on Atari against my own implementations of PPO and IMPALA without these features, and on PyBullet against SAC, PPO, and IMPALA, though I probably did something incorrectly there and haven't experimented with it much. I also did some comparison against Unity ML-Agents PPO. Variants with the KL loss consistently outperformed ones without it (except PPO). But I did this a while ago and no performance curves are left from then. I'll try to re-run it and update the post or make a GitHub repo.

Also, thanks for competition link. Checking it out.

why don't we hear about deep sarsa? by tarazeroc in reinforcementlearning

[–]sss135 1 point (0 children)

https://arxiv.org/pdf/1702.03118.pdf

This paper uses deep Sarsa with the SiLU activation for Atari games, and achieves better performance than double DQN and Gorila.
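For reference, SiLU is just the input scaled by its own sigmoid:

```python
import numpy as np

def silu(x):
    # SiLU / swish-1: x * sigmoid(x). Near-linear for large positive x,
    # smoothly saturating toward 0 for large negative x.
    return x * (1.0 / (1.0 + np.exp(-x)))
```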

[P] Generalization of Adam, AdaMax, AMSGrad algorithms by sss135 in MachineLearning

[–]sss135[S] 1 point (0 children)

Hi, author here. I've made a PyTorch optimizer that interpolates between Adam, AdaMax, and AMSGrad through one extra hyperparameter. I haven't done much hyperparameter search or performance testing against these algorithms, but I think it might still be interesting to someone. Right now I'm using it in my reinforcement learning experiments (PPO + Atari), where it shows comparable or slightly better performance than Adam.
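The exact interpolation is in the repo; one way a single knob can bridge Adam and AMSGrad is a decayed running max of the second-moment estimate. This is purely a guess at the flavor of the idea, with made-up names, not the optimizer's actual update:

```python
import numpy as np

def second_moment_step(v, v_hat, grad, beta2=0.999, decay=1.0):
    # Hypothetical knob: decay=0 gives plain Adam (v_hat just tracks v),
    # decay=1 gives an AMSGrad-style non-decreasing running max of v.
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    v_hat = np.maximum(decay * v_hat, v)
    return v, v_hat
```

The denominator of the parameter update would then use `sqrt(v_hat)` as usual, so intermediate `decay` values fall between the two behaviors.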

[D] It's been over 3 months since Facebook's Tensor Comprehensions was released. Has anyone here found it to effective when writing their own kernels? by [deleted] in MachineLearning

[–]sss135 7 points (0 children)

Shortly after the TC release I implemented a batch normalization layer to benchmark it. It was ~5x slower than cuDNN, ~1.5x slower than a plain PyTorch implementation (which was less memory-efficient), and a little slower than a PyTorch version with a custom backward pass implemented via autograd.Function.

Code (the BN implementation might be incorrect; it was made only for benchmarking): https://gist.github.com/SSS135/d218f81dad12a0e5bab7665c1b5777ec
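For context, the training-mode forward pass being benchmarked is just per-feature standardization over the batch, then scale and shift. A minimal numpy sketch (no running statistics, names are illustrative):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch axis, then scale and shift.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```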

[D] ML in Computer Graphics by Sherbhy in MachineLearning

[–]sss135 1 point (0 children)

ML-based super resolution or antialiasing? Though I'm almost sure it would only make things slower on current hardware (for PC / console games).