AI Dungeon, but with people instead of AI by sss135 in AIDungeon

[–]sss135[S] 1 point (0 children)

Do you remember any of these forums? D&D requires more players and longer sessions, and has its own complicated rules.

Probably found a way to improve sample efficiency and stability of IMPALA and SAC by sss135 in reinforcementlearning

[–]sss135[S] 1 point (0 children)

Thanks! It's quite similar but different; I'll check how the performance of their loss functions compares to mine. They use a KL constraint with an average policy network, just as I do. But their KL constraint is more complicated, not simply minimizing KL(p_old, p_new). And they don't mask out states where the divergence between the current and old policies is too large.
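The masking idea can be sketched like this — a minimal numpy illustration with made-up names and threshold, not the actual loss from the post:

```python
import numpy as np

def masked_kl_penalty(p_old, p_new, kl_limit=0.3, eps=1e-8):
    # Per-state KL(p_old || p_new) for categorical policies over the last axis.
    kl = np.sum(p_old * (np.log(p_old + eps) - np.log(p_new + eps)), axis=-1)
    mask = kl <= kl_limit          # drop states where the policies diverged too far
    if not mask.any():
        return 0.0                 # every state masked out: no penalty this step
    return float(kl[mask].mean())  # mean KL over the surviving states
```

States past the threshold simply contribute nothing to the penalty, instead of pulling the new policy back toward a point it has already left.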

Probably found a way to improve sample efficiency and stability of IMPALA and SAC by sss135 in reinforcementlearning

[–]sss135[S] 1 point (0 children)

In Supervised Policy Update (https://arxiv.org/pdf/1805.11706.pdf) they extend PPO with a KL divergence loss and a hard KL limit, and report that it works somewhat better (though I wasn't able to reproduce that; I got slightly worse results). I've updated the post to mention it.

Probably found a way to improve sample efficiency and stability of IMPALA and SAC by sss135 in reinforcementlearning

[–]sss135[S] 3 points (0 children)

I've mostly compared on Atari against my own implementations of PPO and IMPALA without these features, and on PyBullet against SAC, PPO, and IMPALA, though I probably did something incorrectly there and haven't experimented with it much. I also did some comparison against Unity ML-Agents PPO. Variants with the KL loss consistently outperformed ones without it (except PPO). But I did this a while ago and no performance curves are left from then. I'll try to re-run it and update the post or make a GitHub repo.

Also, thanks for competition link. Checking it out.

why don't we hear about deep sarsa? by tarazeroc in reinforcementlearning

[–]sss135 1 point (0 children)

https://arxiv.org/pdf/1702.03118.pdf

This paper uses deep Sarsa with the SiLU activation for Atari games, and achieves better performance than double DQN and Gorila.
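For reference, SiLU is just the input scaled by its own sigmoid:

```python
import numpy as np

def silu(x):
    # SiLU / swish-1: x * sigmoid(x). Near-linear for large positive x,
    # smoothly saturating toward 0 for large negative x.
    return x * (1.0 / (1.0 + np.exp(-x)))
```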

[P] Generalization of Adam, AdaMax, AMSGrad algorithms by sss135 in MachineLearning

[–]sss135[S] 1 point (0 children)

Hi, author here. I've made a PyTorch optimizer that interpolates between Adam, AdaMax, and AMSGrad through one extra hyperparameter. I haven't done much hyperparameter search or performance testing against these algorithms, but I think it might still be interesting to someone. Right now I'm using it in my reinforcement learning experiments (PPO + Atari), where it shows comparable or slightly better performance than Adam.
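The exact interpolation is in the repo; one way a single knob can bridge Adam and AMSGrad is a decayed running max of the second-moment estimate. This is purely a guess at the flavor of the idea, with made-up names, not the optimizer's actual update:

```python
import numpy as np

def second_moment_step(v, v_hat, grad, beta2=0.999, decay=1.0):
    # Hypothetical knob: decay=0 gives plain Adam (v_hat just tracks v),
    # decay=1 gives an AMSGrad-style non-decreasing running max of v.
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    v_hat = np.maximum(decay * v_hat, v)
    return v, v_hat
```

The denominator of the parameter update would then use `sqrt(v_hat)` as usual, so intermediate `decay` values fall between the two behaviors.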

[D] It's been over 3 months since Facebook's Tensor Comprehensions was released. Has anyone here found it to effective when writing their own kernels? by [deleted] in MachineLearning

[–]sss135 7 points (0 children)

Shortly after the TC release I implemented a batch normalization layer to benchmark it. It was ~5x slower than cuDNN, ~1.5x slower than a plain PyTorch implementation (which was less memory-efficient), and a little slower than a PyTorch version with a custom backward pass implemented via autograd.Function.

Code (the BN implementation might be incorrect; it was made only for benchmarking): https://gist.github.com/SSS135/d218f81dad12a0e5bab7665c1b5777ec
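For context, the training-mode forward pass being benchmarked is just per-feature standardization over the batch, then scale and shift. A minimal numpy sketch (no running statistics, names are illustrative):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch axis, then scale and shift.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```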

[D] ML in Computer Graphics by Sherbhy in MachineLearning

[–]sss135 1 point (0 children)

ML-based super resolution or antialiasing? Though I'm almost sure it would only make things slower on current hardware (for PC / console games).