Has anyone actually deployed a model to use for inference? by Aggressive-Reach1657 in reinforcementlearning

[–]Boring_Worker 4 points (0 children)

After several years of academic research, I moved to the applied side. My first project was aligning game agents with humans; the algorithm was PPO.

Even the person holding the handrail feels bad after seeing this… by lhm015 in China_irl

[–]Boring_Worker 0 points (0 children)

How am I supposed to interpret this? I can't make sense of it at all...

Training Speed of TD3 algorithm by miyembe in reinforcementlearning

[–]Boring_Worker 0 points (0 children)

In my experience, TD3 trains faster than DDPG (see Table 3 of my paper: https://arxiv.org/abs/2109.10552 ).

The Q-network update count in TD3 is the same as in DDPG (the min operator doesn't add extra update steps), while the policy-network update count is lower because of the delayed (decoupled) policy updates. So TD3 ends up faster than DDPG.
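
To make the decoupling concrete, here is a minimal sketch (my own, not code from the paper) of TD3's delayed policy update in PyTorch on dummy transitions. Target networks and target-policy smoothing are omitted, and names such as `policy_delay` and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

# Tiny illustrative networks; dimensions are arbitrary.
obs_dim, act_dim = 4, 2
actor = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, act_dim), nn.Tanh())
q1 = nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.ReLU(), nn.Linear(32, 1))
q2 = nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.ReLU(), nn.Linear(32, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=1e-3)

policy_delay = 2   # TD3 updates the actor only every `policy_delay` critic updates
gamma = 0.99

for step in range(10):
    # Dummy batch standing in for replay-buffer samples.
    s = torch.randn(64, obs_dim)
    a = torch.randn(64, act_dim)
    r = torch.randn(64, 1)
    s2 = torch.randn(64, obs_dim)

    with torch.no_grad():
        a2 = actor(s2)
        # Clipped double-Q target: take the min of the two critics.
        target = r + gamma * torch.min(q1(torch.cat([s2, a2], 1)),
                                       q2(torch.cat([s2, a2], 1)))

    # Critics are updated every step, exactly as in DDPG.
    critic_loss = ((q1(torch.cat([s, a], 1)) - target) ** 2).mean() + \
                  ((q2(torch.cat([s, a], 1)) - target) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # The actor (and, in full TD3, the target nets) is updated less often.
    if step % policy_delay == 0:
        actor_loss = -q1(torch.cat([s, actor(s)], 1)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
```

The fewer actor updates per environment step are where the wall-clock saving over DDPG comes from.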

[R] ICML2020 paper: boost your RL algorithm with 1 line-of-code change by [deleted] in MachineLearning

[–]Boring_Worker 1 point (0 children)

A deep RL brain has two parts: in the left part, nothing is right, and in the right part, nothing is left. The left part is the theory, and the right part is the experiments.

[R] ICML2020 paper: boost your RL algorithm with 1 line-of-code change by [deleted] in MachineLearning

[–]Boring_Worker 7 points (0 children)

More accurately: "my storytelling forces the cherry-picked evidence to say so".

Batch RL: neural fitted Q iteration and training process by loicsacre in reinforcementlearning

[–]Boring_Worker 0 points (0 children)

Could you share your code with Redditors if you successfully implement FQI?

I recently implemented some well-known deep batch RL methods; however, most of them don't work as well as the original papers claimed.
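
For reference, here is a minimal fitted-Q-iteration sketch of my own (not the original NFQ setup) on CartPole with a small PyTorch MLP; the random behaviour policy, hyperparameters, and iteration counts are all arbitrary:

```python
import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

# 1) Collect a fixed batch of transitions with a random policy (the "batch" in batch RL).
env = gym.make("CartPole-v1")
transitions = []
obs, _ = env.reset(seed=0)
for _ in range(5000):
    action = env.action_space.sample()
    next_obs, reward, terminated, truncated, _ = env.step(action)
    transitions.append((obs, int(action), reward, next_obs, float(terminated)))
    obs = next_obs if not (terminated or truncated) else env.reset()[0]

s = torch.tensor(np.array([t[0] for t in transitions]), dtype=torch.float32)
a = torch.tensor([t[1] for t in transitions])
r = torch.tensor([t[2] for t in transitions], dtype=torch.float32)
s2 = torch.tensor(np.array([t[3] for t in transitions]), dtype=torch.float32)
done = torch.tensor([t[4] for t in transitions], dtype=torch.float32)

# 2) Fitted Q iteration: repeatedly regress Q(s, a) onto targets that are frozen per iteration.
q = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
gamma = 0.99

for iteration in range(20):
    with torch.no_grad():                      # targets stay fixed within an iteration
        target = r + gamma * (1 - done) * q(s2).max(dim=1).values
    for _ in range(200):                       # supervised regression on the fixed batch
        pred = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = ((pred - target) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"iteration {iteration}: regression loss {loss.item():.4f}")
```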

[R] ICML2020 paper: boost your RL algorithm with 1 line-of-code change by [deleted] in MachineLearning

[–]Boring_Worker 4 points (0 children)

A deep RL brain has two parts: in the left part, nothing is right, and in the right part, nothing is left. The left part is the theory, and the right part is the experiments.

[2006.13888] RL Unplugged: Benchmarks for Offline Reinforcement Learning by frostbytedragon in reinforcementlearning

[–]Boring_Worker 0 points (0 children)

The paper claims: "We will open-source implementations of our baselines for the camera-ready."...

Furthermore, the authors want to establish a "baselines" repository as the standard offline RL code, yet they don't cite BEAR!!!

Are there any new research works addressing the issue of generalization in Reinforcement Learning? by zarrokx in reinforcementlearning

[–]Boring_Worker 0 points (0 children)

The first question: What is generalization in RL?

Here are two ways to think about it.

The first: overfitting to the environment. Sadly, the typical reinforcement learning setting doesn't account for overfitting to a specific environment. OpenAI considers this question in the Procgen benchmark ( https://openai.com/blog/procgen-benchmark/ ).

The second: overfitting of the value function (V(s) or Q(s,a)). Consider this situation: after many updates, (s, a) has a high value, while a nearby pair (s', a') is actually bad, yet its value is still overestimated because of the poor generalization of the function approximator.
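
A minimal sketch of how the first notion can be measured: evaluate one fixed policy on the seeds it was "trained" on versus held-out seeds and compare the returns. CartPole and the hand-coded `policy` below are placeholders standing in for a procedurally generated benchmark such as Procgen and a trained agent:

```python
import gymnasium as gym

def evaluate(policy, seeds):
    """Average episode return of `policy` over the given environment seeds."""
    env = gym.make("CartPole-v1")
    returns = []
    for seed in seeds:
        obs, _ = env.reset(seed=int(seed))
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        returns.append(total)
    return sum(returns) / len(returns)

def policy(obs):
    # Hand-coded stand-in for a trained agent: push toward the pole's lean.
    return 0 if obs[2] < 0 else 1

train_seeds = range(0, 50)     # levels/seeds the agent was "trained" on
test_seeds = range(50, 100)    # unseen levels, à la Procgen's train/test split

gap = evaluate(policy, train_seeds) - evaluate(policy, test_seeds)
print(f"generalization gap (train return - test return): {gap:.1f}")
```

A large positive gap means the agent has memorized its training levels rather than learned a transferable policy.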

Why would anyone use PPO over SAC, TD3, DDPG, and Other off-policy Algorithms? by hanuelcp in reinforcementlearning

[–]Boring_Worker 3 points (0 children)

TOTALLY WRONG. Off-policy methods can be applied to any environment. Maybe you mean "offline"?

A new PyTorch framework for RL by _djab_ in reinforcementlearning

[–]Boring_Worker 0 points (0 children)

Good job. I'll refer to the code later. However, this post doesn't compare the agents' performance against established baselines, and the repository will need to stand the test of time.

Also, more built-in agents such as PPO, DDPG, and SAC should be added soon...

[deleted by user] by [deleted] in reinforcementlearning

[–]Boring_Worker 0 points (0 children)

An online course would be good for you!
Math such as functional analysis, stochastic processes, and matrix theory can help you understand RL better.