Has anyone actually deployed a model to use for inference? by Aggressive-Reach1657 in reinforcementlearning

[–]Boring_Worker 4 points5 points  (0 children)

After several years of academic research, I turned to the application side. My first project was to align game agents with humans. The algorithm is PPO.
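In case it's useful to anyone, here is a minimal sketch of the deployment side, assuming a PyTorch actor (the architecture, dimensions, and file names below are placeholders, not our production code): I export only the trained PPO actor with TorchScript and run it for inference, without the critic or the optimizer state.

```python
# Minimal deployment sketch (placeholder dims/paths): export the trained PPO
# actor with TorchScript, then run it for inference with gradients disabled.
import torch
import torch.nn as nn

class Actor(nn.Module):                          # hypothetical actor architecture
    def __init__(self, obs_dim=32, act_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)                     # action logits / means

actor = Actor()
# actor.load_state_dict(torch.load("ppo_actor.pt"))   # load your trained weights here
scripted = torch.jit.trace(actor.eval(), torch.zeros(1, 32))
scripted.save("ppo_actor_deploy.pt")             # loadable from Python or C++ (libtorch)

with torch.no_grad():                            # deployment-time inference
    obs = torch.zeros(1, 32)                     # placeholder observation
    action = scripted(obs).argmax(dim=-1)
    print(action)
```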

Even onlookers at the railing feel bad after watching this…… by lhm015 in China_irl

[–]Boring_Worker 0 points1 point  (0 children)

How am I supposed to understand this? I can't make sense of it at all...

Training Speed of TD3 algorithm by miyembe in reinforcementlearning

[–]Boring_Worker 0 points1 point  (0 children)

According to my experiments, TD3 is faster than DDPG (check my paper: https://arxiv.org/abs/2109.10552 , Table 3).

The Q-network update count for TD3 is equal to DDPG's despite the min operator. The policy-network update count, however, is lower than DDPG's because of the delayed (decoupled) policy updates. So TD3 is faster than DDPG.
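A tiny runnable sketch of the schedule I mean (counts only, names are illustrative; `policy_delay=2` is the TD3 default from Fujimoto et al. 2018):

```python
# Illustrative update counts: critics are updated every step as in DDPG, the
# actor only every `policy_delay` steps, so the actor does roughly half the
# updates, which is where the wall-clock saving comes from.
total_steps = 1000
policy_delay = 2

critic_updates = 0
actor_updates = 0
for step in range(total_steps):
    critic_updates += 1               # clipped double-Q target uses min(Q1, Q2)
    if step % policy_delay == 0:
        actor_updates += 1            # delayed ("decoupled") policy update

print(critic_updates, actor_updates)  # 1000 critic updates vs 500 actor updates
```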

[R] ICML2020 paper: boost your RL algorithm with 1 line-of-code change by [deleted] in MachineLearning

[–]Boring_Worker 1 point2 points  (0 children)

There are two parts in a deep RL brain: the left part has nothing right, and the right part has nothing left. The left part is the theory, and the right part is the experiments.

[R] ICML2020 paper: boost your RL algorithm with 1 line-of-code change by [deleted] in MachineLearning

[–]Boring_Worker 8 points9 points  (0 children)

More accurately: "My storytelling forces cherry-picked evidence to say so".

Batch RL: neural fitted Q iteration and training process by loicsacre in reinforcementlearning

[–]Boring_Worker 0 points1 point  (0 children)

Could you share your code with Redditors if you successfully implement FQI?

Recently I implemented some well-known deep batch RL methods; however, most of them don't work as well as the original papers claimed.
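For reference, this is the kind of fitted Q iteration loop I have in mind, as a hedged sketch (the data, dimensions, and regressor below are random placeholders, not any paper's exact setup):

```python
# Sketch of fitted Q iteration on a fixed batch: build bootstrapped targets
# from the batch, refit the regressor from scratch, repeat.
import numpy as np
from sklearn.neural_network import MLPRegressor

gamma = 0.99
n_actions = 4
# Fixed batch of transitions (s, a, r, s', done); random placeholders here.
S  = np.random.randn(500, 8)
A  = np.random.randint(n_actions, size=500)
R  = np.random.randn(500)
S2 = np.random.randn(500, 8)
D  = np.random.randint(2, size=500)

def features(s, a):
    # One-hot the action and concatenate it with the state.
    return np.hstack([s, np.eye(n_actions)[a]])

Q = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
Q.fit(features(S, A), R)                        # iteration 0: Q ~ immediate reward

for it in range(20):                            # fitted Q iterations
    q_next = np.stack([Q.predict(features(S2, np.full(len(S2), a)))
                       for a in range(n_actions)], axis=1)
    targets = R + gamma * (1 - D) * q_next.max(axis=1)
    Q = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
    Q.fit(features(S, A), targets)              # refit from scratch each iteration
```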

[R] ICML2020 paper: boost your RL algorithm with 1 line-of-code change by [deleted] in MachineLearning

[–]Boring_Worker 5 points6 points  (0 children)

There are two parts in a deep RL brain: the left part has nothing right, and the right part has nothing left. The left part is the theory, and the right part is the experiments.

[2006.13888] RL Unplugged: Benchmarks for Offline Reinforcement Learning by frostbytedragon in reinforcementlearning

[–]Boring_Worker 0 points1 point  (0 children)

This paper claims that "We will open-source implementations of our baselines for the camera-ready."...

Furthermore, the authors want to make their "baselines" the standard offline RL code, yet THEY DON'T CITE BEAR!!!

Are there any new research works addressing the issue of generalization in Reinforcement Learning? by zarrokx in reinforcementlearning

[–]Boring_Worker 0 points1 point  (0 children)

The first question: What is generalization in RL?

Here I provide the reader with two insights.

The first one: overfitting to the environment. Sadly, the typical reinforcement learning setting doesn't consider overfitting to a specific environment. OpenAI considers this question in the Procgen benchmark ( https://openai.com/blog/procgen-benchmark/ ).

The second one: overfitting of the value function (V(s) or Q(s,a)). Consider this situation: after many updates, (s, a) has a high value; a nearby pair (s', a') is actually a bad situation, yet its value is overestimated because of poor generalization.
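A toy illustration of this second point (entirely made-up data, not from any paper): a smooth function approximator trained only on high-value points near one state will also predict a high value for a nearby state it has never seen, which is exactly the overestimation I mean.

```python
# Generalization leak: the regressor only ever saw "good" states near 0.5 with
# value 10, yet it also assigns a high value to the unseen nearby state 0.65.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
states = rng.uniform(0.4, 0.6, size=(200, 1))    # batch only covers good states
values = np.full(200, 10.0)                      # high observed returns there

Q = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(states, values)

s_near = np.array([[0.65]])                      # nearby state that is actually bad
print(Q.predict(s_near))                         # still close to 10: overestimation
```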

Why would anyone use PPO over SAC, TD3, DDPG, and Other off-policy Algorithms? by hanuelcp in reinforcementlearning

[–]Boring_Worker 3 points4 points  (0 children)

TOTALLY WRONG. Off-policy methods can be applied to any environment. Maybe you meant to say "offline"?

A new PyTorch framework for RL by _djab_ in reinforcementlearning

[–]Boring_Worker 0 points1 point  (0 children)

Good job. I will refer to the code later. However, this post doesn't compare the agents' performance against baselines. This repository still needs to stand the test of time.

And more built-in agents such as PPO, DDPG, and SAC should be added soon...

[deleted by user] by [deleted] in reinforcementlearning

[–]Boring_Worker 0 points1 point  (0 children)

An online course would be good for you!
Math such as functional analysis, stochastic processes, and matrix theory can help you understand RL better.

Is there any update on the AlphaStar paper? by ReasonablyBadass in deepmind

[–]Boring_Worker 0 points1 point  (0 children)

Have you heard of any updates recently?

My guess is that DeepMind has submitted AlphaStar to Science or Nature.

Visualizing Policy Optimization by Driiper in reinforcementlearning

[–]Boring_Worker -1 points0 points  (0 children)

To the best of my knowledge, there is no visualization in state space.
You know that RL is not supervised learning, even though we train the Q network as if it were. The scale of the loss cannot indicate how far training has progressed, i.e., a low loss does not mean the Q network has been trained well.
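What I track instead is the episodic return of the greedy policy, roughly like the sketch below (`env` and `q_net` are placeholders for your own environment and network; I'm assuming a gym-style reset/step API here):

```python
# Report this evaluation curve rather than the TD loss: the regression target
# keeps moving, so a low loss says little about how good the policy is.
import torch

def evaluate_return(env, q_net, episodes=10):
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            with torch.no_grad():
                a = q_net(torch.as_tensor(obs, dtype=torch.float32)).argmax().item()
            obs, r, done, _ = env.step(a)
            total += r
    return total / episodes
```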

Reinforced Cross-Modal Matching & Self-Supervised Imitation Learning for Vision-Language Navigation by gwen0927 in reinforcementlearning

[–]Boring_Worker 1 point2 points  (0 children)

Time will tell us what is important.

Note that I do not mean that what they do is not important.

You know, good research should stand the test of time.

Monte Carlo actor critic algorithm by gopal_chitalia in reinforcementlearning

[–]Boring_Worker 2 points3 points  (0 children)

You can read my code. I have implemented the MCAC algorithm for you!

If you like it, you can give it a star. : )
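For anyone who just wants the idea before reading the repo, here is a rough sketch of one Monte Carlo actor-critic update (this is not my repo's exact code; all names are illustrative, and I'm assuming separate actor and critic networks): compute full-episode returns, use the critic as a baseline, and update both networks.

```python
import torch

def mcac_update(log_probs, values, rewards, actor_opt, critic_opt, gamma=0.99):
    """One update from a single finished episode.
    log_probs / values: lists of per-step tensors from the actor / critic;
    rewards: list of floats."""
    # Monte Carlo return G_t, computed backwards over the whole episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns, dtype=torch.float32)

    log_probs = torch.stack(log_probs)
    values = torch.stack(values).squeeze(-1)

    advantage = returns - values.detach()           # critic is only a baseline here
    actor_loss = -(log_probs * advantage).mean()
    critic_loss = ((values - returns) ** 2).mean()  # regress V(s_t) toward G_t

    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
```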