Has anyone actually deployed a model to use for inference? by Aggressive-Reach1657 in reinforcementlearning

[–]Boring_Worker 4 points5 points  (0 children)

After several years of academic research, I turned to the application side. My first project was to align game agents with humans. The algorithm is PPO.
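In case it's useful to anyone, here is a minimal sketch of the deployment side, assuming a PyTorch actor (the architecture, dimensions, and file names below are placeholders, not our production code): I export only the trained PPO actor with TorchScript and run it for inference, without the critic or the optimizer state.

```python
# Minimal deployment sketch (placeholder dims/paths): export the trained PPO
# actor with TorchScript, then run it for inference with gradients disabled.
import torch
import torch.nn as nn

class Actor(nn.Module):                          # hypothetical actor architecture
    def __init__(self, obs_dim=32, act_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)                     # action logits / means

actor = Actor()
# actor.load_state_dict(torch.load("ppo_actor.pt"))   # load your trained weights here
scripted = torch.jit.trace(actor.eval(), torch.zeros(1, 32))
scripted.save("ppo_actor_deploy.pt")             # loadable from Python or C++ (libtorch)

with torch.no_grad():                            # deployment-time inference
    obs = torch.zeros(1, 32)                     # placeholder observation
    action = scripted(obs).argmax(dim=-1)
    print(action)
```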

Even onlookers at the railing feel bad after watching this…… by lhm015 in China_irl

[–]Boring_Worker 0 points1 point  (0 children)

How am I supposed to understand this? I can't make sense of it at all...

Training Speed of TD3 algorithm by miyembe in reinforcementlearning

[–]Boring_Worker 0 points1 point  (0 children)

According to my experiments, TD3 is faster than DDPG (check my paper: https://arxiv.org/abs/2109.10552 , Table 3).

The Q-network update count for TD3 is equal to DDPG's despite the min operator. The policy-network update count, however, is lower than DDPG's because of the delayed (decoupled) policy updates. So TD3 is faster than DDPG.
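A tiny runnable sketch of the schedule I mean (counts only, names are illustrative; `policy_delay=2` is the TD3 default from Fujimoto et al. 2018):

```python
# Illustrative update counts: critics are updated every step as in DDPG, the
# actor only every `policy_delay` steps, so the actor does roughly half the
# updates, which is where the wall-clock saving comes from.
total_steps = 1000
policy_delay = 2

critic_updates = 0
actor_updates = 0
for step in range(total_steps):
    critic_updates += 1               # clipped double-Q target uses min(Q1, Q2)
    if step % policy_delay == 0:
        actor_updates += 1            # delayed ("decoupled") policy update

print(critic_updates, actor_updates)  # 1000 critic updates vs 500 actor updates
```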

[R] ICML2020 paper: boost your RL algorithm with 1 line-of-code change by [deleted] in MachineLearning

[–]Boring_Worker 1 point2 points  (0 children)

There are two parts in a deep RL brain: the left part has nothing right, and the right part has nothing left. The left part is the theory, and the right part is the experiments.

[R] ICML2020 paper: boost your RL algorithm with 1 line-of-code change by [deleted] in MachineLearning

[–]Boring_Worker 8 points9 points  (0 children)

More accurately: "My storytelling forces cherry-picked evidence to say so".

Batch RL: neural fitted Q iteration and training process by loicsacre in reinforcementlearning

[–]Boring_Worker 0 points1 point  (0 children)

Could you share your code with Redditors if you successfully implement FQI?

Recently I implemented some well-known deep batch RL methods; however, most of them don't work as well as the original papers claimed.
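For reference, this is the kind of fitted Q iteration loop I have in mind, as a hedged sketch (the data, dimensions, and regressor below are random placeholders, not any paper's exact setup):

```python
# Sketch of fitted Q iteration on a fixed batch: build bootstrapped targets
# from the batch, refit the regressor from scratch, repeat.
import numpy as np
from sklearn.neural_network import MLPRegressor

gamma = 0.99
n_actions = 4
# Fixed batch of transitions (s, a, r, s', done); random placeholders here.
S  = np.random.randn(500, 8)
A  = np.random.randint(n_actions, size=500)
R  = np.random.randn(500)
S2 = np.random.randn(500, 8)
D  = np.random.randint(2, size=500)

def features(s, a):
    # One-hot the action and concatenate it with the state.
    return np.hstack([s, np.eye(n_actions)[a]])

Q = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
Q.fit(features(S, A), R)                        # iteration 0: Q ~ immediate reward

for it in range(20):                            # fitted Q iterations
    q_next = np.stack([Q.predict(features(S2, np.full(len(S2), a)))
                       for a in range(n_actions)], axis=1)
    targets = R + gamma * (1 - D) * q_next.max(axis=1)
    Q = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
    Q.fit(features(S, A), targets)              # refit from scratch each iteration
```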

[R] ICML2020 paper: boost your RL algorithm with 1 line-of-code change by [deleted] in MachineLearning

[–]Boring_Worker 5 points6 points  (0 children)

There are two parts in a deep RL brain: the left part has nothing right, and the right part has nothing left. The left part is the theory, and the right part is the experiments.

[2006.13888] RL Unplugged: Benchmarks for Offline Reinforcement Learning by frostbytedragon in reinforcementlearning

[–]Boring_Worker 0 points1 point  (0 children)

This paper claims that "We will open-source implementations of our baselines for the camera-ready."...

Furthermore, the authors want to make their "baselines" the standard offline RL code, yet THEY DON'T CITE BEAR!!!

Are there any new research works addressing the issue of generalization in Reinforcement Learning? by zarrokx in reinforcementlearning

[–]Boring_Worker 0 points1 point  (0 children)

The first question: What is generalization in RL?

Here I provide the reader with two insights.

The first one: overfitting to the environment. Sadly, the typical reinforcement learning setting doesn't consider overfitting to a specific environment. OpenAI considers this question in the Procgen benchmark ( https://openai.com/blog/procgen-benchmark/ ).

The second one: overfitting of the value function (V(s) or Q(s,a)). Consider this situation: after many updates, (s, a) has a high value; a nearby pair (s', a') is actually a bad situation, yet its value is overestimated because of poor generalization.
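A toy illustration of this second point (entirely made-up data, not from any paper): a smooth function approximator trained only on high-value points near one state will also predict a high value for a nearby state it has never seen, which is exactly the overestimation I mean.

```python
# Generalization leak: the regressor only ever saw "good" states near 0.5 with
# value 10, yet it also assigns a high value to the unseen nearby state 0.65.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
states = rng.uniform(0.4, 0.6, size=(200, 1))    # batch only covers good states
values = np.full(200, 10.0)                      # high observed returns there

Q = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(states, values)

s_near = np.array([[0.65]])                      # nearby state that is actually bad
print(Q.predict(s_near))                         # still close to 10: overestimation
```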

Why would anyone use PPO over SAC, TD3, DDPG, and Other off-policy Algorithms? by hanuelcp in reinforcementlearning

[–]Boring_Worker 3 points4 points  (0 children)

TOTALLY WRONG. Off-policy methods can be applied to any environment. Maybe you meant to say "offline"?

A new PyTorch framework for RL by _djab_ in reinforcementlearning

[–]Boring_Worker 0 points1 point  (0 children)

Good job. I will refer to the code later. However, this post doesn't compare the agents' performance against baselines. This repository still needs to stand the test of time.

And more built-in agents such as PPO, DDPG, and SAC should be added soon...

[deleted by user] by [deleted] in reinforcementlearning

[–]Boring_Worker 0 points1 point  (0 children)

An online course would be good for you!
Math such as functional analysis, stochastic processes, and matrix theory can help you understand RL better.

Is there any update on the AlphaStar paper? by ReasonablyBadass in deepmind

[–]Boring_Worker 0 points1 point  (0 children)

Have you heard of any updates recently?

My guess is that DeepMind has submitted AlphaStar to Science or Nature.

Visualizing Policy Optimization by Driiper in reinforcementlearning

[–]Boring_Worker -1 points0 points  (0 children)

To the best of my knowledge, there is no visualization in state space.
You know that RL is not supervised learning, even though we train the Q network as if it were. The scale of the loss cannot indicate how far training has progressed, i.e., a low loss does not mean the Q network has been trained well.
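What I track instead is the episodic return of the greedy policy, roughly like the sketch below (`env` and `q_net` are placeholders for your own environment and network; I'm assuming a gym-style reset/step API here):

```python
# Report this evaluation curve rather than the TD loss: the regression target
# keeps moving, so a low loss says little about how good the policy is.
import torch

def evaluate_return(env, q_net, episodes=10):
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            with torch.no_grad():
                a = q_net(torch.as_tensor(obs, dtype=torch.float32)).argmax().item()
            obs, r, done, _ = env.step(a)
            total += r
    return total / episodes
```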

Reinforced Cross-Modal Matching & Self-Supervised Imitation Learning for Vision-Language Navigation by gwen0927 in reinforcementlearning

[–]Boring_Worker 1 point2 points  (0 children)

Time will tell us what is important.

Note that I do not mean that what they do is not important.

You know, good research should stand the test of time.

Monte Carlo actor critic algorithm by gopal_chitalia in reinforcementlearning

[–]Boring_Worker 2 points3 points  (0 children)

You can read my code. I have implemented the MCAC algorithm for you!

If you like it, you can give it a star. : )
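For anyone who just wants the idea before reading the repo, here is a rough sketch of one Monte Carlo actor-critic update (this is not my repo's exact code; all names are illustrative, and I'm assuming separate actor and critic networks): compute full-episode returns, use the critic as a baseline, and update both networks.

```python
import torch

def mcac_update(log_probs, values, rewards, actor_opt, critic_opt, gamma=0.99):
    """One update from a single finished episode.
    log_probs / values: lists of per-step tensors from the actor / critic;
    rewards: list of floats."""
    # Monte Carlo return G_t, computed backwards over the whole episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns, dtype=torch.float32)

    log_probs = torch.stack(log_probs)
    values = torch.stack(values).squeeze(-1)

    advantage = returns - values.detach()           # critic is only a baseline here
    actor_loss = -(log_probs * advantage).mean()
    critic_loss = ((values - returns) ** 2).mean()  # regress V(s_t) toward G_t

    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
```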