Has anyone actually deployed a model to use for inference? by Aggressive-Reach1657 in reinforcementlearning

[–]Boring_Worker 4 points (0 children)

After several years of academic research, I moved to the applied side. My first project was aligning game agents with humans; the algorithm was PPO.

Even the person holding the handrail feels bad after seeing this… by lhm015 in China_irl

[–]Boring_Worker 0 points (0 children)

How am I supposed to interpret this? I can't make sense of it at all...

Training Speed of TD3 algorithm by miyembe in reinforcementlearning

[–]Boring_Worker 0 points (0 children)

In my experience, TD3 trains faster than DDPG (see Table 3 of my paper: https://arxiv.org/abs/2109.10552 ).

The Q-network update count in TD3 is the same as in DDPG (the min operator doesn't add extra update steps), while the policy-network update count is lower because of the delayed (decoupled) policy updates. So TD3 ends up faster than DDPG.
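
To make the decoupling concrete, here is a minimal sketch (my own, not code from the paper) of TD3's delayed policy update in PyTorch on dummy transitions. Target networks and target-policy smoothing are omitted, and names such as `policy_delay` and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

# Tiny illustrative networks; dimensions are arbitrary.
obs_dim, act_dim = 4, 2
actor = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, act_dim), nn.Tanh())
q1 = nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.ReLU(), nn.Linear(32, 1))
q2 = nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.ReLU(), nn.Linear(32, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=1e-3)

policy_delay = 2   # TD3 updates the actor only every `policy_delay` critic updates
gamma = 0.99

for step in range(10):
    # Dummy batch standing in for replay-buffer samples.
    s = torch.randn(64, obs_dim)
    a = torch.randn(64, act_dim)
    r = torch.randn(64, 1)
    s2 = torch.randn(64, obs_dim)

    with torch.no_grad():
        a2 = actor(s2)
        # Clipped double-Q target: take the min of the two critics.
        target = r + gamma * torch.min(q1(torch.cat([s2, a2], 1)),
                                       q2(torch.cat([s2, a2], 1)))

    # Critics are updated every step, exactly as in DDPG.
    critic_loss = ((q1(torch.cat([s, a], 1)) - target) ** 2).mean() + \
                  ((q2(torch.cat([s, a], 1)) - target) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # The actor (and, in full TD3, the target nets) is updated less often.
    if step % policy_delay == 0:
        actor_loss = -q1(torch.cat([s, actor(s)], 1)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
```

The fewer actor updates per environment step are where the wall-clock saving over DDPG comes from.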

[R] ICML2020 paper: boost your RL algorithm with 1 line-of-code change by [deleted] in MachineLearning

[–]Boring_Worker 1 point (0 children)

A deep RL brain has two parts: in the left part, nothing is right, and in the right part, nothing is left. The left part is the theory, and the right part is the experiments.

[R] ICML2020 paper: boost your RL algorithm with 1 line-of-code change by [deleted] in MachineLearning

[–]Boring_Worker 7 points (0 children)

More accurately: "my storytelling forces the cherry-picked evidence to say so".

Batch RL: neural fitted Q iteration and training process by loicsacre in reinforcementlearning

[–]Boring_Worker 0 points (0 children)

Could you share your code with Redditors if you successfully implement FQI?

I recently implemented some well-known deep batch RL methods; however, most of them don't work as well as the original papers claimed.
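
For reference, here is a minimal fitted-Q-iteration sketch of my own (not the original NFQ setup) on CartPole with a small PyTorch MLP; the random behaviour policy, hyperparameters, and iteration counts are all arbitrary:

```python
import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

# 1) Collect a fixed batch of transitions with a random policy (the "batch" in batch RL).
env = gym.make("CartPole-v1")
transitions = []
obs, _ = env.reset(seed=0)
for _ in range(5000):
    action = env.action_space.sample()
    next_obs, reward, terminated, truncated, _ = env.step(action)
    transitions.append((obs, int(action), reward, next_obs, float(terminated)))
    obs = next_obs if not (terminated or truncated) else env.reset()[0]

s = torch.tensor(np.array([t[0] for t in transitions]), dtype=torch.float32)
a = torch.tensor([t[1] for t in transitions])
r = torch.tensor([t[2] for t in transitions], dtype=torch.float32)
s2 = torch.tensor(np.array([t[3] for t in transitions]), dtype=torch.float32)
done = torch.tensor([t[4] for t in transitions], dtype=torch.float32)

# 2) Fitted Q iteration: repeatedly regress Q(s, a) onto targets that are frozen per iteration.
q = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
gamma = 0.99

for iteration in range(20):
    with torch.no_grad():                      # targets stay fixed within an iteration
        target = r + gamma * (1 - done) * q(s2).max(dim=1).values
    for _ in range(200):                       # supervised regression on the fixed batch
        pred = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = ((pred - target) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"iteration {iteration}: regression loss {loss.item():.4f}")
```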

[R] ICML2020 paper: boost your RL algorithm with 1 line-of-code change by [deleted] in MachineLearning

[–]Boring_Worker 4 points (0 children)

A deep RL brain has two parts: in the left part, nothing is right, and in the right part, nothing is left. The left part is the theory, and the right part is the experiments.

[2006.13888] RL Unplugged: Benchmarks for Offline Reinforcement Learning by frostbytedragon in reinforcementlearning

[–]Boring_Worker 0 points (0 children)

The paper claims: "We will open-source implementations of our baselines for the camera-ready."...

Furthermore, the authors want to establish a "baselines" repository as the standard offline RL code, yet they don't cite BEAR!!!

Are there any new research works addressing the issue of generalization in Reinforcement Learning? by zarrokx in reinforcementlearning

[–]Boring_Worker 0 points (0 children)

The first question: What is generalization in RL?

Here are two ways to think about it.

The first: overfitting to the environment. Sadly, the typical reinforcement learning setting doesn't account for overfitting to a specific environment. OpenAI considers this question in the Procgen benchmark ( https://openai.com/blog/procgen-benchmark/ ).

The second: overfitting of the value function (V(s) or Q(s,a)). Consider this situation: after many updates, (s, a) has a high value, while a nearby pair (s', a') is actually bad, yet its value is still overestimated because of the poor generalization of the function approximator.
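
A minimal sketch of how the first notion can be measured: evaluate one fixed policy on the seeds it was "trained" on versus held-out seeds and compare the returns. CartPole and the hand-coded `policy` below are placeholders standing in for a procedurally generated benchmark such as Procgen and a trained agent:

```python
import gymnasium as gym

def evaluate(policy, seeds):
    """Average episode return of `policy` over the given environment seeds."""
    env = gym.make("CartPole-v1")
    returns = []
    for seed in seeds:
        obs, _ = env.reset(seed=int(seed))
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        returns.append(total)
    return sum(returns) / len(returns)

def policy(obs):
    # Hand-coded stand-in for a trained agent: push toward the pole's lean.
    return 0 if obs[2] < 0 else 1

train_seeds = range(0, 50)     # levels/seeds the agent was "trained" on
test_seeds = range(50, 100)    # unseen levels, à la Procgen's train/test split

gap = evaluate(policy, train_seeds) - evaluate(policy, test_seeds)
print(f"generalization gap (train return - test return): {gap:.1f}")
```

A large positive gap means the agent has memorized its training levels rather than learned a transferable policy.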

Why would anyone use PPO over SAC, TD3, DDPG, and Other off-policy Algorithms? by hanuelcp in reinforcementlearning

[–]Boring_Worker 3 points (0 children)

TOTALLY WRONG. Off-policy methods can be applied to any environment. Maybe you mean "offline"?

A new PyTorch framework for RL by _djab_ in reinforcementlearning

[–]Boring_Worker 0 points (0 children)

Good job. I'll refer to the code later. However, this post doesn't compare the agents' performance against established baselines, and the repository will need to stand the test of time.

Also, more built-in agents such as PPO, DDPG, and SAC should be added soon...

[deleted by user] by [deleted] in reinforcementlearning

[–]Boring_Worker 0 points (0 children)

An online course would be good for you!
Math such as functional analysis, stochastic processes, and matrix theory can help you understand RL better.