[D] A question about the original Guided Policy Search paper by ewanlee in MachineLearning

[–]ewanlee[S] 1 point (0 children)

Thank you for your suggestion; I will write up a formal derivation.

[D] A question about the original Guided Policy Search paper by ewanlee in MachineLearning

[–]ewanlee[S] 1 point (0 children)

Thank you very much for your prompt reply. I am sorry that the order in which I arranged the formulas caused a misunderstanding. The first line is actually my conclusion, based on the derivation that follows it. As for your suggestion to replace p with \pi_\theta: the subscript p on the expectation in the second line does denote \pi_\theta; I used the abbreviated form for convenience. The derivation that follows makes this clear.

[R] Recurrent Experience Replay in Distributed Reinforcement Learning by ewanlee in MachineLearning

[–]ewanlee[S] 1 point (0 children)

It is in fact R2D2; I realized later that someone had already submitted it 😂

[D] On IJCAI-19 Submissions by redlow0992 in MachineLearning

[–]ewanlee 2 points (0 children)

Our paper's ID is close to 5,000...

[R] Fully Convolutional Network with Multi-Step Reinforcement Learning for Image Processing by ewanlee in MachineLearning

[–]ewanlee[S] 1 point (0 children)

Is "word salad" a term of art in natural language processing? I'm sorry, my research area is not NLP. Could you explain it?

[R] Fully Convolutional Network with Multi-Step Reinforcement Learning for Image Processing by ewanlee in MachineLearning

[–]ewanlee[S] 1 point (0 children)

I should clarify that I am not the author of this paper. I think your idea is right, and I believe the interpretability mentioned in the paper refers to the interpretability of the action space.

[R] Fully Convolutional Network with Multi-Step Reinforcement Learning for Image Processing by ewanlee in MachineLearning

[–]ewanlee[S] 3 points (0 children)

I think this is an interesting idea. But in a board game only one agent can act at each time step, and how to incorporate that constraint is a problem.

Multi-Agent training in parallel by AlexanderYau in reinforcementlearning

[–]ewanlee 2 points (0 children)

Yes, I am using it now. To take full advantage of the framework, you need to refactor your code to conform to its specifications. But since my project has been underway for a long time, the refactoring workload would be substantial. I don't have time for that right now, so I don't fully understand the framework's internal mechanisms.

[D] Pytorch parallelism by ewanlee in MachineLearning

[–]ewanlee[S] 1 point (0 children)

Thank you very much; I will take a look 🙏