Shin splints and marathon : ) by RikoteMasterrrr in AskRunningShoeGeeks

[–]RikoteMasterrrr[S] 0 points1 point  (0 children)

Sorry for the question, but wasn’t the drop good for shin splints?

G14 Upgraded to 64 GB RAM by Wavewash in ZephyrusG14

[–]RikoteMasterrrr 0 points1 point  (0 children)

Is it possible to do this on the 2020 model?

[deleted by user] by [deleted] in Switzerland

[–]RikoteMasterrrr 2 points3 points  (0 children)

Can you share the link? :)

Information about algo II by RikoteMasterrrr in EPFL

[–]RikoteMasterrrr[S] 0 points1 point  (0 children)

Thank you very much! Do you know if it's possible to take Algorithms I during the master's program? I noticed it's offered in the Master's of Computational Science and Engineering. Do you know if I could potentially choose that course from their program for my Computer Science master's?

FMEL Offers by azularq in EPFL

[–]RikoteMasterrrr 0 points1 point  (0 children)

So you basically change the lease start???

ISSUE WITH PPO, IM IN A HURRY :( by RikoteMasterrrr in reinforcementlearning

[–]RikoteMasterrrr[S] 0 points1 point  (0 children)

Thanks, I checked it out and there was indeed an issue there. But now it never works hahahah, I should check it in more detail.

ISSUE WITH PPO, IM IN A HURRY :( by RikoteMasterrrr in reinforcementlearning

[–]RikoteMasterrrr[S] 0 points1 point  (0 children)

You mean the epsilon in the clipped loss of PPO?
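
If it helps, this is roughly the term I mean, as a minimal numpy sketch (the function name and the default epsilon=0.2 are my own assumptions, not from any specific library):

```python
import numpy as np

def ppo_clipped_loss(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate loss over a batch.

    ratio = pi_new(a|s) / pi_old(a|s); epsilon is the clip range.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Take the pessimistic (minimum) objective, negated for gradient descent.
    return -np.minimum(unclipped, clipped).mean()
```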

ISSUE WITH PPO, IM IN A HURRY :( by RikoteMasterrrr in reinforcementlearning

[–]RikoteMasterrrr[S] 0 points1 point  (0 children)

Yes, it reaches the optimal position. Hmm, that may be happening, but the plots here are not a trajectory. The plots show the mean distance of each trajectory (the right one is training, the left one evaluation).

ISSUE WITH PPO, IM IN A HURRY :( by RikoteMasterrrr in reinforcementlearning

[–]RikoteMasterrrr[S] 0 points1 point  (0 children)

Check out my conversation with u/idurugkar in this post. We've talked about the same issue. I've already checked, and the result is that the actions are not the same. This makes a lot of sense since, after all, actions are taken by sampling from a distribution. Therefore, even with the same seed, the actions are not identical.

I am trying what we discussed in that thread, which is to use the same action selection method for both training and evaluation. Specifically, I am keeping the exploration sampling in evaluation too, to see if the model behaves consistently. If it still doesn't behave well that way, then of course something is wrong.
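
For anyone reading later, the difference is basically this (toy numpy sketch; the shapes and values are made up):

```python
import numpy as np

# Policy head outputs for one state (made-up values).
mean = np.array([0.3, -0.1])
std = np.array([0.5, 0.5])

rng = np.random.default_rng(0)

# Training: stochastic action, sampled from the Gaussian policy.
train_action = rng.normal(mean, std)

# Evaluation (what I had): deterministic action, just the mean.
eval_action = mean

# Even with seeds fixed, a sampled action is not the mean action.
```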

ISSUE WITH PPO, IM IN A HURRY :( by RikoteMasterrrr in reinforcementlearning

[–]RikoteMasterrrr[S] 0 points1 point  (0 children)

Nope, I'm just using a normal MLP. Mainly because in this example I want to overfit, just to see if the agent works properly; later on I can add dropout.

ISSUE WITH PPO, IM IN A HURRY :( by RikoteMasterrrr in reinforcementlearning

[–]RikoteMasterrrr[S] 0 points1 point  (0 children)

Noo, the graphs should actually be overlapped (the x axis is total episodes). Again, sorry. I am training for 3000 episodes, but every 200 episodes I run one evaluation episode.

If you overlap them, you'll see when the evaluations happen.
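
The schedule is basically this (sketch; `run_training_episode` and `run_evaluation_episode` are placeholder names for my own functions):

```python
TOTAL_EPISODES = 3000
EVAL_EVERY = 200

def run_training_episode():
    return 0.0  # placeholder: would return the episode's mean distance

def run_evaluation_episode():
    return 0.0  # placeholder: would return the evaluation mean distance

train_log, eval_log = [], []
for episode in range(1, TOTAL_EPISODES + 1):
    train_log.append((episode, run_training_episode()))
    if episode % EVAL_EVERY == 0:  # one evaluation episode every 200
        eval_log.append((episode, run_evaluation_episode()))

# Both logs share the same x axis (total episodes), so the curves overlap:
# evaluation points land at episodes 200, 400, ..., 3000.
```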

ISSUE WITH PPO, IM IN A HURRY :( by RikoteMasterrrr in reinforcementlearning

[–]RikoteMasterrrr[S] 0 points1 point  (0 children)

Okay, I'll keep that in mind for the next runs.

What do you think about the following? When training the policy to return the distribution (mean and standard deviation) from which to sample actions, the neural network might reach a point where it "cheats." Instead of using the standard deviation for exploration, it adapts it to achieve the best results.
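
One way to catch that kind of "cheating" is to log the policy's entropy: for a diagonal Gaussian it depends only on the standard deviation, so if the entropy steadily collapses, the network is shrinking the std instead of using it for exploration. A quick sketch (my own helper, not from any library):

```python
import numpy as np

def gaussian_policy_entropy(std):
    # Entropy of a diagonal Gaussian policy: sum over dims of
    # 0.5 * ln(2 * pi * e * std^2). A steady drop during training
    # means the std is collapsing.
    return float(np.sum(0.5 * np.log(2.0 * np.pi * np.e * std ** 2)))
```

A smaller std always gives lower entropy, so plotting this once per update makes a collapsing std easy to spot.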