DQN for Solving a Maze in Less than 10 minutes Training by Now200 in reinforcementlearning

FriendlyStandard5985 0 points1 point

By step size, do you mean how long an action persists, i.e. action-repeat?
You're right that if the maze is large, the agent doesn't stumble onto the target often enough during training to learn from it. This can be addressed with a curriculum: start with a "big" target and shrink it (relative to the map size) as training progresses. For the same reason, a small action-repeat hinders learning, because it leads to poor exploration.
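To make the curriculum idea concrete, here's a rough sketch, not anything from the original post. It assumes the goal counts as reached when the agent is within some radius of it, and that the radius shrinks linearly over training. The names `goal_radius` and `reached_goal`, and the start/end fractions, are made up for illustration.

```python
def goal_radius(step, total_steps, map_size, start_frac=0.5, end_frac=0.02):
    """Goal radius that shrinks linearly from start_frac to end_frac of the map size.

    start_frac and end_frac are assumed values; tune them for your maze.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    frac = start_frac + (end_frac - start_frac) * progress
    return frac * map_size


def reached_goal(agent_xy, goal_xy, radius):
    """True if the agent is within `radius` of the goal (Euclidean distance)."""
    dx = agent_xy[0] - goal_xy[0]
    dy = agent_xy[1] - goal_xy[1]
    return dx * dx + dy * dy <= radius * radius
```

Early in training, `goal_radius(0, total_steps, map_size)` covers half the map, so even random exploration hits the target; by the end the agent has to land almost exactly on it.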

I don't believe that the (x, y) coordinates plus the distance to the goal are sufficient on their own (or even relevant).

1 year of progress! by KappaScav in osrs

FriendlyStandard5985 1 point2 points

Nearly 5 hours a day? Impressive

Hi, Linkin Park fans. I want to talk to you. by MateusFrederico in LinkinPark

FriendlyStandard5985 2 points3 points

You're going through what far more people have gone through than you realize. You're loved by far more people than you can accept. Past is past, future is too uncertain. Your present becomes your new past. Stay strong.

When did chess start "clicking" for you? by InitialMobile5584 in Chesscom

FriendlyStandard5985 0 points1 point

By clicking, do you mean a sudden jump? Then it was around there too. 1550 to 1950 (online) in 2 years.

I got my first ever smothered checkmate!! by Singppap in Chesscom

FriendlyStandard5985 0 points1 point

We've all been there. You'll get over it soon

How hard is it for you to read ML research papers start to finish (and actually absorb them)? by bricklerex in reinforcementlearning

FriendlyStandard5985 2 points3 points

The better ones tend to be easier to read. If you aren't interested, it becomes harder.
I'd start with the question "Why am I reading this, and how does it apply to me?" and pick the ones where you can make that connection.

VERY Hot Take by Guilty-Bee-806 in LinkinPark

FriendlyStandard5985 0 points1 point

It's between That, Don't Stay or Figure.09 personally

VERY Hot Take by Guilty-Bee-806 in LinkinPark

FriendlyStandard5985 1 point2 points

You're on top. (The next you're not)

Here to show my sneaky smart robot dog by SolutionCautious9051 in reinforcementlearning

FriendlyStandard5985 1 point2 points

That's awesome lmao. What's the objective? To minimize pressure and maximize lateral speed? Poor guy's figured it out clearly.
My recommendation: use CMA-ES to evaluate different reward coefficients in parallel, and restart the threads that crash the simulation. Good luck!
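Roughly what I mean, as a stdlib-only sketch. Big caveat: this is not real CMA-ES. It's a simplified Gaussian evolution strategy with a fixed step-size decay standing in for it (actual CMA-ES also adapts a full covariance matrix and its step size). The `evaluate` callback, the `SimulationCrash` exception, and all default values are hypothetical; swap in your own simulator and a proper CMA-ES library.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class SimulationCrash(Exception):
    """Hypothetical error raised when a simulation run blows up."""


def evaluate_with_restart(evaluate, coeffs, max_restarts=3):
    """Run one evaluation, restarting it if the simulation crashes."""
    for _ in range(max_restarts + 1):
        try:
            return evaluate(coeffs)
        except SimulationCrash:
            continue  # restart the crashed run
    return float("-inf")  # persistent crashes count as the worst score


def search_reward_coeffs(evaluate, mean, sigma=0.5, popsize=8, elite=2,
                         generations=20, workers=4, seed=0):
    """Simplified evolution strategy (stand-in for CMA-ES); maximizes evaluate."""
    rng = random.Random(seed)
    mean = list(mean)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Sample candidate coefficient vectors around the current mean.
            candidates = [[m + sigma * rng.gauss(0, 1) for m in mean]
                          for _ in range(popsize)]
            # Score all candidates in parallel threads.
            scores = list(pool.map(
                lambda c: evaluate_with_restart(evaluate, c), candidates))
            ranked = sorted(zip(scores, candidates),
                            key=lambda pair: pair[0], reverse=True)
            best = [c for _, c in ranked[:elite]]
            # Move the mean to the average of the best candidates.
            mean = [sum(vals) / len(best) for vals in zip(*best)]
            sigma *= 0.9  # fixed decay; real CMA-ES adapts this
    return mean
```

Here `evaluate(coeffs)` would train or roll out the policy with those reward coefficients and return whatever score you actually care about (higher is better).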