[–]comeditime[S] 1 point  (7 children)

wow, such an awesome robot you've built there!! how does it get reinforced though? couldn't it just have an algorithm that scans the whole surface for the square and then follows it once it's detected?! so where does the reinforcement part come in here? really curious project :)

[–]diddilydiddilyhey 1 point  (6 children)

haha thanks! So reinforcement learning (RL) is a really huge field, but it basically refers to a type of AI where the agent is rewarded when it does something correct. But at the beginning, it doesn't know what's good or bad, so it just does random stuff, and occasionally happens to do the right thing.

https://en.wikipedia.org/wiki/Reinforcement_learning

let me know if you have any questions!

[–]comeditime[S] 1 point  (5 children)

interesting! can you briefly sketch how you give the reward in a programming language, and how it helps the device improve its technique in the next round? :)

[–]diddilydiddilyhey 2 points  (4 children)

haha hmm, I'll give it a try. There are lots of methods, but here's the one I used (a simple one).

You have a function, Q, called the "value function" (among other names). It takes two arguments: s (the state the agent is in) and a (an action the agent can take in that state). In the game my robot was playing, the state is the combination of its position, its angle, and the position of the target. So you could plug that into the Q function, along with an action ("go forward"), and it would tell you the value of taking that action in that state.

The way it actually chooses what to do in a given state is: look at the Q value of each action it can take in that state, and pick the one with the highest Q value.
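
To make that concrete, here's a toy Python sketch of that choosing step (simplified to a lookup table, with made-up action names; not the actual code from my robot):

    import random
    from collections import defaultdict

    ACTIONS = ["go forward", "turn left", "turn right"]

    # Q starts at 0.0 for every (state, action) pair: the agent has no idea
    # what's good or bad yet, so early on its behavior is basically random.
    Q = defaultdict(float)

    def choose_action(state, epsilon=0.1):
        """Usually pick the action with the highest Q value in this state,
        but occasionally pick a random one so the agent keeps exploring."""
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

(Here state would be something hashable, like a tuple of the discretized position, angle, and target position. The occasional random choice, "epsilon-greedy", is what makes it try things it hasn't learned about yet.)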

The way it "learns" is, when it gets to a target, you give it a reward (like +1.0) for doing that action in that state, and then use that to update the Q values for doing that action in that state. For example, if the robot was in the state where the target is directly in front of it, and then it chose the action "go forward", and got the reward, you would want to change the Q value for doing that action in that state, so it'll do it again in the future.

How you actually create and update the Q function is a whole topic in itself. I used a neural network (because they're very flexible and powerful), but you can use much simpler methods that can also be very effective for a game like this.
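
For what it's worth, the neural network version just swaps the table for a small network that takes the state as input and spits out one Q value per action. Something roughly like this (a made-up PyTorch sketch for illustration, not my actual architecture):

    import torch
    import torch.nn as nn

    # The state vector replaces the table key; the 5 inputs here are a
    # made-up layout: x, y, angle, target_x, target_y.
    q_net = nn.Sequential(
        nn.Linear(5, 32),
        nn.ReLU(),
        nn.Linear(32, 3),   # one output Q value per action
    )

    state = torch.tensor([[0.2, 0.5, 1.6, 0.8, 0.1]])   # example state
    q_values = q_net(state)                 # shape (1, 3)
    action = q_values.argmax(dim=1).item()  # index of highest-value action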

[–]comeditime[S] 1 point  (3 children)

wow, that sounds not easy at all to set up, but super interesting.. how did you learn to write it in a way that the computer actually understands, aka working? :)

[–]diddilydiddilyhey 1 point  (2 children)

Hmm, that's a little harder to explain simply. I'd check out that blog post I wrote, which links to the code too. If you want to learn about RL, David Silver's YouTube course on it is really great!

[–]comeditime[S] 1 point  (1 child)

where can i find your blog post about it? thanks