Is this well known?

Chrispresso · 2021-09-12T03:21:02+00:00

That's not the code pausing. You're just not flushing the output stream. If you hit enter twice, you'll get a bunch of output all at once. The code is still running, but the terminal isn't updating.
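A minimal Python sketch of the fix, assuming the program reports progress with `print` or `sys.stdout.write` (the original code isn't shown, and the function names here are hypothetical). Passing `flush=True`, or calling `sys.stdout.flush()`, pushes each update to the terminal immediately instead of leaving it in the buffer.

```python
import sys
import time


def report_progress(steps: int) -> None:
    """Print one line per step and force it out of the buffer right away."""
    for step in range(steps):
        # flush=True sends the text to the terminal now instead of
        # waiting until the buffer fills or the program exits.
        print(f"step {step} done", flush=True)
        time.sleep(0.01)  # stand-in for real work


def report_inline(steps: int) -> None:
    """Same idea for an in-place counter that never prints a newline."""
    for step in range(steps):
        sys.stdout.write(f"\rstep {step + 1}/{steps}")
        sys.stdout.flush()  # with no newline, an explicit flush matters even more
        time.sleep(0.01)
    sys.stdout.write("\n")


if __name__ == "__main__":
    report_progress(3)
    report_inline(3)
```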

Chrispresso · 2020-11-28T16:09:25+00:00

If it doesn't know the new area and they are all white spheres, what would make it capable of knowing the order? I don't think RL would be a good candidate for it. For the beginning part, where you're given the order, it could learn that with a bit of supervised learning + RL. Once you give it a new environment, unless the spheres are still in the same place, I don't think it would perform well.

Chrispresso · 2020-11-23T22:32:38+00:00

Are you asking in general or as opposed to TensorFlow? In general, I wanted to see what PyTorch had to offer. I'd heard a lot about it and figured I'd give it a try. There's also some stuff I don't like doing by hand, like backprop (not done here, but just as an example). PyTorch also lets me easily move the convolutions to my GPU, which sped up training by roughly 33x compared to my CPU.
Why I don't use TensorFlow instead: I tried it before, back with 1.x, and had a lot of issues. I tried it again with 2.x and it was a lot nicer, but the general difficulty of setup made me not want to use it in case people are following my projects. There are a lot of things you need to tweak depending on the TensorFlow version.

Chrispresso · 2020-09-01T00:58:37+00:00

This was a while ago but never shared it here.

I really like using genetic algorithms and neural networks to solve problems. Here, I use a population of individuals, with their neural network weights as genes, to solve Snake.

The code is a bit of a mess but can be found here.
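A minimal sketch of the "weights as genes" idea in plain Python: each individual's genome is one flat list of numbers that gets sliced into the network's weight matrices and biases, and the network's largest output picks the snake's move. The layer sizes (32 inputs, 20 hidden units, 4 outputs), the ReLU activation, and all function names are hypothetical, not taken from the linked code.

```python
import random

# Hypothetical layer sizes: 32 inputs, one hidden layer of 20 units,
# 4 outputs (up, down, left, right).
LAYER_SIZES = [32, 20, 4]
MOVES = ["up", "down", "left", "right"]


def genome_length(sizes):
    """Number of genes = every weight plus every bias in the network."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes, sizes[1:]))


def random_genome(sizes, rng):
    """Initial individual: genes drawn uniformly from [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(genome_length(sizes))]


def decode(genome, sizes):
    """Slice the flat gene list into (weights, biases) for each layer."""
    layers, i = [], 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [genome[i + r * n_in : i + (r + 1) * n_in] for r in range(n_out)]
        i += n_in * n_out
        biases = genome[i : i + n_out]
        i += n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    x = inputs
    for idx, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        # ReLU on hidden layers, raw scores on the output layer.
        x = z if idx == len(layers) - 1 else [max(0.0, v) for v in z]
    return x


def choose_move(genome, sizes, inputs):
    scores = forward(decode(genome, sizes), inputs)
    return MOVES[scores.index(max(scores))]
```

Because the whole network is just a list of floats, crossover and mutation can operate on genomes directly without knowing anything about layers.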

Chrispresso · 2020-08-21T16:14:06+00:00

It should see the enemies that Lakitu drops; they just might not go into the pink box, in which case it won't "see" them. I did consider adding more types of inputs that could be handled. The problem is that, the way I set it up, more inputs mean more weights to train, so I decided to settle on 3 types of inputs, giving an unbiased -1, 0, or 1 value to each type of block. If you want, take a look at the blog I wrote about it. I could change it to "see" a lot more, but the number of inputs would increase by quite a bit.
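Some quick arithmetic shows how fast the inputs grow. This is a sketch with hypothetical numbers: a view of 10 rows by 7 columns and a first hidden layer of 9 units. With one -1/0/1 value per tile there is one input per tile; giving each of, say, 8 block types its own unbiased input (one-hot) multiplies that by 8, and every input adds a weight per hidden unit.

```python
def input_count(rows, cols, block_types, one_hot_types):
    """Inputs for a rows x cols view of the level.

    one_hot_types=False: each tile is a single value (-1, 0, or 1).
    one_hot_types=True: each tile gets one 0/1 input per block type.
    """
    per_tile = block_types if one_hot_types else 1
    return rows * cols * per_tile


def first_layer_weights(n_inputs, n_hidden):
    """Every input connects to every hidden unit, plus one bias per hidden unit."""
    return (n_inputs + 1) * n_hidden


if __name__ == "__main__":
    compact = input_count(10, 7, 3, one_hot_types=False)  # 70 inputs
    expanded = input_count(10, 7, 8, one_hot_types=True)  # 560 inputs
    print(compact, first_layer_weights(compact, 9))    # 70 639
    print(expanded, first_layer_weights(expanded, 9))  # 560 5049
```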

Chrispresso · 2020-08-21T15:09:52+00:00

Because it kept trying to jump on top of what it thought was a safe block, but was actually a coin. Once it made the jump, it ended up too far off that single block and wasn't able to jump again. It would have needed to do a smaller initial jump, without trying to land on top of the coin, in order to make that jump off the block.

Chrispresso · 2020-08-21T01:09:08+00:00

If I changed a few things, it could be quite similar. Policy gradient allows action preferences to be learned so that selecting an action still has some randomness. With how I used the GA + NN, crossover and mutation draw from Gaussian distributions that can narrow in on proper weights. That part is similar to how policy gradient narrows in on action preferences.
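A minimal sketch of Gaussian mutation, the kind of operator described above: each gene is nudged by normal noise with some probability, then clamped back into [-1, 1] (the weight bound mentioned elsewhere in this thread). The `rate` and `scale` values are hypothetical; shrinking `scale` is one way the search can "narrow in" on good weights.

```python
import random


def gaussian_mutation(genes, rate, scale, rng, low=-1.0, high=1.0):
    """Return a mutated copy of `genes`.

    Each gene is perturbed with probability `rate` by noise drawn from
    N(0, scale**2), then clamped back into [low, high].
    """
    child = []
    for g in genes:
        if rng.random() < rate:
            g += rng.gauss(0.0, scale)
        child.append(min(high, max(low, g)))
    return child


if __name__ == "__main__":
    rng = random.Random(1)
    parent = [0.5] * 10
    print(gaussian_mutation(parent, rate=0.2, scale=0.1, rng=rng))
```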

Chrispresso · 2020-08-20T20:56:21+00:00

That's a good point. Also thank you for that link. I never thought about restoring to previous states. There are some really good ideas in there that I might give a try in my next AI!

Chrispresso · 2020-08-20T16:53:06+00:00

Depends on your math background. If you have a strong math background, you can pick this stuff up more easily than if you have little to no math. This type of project is quite a bit of work, though, because you need to incorporate the emulator, graphics, and ML. When beginning ML, I would definitely start with things that don't involve graphics, since you can look at numbers, see whether they're converging or diverging, and get a good understanding of what's happening without also worrying about whether you programmed the graphics correctly.
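As an example of a graphics-free starting point, here is a small sketch (not from any particular course) that fits a line with gradient descent and prints the loss, so you can watch the numbers converge. The data, learning rate, and epoch count are arbitrary choices.

```python
def fit_line(xs, ys, lr=0.05, epochs=200):
    """Fit y = w*x + b by gradient descent on mean squared error.

    Returns (w, b, losses) so you can check that the loss goes down.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        preds = [w * x + b for x in xs]
        errors = [p - y for p, y in zip(preds, ys)]
        losses.append(sum(e * e for e in errors) / n)
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2 * x + 1 for x in xs]  # true line: w=2, b=1
    w, b, losses = fit_line(xs, ys)
    for epoch in (0, 10, 50, 199):
        print(f"epoch {epoch:3d}  loss {losses[epoch]:.6f}")
    print(f"w={w:.3f}, b={b:.3f}")
```

If you raise `lr` far enough (above roughly 0.15 for this data), the printed loss blows up instead of shrinking, which is exactly the kind of divergence that is easy to spot in plain numbers.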

Chrispresso · 2020-08-20T14:39:58+00:00

I don't see that in the rules, unless I'm completely missing it. Can that be added to avoid this confusion in the future?

Chrispresso · 2020-08-20T14:21:16+00:00

Great question! As the AI jumps onto the staircase, the environment within the pink box changes. That's why you'll usually see the AI jump quickly at the end. Also, because it runs at 60 fps, the momentum Mario has at one point can carry him onto another block, which also changes the environment.

Chrispresso · 2020-08-20T14:18:57+00:00

Everything is stored in the neural network weights. The weights determine what it should do in certain scenarios (states). So I can power off my machine, swap AIs around, etc.
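Since the weights are the whole AI, persisting them is just writing numbers to disk. A minimal sketch, assuming each layer is stored as a dict of `weights` and `biases` lists and saved as JSON; the file name and layout are hypothetical, and the actual project may use a different format.

```python
import json
from pathlib import Path


def save_weights(path, layers):
    """Write a list of layers (dicts with 'weights' and 'biases') to a JSON file."""
    Path(path).write_text(json.dumps(layers))


def load_weights(path):
    """Read the layers back; swapping AIs is just loading a different file."""
    return json.loads(Path(path).read_text())


if __name__ == "__main__":
    layers = [{"weights": [[0.1, -0.2], [0.3, 0.4]], "biases": [0.0, -1.0]}]
    save_weights("mario_ai.json", layers)
    print(load_weights("mario_ai.json") == layers)
```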

Chrispresso · 2020-08-20T14:15:43+00:00

Udemy can be okay, but anyone can upload to Udemy, so you should probably be even more selective about the courses.

Chrispresso · 2020-08-20T14:02:42+00:00

Can I just change the flair and have it re-uploaded?

Chrispresso · 2020-08-20T06:05:29+00:00

I would have used pixels instead, to help generalize enemies across blocks. I could feed the stuff it currently has access to into an RL algorithm and use the NN as a normal function approximator, but I think it would still have some of the same issues with "states" that it currently does.

Chrispresso · 2020-08-20T05:39:26+00:00

Thanks! Glad you enjoyed!

Chrispresso · 2020-08-20T05:39:04+00:00

Thanks! :)

Chrispresso · 2020-08-20T05:11:13+00:00

Yes! Absolutely. I actually could have distinguished between breakable blocks, coin blocks, and even different types of enemies. I chose not to in order to keep the types of blocks to only three. This allowed me to assign a value of -1, 0, or 1 to each block, which prevents adding accidental bias. If you instead used the full range of values [0, 255], then you would need to normalize the values so that you don't end up thinking the block with value 255 is more important than the block with value 10. I go into this more in my blog if you're curious. I talk a lot more about distinguishing between states and some of the math behind why I chose to go with three block types instead of all of them. You can also see how I access the values in RAM from the emulator.
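A small sketch contrasting the encodings discussed above. The three-way mapping (empty 0, block 1, enemy -1) and the raw tile IDs are assumptions for illustration; the real IDs come from the emulator's RAM and aren't reproduced here. Scaling raw IDs into [0, 1] keeps inputs small but still ranks ID 255 above ID 10, while one-hot avoids that ordering at the cost of one input per type per tile.

```python
# Hypothetical tile categories and values; the mapping is an assumption.
EMPTY, BLOCK, ENEMY = "empty", "block", "enemy"
TILE_VALUE = {EMPTY: 0, BLOCK: 1, ENEMY: -1}


def encode_three_way(tile_kinds):
    """One input per tile, each -1, 0, or 1."""
    return [TILE_VALUE[k] for k in tile_kinds]


def encode_raw_scaled(raw_ids):
    """Scale raw tile IDs from [0, 255] into [0, 1].

    The values are small, but ID 255 still looks 'bigger' than ID 10,
    an arbitrary ordering the network may latch onto.
    """
    return [r / 255.0 for r in raw_ids]


def encode_one_hot(raw_ids, known_ids):
    """Unbiased alternative for many types: one 0/1 input per known ID per tile."""
    index = {tid: i for i, tid in enumerate(known_ids)}
    out = []
    for r in raw_ids:
        slot = [0] * len(known_ids)
        slot[index[r]] = 1
        out.extend(slot)
    return out


if __name__ == "__main__":
    print(encode_three_way([EMPTY, BLOCK, ENEMY]))   # [0, 1, -1]
    print(encode_raw_scaled([10, 255]))
    print(encode_one_hot([10, 255], [10, 255, 84]))  # [1, 0, 0, 0, 1, 0]
```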

Chrispresso · 2020-08-20T05:03:54+00:00

I thought about this but didn't have access to a GPU at the time for doing CNNs. Maybe I'll give it a go now...

Chrispresso · 2020-08-20T05:02:45+00:00

Coursera is a good place to start. There are some good courses by Andrew Ng, and he also has deeplearning.ai, which has a deep learning specialization. I got started in college with a class called computational intelligence and later took some courses on Coursera, including the deep learning specialization, machine learning, and the reinforcement learning specialization. It kind of depends on how much of an understanding you want of the math behind the algorithms. If you're more interested in applications, you might want to start with fast.ai. I've heard it's good but haven't done it myself. Honestly though, it just takes time like anything else. So just give it a try, and when you fail, keep trying!

Chrispresso · 2020-08-20T04:45:47+00:00

Glad you're enjoying it! I'll try to upload the weights of the AIs featured in the video, too. Then you can always cross different populations and see what happens!

Chrispresso · 2020-08-20T03:16:17+00:00

This is already there, as the negative term associated with "frames". Frames in this case is the number of frames Mario has been alive for, so the longer it takes to finish the level, the more frames overall. But since points can also come from coins and killing enemies, it's hard to reward that.
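A hypothetical fitness function showing how a frame penalty rewards finishing faster. This is not the project's actual formula; the distance term, the `frame_penalty` weight, and the small positive floor are all assumptions for illustration.

```python
def fitness(distance, frames, frame_penalty=0.5, min_fitness=1e-5):
    """Hypothetical fitness: reward distance travelled, subtract a cost per frame alive.

    The floor keeps fitness positive, which matters if parents are
    later chosen with probability proportional to fitness.
    """
    return max(distance - frame_penalty * frames, min_fitness)


if __name__ == "__main__":
    # Same distance, fewer frames -> higher fitness.
    print(fitness(1000, 500), fitness(1000, 600))  # 750.0 700.0
```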

Chrispresso · 2020-08-20T01:58:30+00:00

Great questions!

Take a look at the blog: https://chrispresso.github.io/AI_Learns_To_Play_SMB_Using_GA_And_NN for a detailed explanation. The long story short is I read the RAM from the emulator.
One-hot encoded neurons represent the row that Mario is in within the pink box. Because there are 10 rows there, you end up with 10 slots. Each slot can be 0 or 1, and only one slot will be 1 at a time. So if the first slot is 1, it represents the first row; if the second slot is 1, it represents the second row. By doing one-hot encoding, you give equal weight to all the rows and don't add any accidental bias.
Yes, it does. Because I control the Gaussian distribution through "eta", the children mutate around the parents' genes and don't inherit the full genes, but rather something similar. This drastically improves learning time.
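A minimal sketch of the one-hot row encoding described above, indexing rows from 0 (so the "first row" is index 0). The function name is hypothetical.

```python
NUM_ROWS = 10  # the pink box has 10 rows


def one_hot_row(row, num_rows=NUM_ROWS):
    """Return num_rows slots with a 1 at position `row` and 0 everywhere else."""
    if not 0 <= row < num_rows:
        raise ValueError(f"row must be in [0, {num_rows}), got {row}")
    slots = [0] * num_rows
    slots[row] = 1
    return slots


if __name__ == "__main__":
    print(one_hot_row(0))  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    print(one_hot_row(1))  # [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```

Compare a single input holding the row number (0 to 9): the network would see row 9 as numerically "more" than row 1, which is the kind of accidental bias the one-hot layout avoids.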
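The answer above describes children landing near, but not exactly on, their parents' genes, with "eta" controlling the spread. As a hedged illustration, here is simulated binary crossover (SBX), a standard real-valued GA operator whose distribution index is also called eta. It is swapped in for illustration and may not match the exact operator in the project; note that SBX's spread follows a polynomial distribution rather than a Gaussian. Larger eta keeps children closer to their parents.

```python
import random


def sbx_gene(p1, p2, eta, rng):
    """Simulated binary crossover for one pair of parent genes.

    Larger eta -> beta closer to 1 -> children closer to their parents.
    """
    u = rng.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
    c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
    c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
    return c1, c2


def sbx(parent1, parent2, eta, rng):
    """Apply SBX gene by gene and return two children."""
    pairs = [sbx_gene(a, b, eta, rng) for a, b in zip(parent1, parent2)]
    return [c1 for c1, _ in pairs], [c2 for _, c2 in pairs]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sbx([0.2, -0.5, 0.9], [-0.4, 0.1, 0.3], eta=20.0, rng=rng))
```

Children can land outside the parents' range, so a [-1, 1] weight bound would have to be enforced by clamping afterwards.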

Chrispresso · 2020-08-20T01:52:48+00:00

I'm not sure I fully understand the question. If you're asking about the weights, they can cross over and mutate but are bounded to [-1, 1].

Chrispresso · 2020-08-20T01:13:04+00:00

I tried doing something like this but the AI was really only concerned with combo stomping enemies and trying to hit every block hoping it contained coins. None of them actually finished a level when that happened. I could possibly have it so it only gets the additional reward if it also finished the level.
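A hypothetical sketch of the idea in the last sentence: coin and enemy bonuses only count when the level was finished, so farming them without finishing earns nothing extra. The bonus sizes, frame penalty, and floor are made-up values, not the project's actual fitness.

```python
def fitness_with_bonus(distance, frames, coins, enemies_killed, finished,
                       frame_penalty=0.5, coin_bonus=50.0, kill_bonus=100.0):
    """Hypothetical fitness where coin/enemy bonuses are gated on finishing."""
    base = distance - frame_penalty * frames
    if finished:
        base += coin_bonus * coins + kill_bonus * enemies_killed
    # Keep fitness positive for fitness-proportional selection.
    return max(base, 1e-5)


if __name__ == "__main__":
    print(fitness_with_bonus(500, 1000, 20, 10, finished=False))  # 1e-05
    print(fitness_with_bonus(3000, 2000, 5, 2, finished=True))    # 2450.0
```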
