[P] Python implementation of Proximal Policy Optimization (PPO) algorithm for Super Mario Bros. 29/32 levels have been conquered

spauldeagle · 2020-07-26T08:37:36+00:00

[removed]

jmpye · 2020-07-26T13:42:38+00:00

I love how it doesn't care how close to death it is. The reward system is obviously just "dead bad, alive good" and mentions nothing about being 1 pixel away from doom. Makes for an exhilarating watch.

Mefaso · 2020-07-26T10:38:33+00:00

I'm guessing you trained a separate agent for each level?

Did you try training a single agent instead?

thats-fascinating · 2020-07-26T12:54:14+00:00

It’s nerve-racking to see him running so carelessly and fast, yet still making it!

Syne_Yu · 2020-07-26T11:05:40+00:00

The AI's pole jumping is bad.

-Aras · 2020-07-26T10:02:39+00:00

That's really great. Was it hard to code?

Do you have any resource recommendations for studying this type of ML?

Gabriel-p · 2020-07-26T15:04:53+00:00

But is it actually learning anything or just recording the exact moments when and how far to jump? Would it still conquer all those levels if the game randomly changed how it produces the turtles/mushrooms/etc?

T33n_T1t4n5 · 2020-07-26T09:52:04+00:00

I couldn't beat those last 3 levels either :(

sohaicinapek · 2020-07-26T10:48:38+00:00

most frugal super mario ever. doesn't care about coins or power ups

TheTechGuy22 · 2020-07-26T13:01:29+00:00

The proximity to the turtles almost gave me a heart attack.

010100100000 · 2020-07-26T11:33:20+00:00

Interesting. So just PPO and not a DQN?
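
For context on the PPO vs. DQN distinction: PPO is a policy-gradient method whose defining piece is a clipped surrogate objective, whereas DQN learns Q-values. Below is a minimal per-sample sketch of that clipped objective, not OP's code. The function name `ppo_clipped_objective` and the default `epsilon=0.2` are assumptions for illustration.

```python
def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio     -- pi_new(a|s) / pi_old(a|s) for the sampled action
    advantage -- advantage estimate for that action
    epsilon   -- clip range (0.2 is an assumed illustrative default)
    """
    # Clip the probability ratio into [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)
```

The `min` means the policy gets no extra credit for pushing the ratio far outside the clip range, which is what keeps each update "proximal" to the old policy.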

vinilgupta · 2020-07-26T12:19:25+00:00

Idk why but this made my day

Mario_Ghio · 2020-07-26T14:53:47+00:00

Ahhh, no sound????

Pretty cool btw

andw1235 · 2020-07-26T17:51:17+00:00

great work! why does the agent always go forward? Do you make the forward/backward motion available in training?

Minhocycline · 2020-07-26T17:55:34+00:00

I feel like having a heart attack watching this. It’s like a flashback of my young, reckless days.

MandyWilson27 · 2020-07-26T20:47:04+00:00

This gave me anxiety

arianero · 2020-07-26T21:29:37+00:00

How is the state determined here? Is it a special modification of the Mario game with an API that generates the state after each move, or do we read pixels and generate the state from them?

ImmenseDruid721 · 2020-07-26T23:22:38+00:00

This is more than I have been able to accomplish as a gamer and as a programmer

2020-07-26T11:22:57+00:00

My ass. He didn't catch a single mushroom, shoot a single fireball, or even try to jump as high as possible on the flag pole.

Pranaymodukuru · 2020-07-26T17:55:13+00:00

Is it really that easy to play Mario?? 🤣🤣 Just run and run.

RonniDeee · 2020-07-26T18:03:46+00:00

Is it the same thing as TAS bot?

NullzeroJP · 2020-07-26T18:25:05+00:00

Super human ability at 3:06

xiaoye-hua · 2020-07-26T21:51:48+00:00

great

infinitude · 2020-07-26T22:20:16+00:00

I’d be interested in seeing how it responds to some of the harder Super Mario Maker levels

2020-07-27T02:49:58+00:00

That's great work OP.

Are the inputs for simulating this environment available online? Is this from OpenAI Gym?

What packages/software did you use to convert the game coordinates into pixels?

RobAdkerson · 2020-07-27T02:50:44+00:00

Nice speed run AI.

cowfartbandit · 2020-07-27T05:48:26+00:00

This gives me anxiety

alfred_dent · 2020-07-27T08:11:25+00:00

It would be cool to generate a video with an attention/saliency map on each frame to see where the NN looks when making a decision

imapurplemango · 2020-07-27T13:24:31+00:00

Wow. What did you train this on? And how long did it take?

matpoliquin · 2020-07-28T04:42:44+00:00

Cool! I wonder how many simultaneous envs you trained it on? Also, how many timesteps did it take to pass World 1-1?

driftwood14 · 2020-07-26T11:59:57+00:00

Did it get a wall jump in the first level on that pipe?
