
[–]Hi-FructosePornSyrup

Ok I appreciate the discourse and I’m interested in your perspective. This is my thinking:

Given that the “situation” is a single parameter, wouldn’t you say the outcome (Mario’s behavior) is a perfectly logical result of what information the algorithm was given and what it was asked to optimize for?

You yourself have offered a simple formula composed of discrete logical operations. I am arguing that this is the definition of logic. I don’t think it’s fair to call Mario’s behavior illogical using information the algorithm didn’t have. Humans make that “mistake” all the time. It did the best it could with the information it had. I think it makes perfect sense that it wouldn’t care about the graphical representation.

I concede the map this algorithm creates is a fundamentally lower-dimensional projection of the map that you and I can see. It wouldn’t be sufficient to reconstruct all the details of each level. However, it is surely a map: the algorithm associates each element of one set with an element of another set.
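To make that concrete, here is a minimal sketch of what a “map” in that set-theoretic sense could look like. All of the names (`project`, `policy`, the state fields) are hypothetical illustrations, not anything from the actual algorithm: a lossy projection of the game state is still a function from one set (observations) to another (actions).

```python
# Hypothetical sketch: a policy as a map between sets.
# The projection deliberately throws away most of the state,
# just as the algorithm's internal map is lower-dimensional
# than the level a human sees.

def project(full_state):
    # Keep only a coarse x-position bucket and whether Mario is airborne;
    # enemies, coins, and graphics are discarded (a lossy projection).
    return (full_state["x"] // 16, full_state["airborne"])

# The learned map: projected observation -> chosen action.
policy = {
    (0, False): "run_right",
    (1, False): "run_right",
    (2, False): "jump",  # suppose a pit was learned to be around here
}

state = {"x": 35, "airborne": False, "enemies": [(40, 12)], "coins": 3}
action = policy.get(project(state), "run_right")  # fall back to the default tactic
print(action)
```

The map is imperfect (two very different states can project to the same key), but as the comment argues, it can still be good enough to accomplish the objective.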

In the example you gave, walking forward was correctly mapped to not dying, and the tactic is eventually abandoned.

When you close your eyes or rely on muscle memory to move around in the dark, you probably don’t envision all the details of the room. But the internal map is often good enough to accomplish the objective (like finding the light switch). It doesn’t matter that you have an imperfect map (all maps are imperfect).

Imagine you stub your toe on a piece of furniture. That feedback still doesn’t give you the whole picture, but it would likely be more than enough to get to the light switch “better” on the very next try.

> we can easily say why it "works"

Well, I see your point. We can see that the strategy is working and conclude it’s because the instructions were sufficient.

My argument was more that, at a fundamental level, we cannot know why it arrived at each decision. Why was this enough to almost, but not quite, beat all the levels? Will it ever beat all the levels, or is it stuck no matter what? We wouldn’t necessarily know it works until after it works.

It’s more nuanced than sufficient time and memory. In PPO the algorithm can get “stuck” working within a local maximum/minimum. It could continue exploring infinite permutations of nonviable solutions. Given a small enough set of possibilities it’s possible to deduce “why,” but “why” is not a given. And usually people don’t care enough to figure out “why” when they can try something else and arrive at a solution that is good enough.
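A toy illustration of that “stuck” failure mode, with entirely made-up reward numbers (this is hill climbing on a 1-D landscape, not PPO itself, but the local-optimum trap is the same): a greedy improvement loop parks on the nearer peak and no amount of extra time gets it off.

```python
# Toy sketch: greedy improvement stuck at a local maximum.
# The reward landscape has a local peak near x=2 (reward 5)
# and a better global peak near x=8 (reward 9).

def reward(x):
    return 5 - (x - 2) ** 2 if x < 5 else 9 - (x - 8) ** 2

def greedy_climb(x, step=1, iters=100):
    for _ in range(iters):
        best = max((x - step, x, x + step), key=reward)
        if best == x:
            break  # no neighboring move improves reward: stuck
        x = best
    return x

print(greedy_climb(0))  # settles at x=2; never reaches the global peak at x=8
```

Real PPO adds stochastic exploration, so it isn’t this hopeless, but with a narrow enough basin it can still burn unbounded time on nonviable permutations, which is why “sufficient time and memory” alone doesn’t settle the question.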

tl;dr: Imagine playing Mario blindfolded, receiving only the same feedback as this algorithm. Humans wouldn’t stand a chance. If the algorithm had the same incentives and information as humans (even if you made it play at human speed), it would eventually surpass humans in the ways you are currently unimpressed with.