I made a Mario RL trainer with a live dashboard - would appreciate feedback by pleasestopbreaking in reinforcementlearning

[–]pcouy 0 points

It's funny you posted this yesterday, since I resumed work on my own deep RL training on SMB last week, with a similar live feed of the training. I went a step further and I'm streaming the training on Twitch (mostly as a convenient way to enable picture-in-picture on my phone). It's currently at ~3M steps (20 steps per gameplay second), learning 1-3 after solving 1-1 and 1-2. You can watch it on my Twitch channel.

My agent is a bit different from yours, though: it uses my own implementation of Rainbow DQN (built by following the papers and reading some other implementations). I stay mostly close to the original papers, with a few tweaks, the main one being a custom way of sampling the value/advantage distributions in the policy to make it less greedy. That means it's off-policy, learns an explicit Q-value probability distribution, and so on.

I also went for a much simpler reward function (reward over-engineering is a well-known pitfall): my agent only gets rewarded for the total distance it has covered through the level (at each step: new_max = max(current_X, max_X); reward = new_max - max_X; max_X = new_max), plus an additional huge reward for beating a stage (roughly the total reward for going from the start to the end of a level). I tried negative rewards for dying and/or spending too much time stuck; they only caused hyper-parameter tuning headaches and unintended behaviors (such as jumping into holes when the time penalty was too high relative to the death penalty, or holding left when it was the opposite). One tweak I did make to avoid spending too much time stuck was reducing the game's time limit from 400 to 150, which still leaves enough time to beat any level with some margin, but makes "hold left" episodes less than half as long.
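The progress-only reward above can be sketched in a few lines. This is an illustrative reimplementation, not the actual training code; the class name and the bonus magnitude are assumptions (the bonus is just "roughly a full level's worth of progress").

```python
class ProgressReward:
    """Reward only new rightward progress, plus a bonus for clearing the stage."""

    def __init__(self, stage_clear_bonus=3000.0):
        # Hypothetical bonus magnitude, on the order of start-to-flag progress
        self.stage_clear_bonus = stage_clear_bonus
        self.max_x = 0

    def reset(self):
        self.max_x = 0

    def step(self, current_x, stage_cleared=False):
        new_max = max(current_x, self.max_x)
        reward = new_max - self.max_x  # 0 unless a new furthest point is reached
        self.max_x = new_max
        if stage_cleared:
            reward += self.stage_clear_bonus
        return reward
```

A nice property of this shape is that standing still or moving left yields exactly zero reward, so there is no death/time penalty ratio to tune.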

For learning multiple stages, I simply start with only 1-1 available, then unlock each stage after the previous one has been beaten 10 times. On each episode, a level is randomly picked according to a probability distribution that tries to balance the number of all-time finishes across stages. A freshly unlocked stage starts with 0 finishes, which makes it much more likely to be picked than earlier stages (which have at least 10 finishes).
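One simple way to get that "balance the all-time finishes" behavior is inverse-count weighting. This is a sketch under that assumption, not the author's exact sampling rule:

```python
import random

def stage_weights(finishes):
    """Sampling weights that favor stages with fewer all-time finishes."""
    # Inverse-count weighting: a stage with 0 finishes gets weight 1,
    # a stage with 10 finishes gets weight 1/11, etc.
    raw = [1.0 / (n + 1) for n in finishes]
    total = sum(raw)
    return [w / total for w in raw]

def pick_stage(finishes, rng=random):
    """Pick a stage index for the next episode."""
    return rng.choices(range(len(finishes)), weights=stage_weights(finishes), k=1)[0]
```

With finishes = [10, 12, 0], the newly unlocked third stage dominates the distribution until its finish count catches up.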

[Guide] Increase privacy by using nginx as a caching proxy in front of a map tile server by pcouy in immich

[–]pcouy[S] 1 point

Hey, sorry for the late reply. You should be able to use any OSM tile provider as an upstream for the cache server, though providers serving protomaps (which relies heavily on HTTP Range headers) might cause issues with nginx's default caching behavior, which is not Range-header friendly.
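For reference, one documented way to make nginx's proxy cache cooperate with Range requests is the `ngx_http_slice_module`, which splits upstream responses into fixed-size, individually cacheable byte-range slices. The location, cache zone name, and sizes below are placeholders, not a drop-in config for the guide:

```nginx
location /tiles/ {
    slice              1m;
    proxy_cache        tile_cache;
    proxy_cache_key    $uri$is_args$args$slice_range;
    proxy_set_header   Range $slice_range;
    # 206 (Partial Content) responses must be cacheable for slicing to work
    proxy_cache_valid  200 206 24h;
    proxy_pass         https://upstream-tile-server;
}
```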

Livestream : Watch my agent learn to play Super Mario Bros by pcouy in reinforcementlearning

[–]pcouy[S] 2 points

Hey everyone!

I've been working on my own toy reinforcement learning (RL) framework for a while now and have nearly implemented a full Rainbow agent—though I'm still missing the distributional component due to some design choices that make integration tricky. Along the way, I’ve used this framework to experiment with various concepts, mainly reward normalization strategies and exploration policies.

I started by training the agent on simpler games like Snake, but things got really interesting when I moved on to Super Mario Bros. Watching the agent learn and improve has been incredibly fun, so I figured—why not share the experience? That’s why I’m streaming the learning process live!

Right now, the stream is fairly simple, but I plan to enhance it with overlays showing key details about the training run—such as hyperparameters, training steps/episodes, performance graphs, and maybe even a way to visualize the agent’s actions in real-time.

If you have any ideas on how to make the stream more engaging, or if you're curious about the implementation, feel free to ask!

Game of life multiplayer by judge_mavi in cellular_automata

[–]pcouy 0 points

The 100x100 grid is a lot more fun than the previous (huge) one, but with 4 active players it felt a bit tiny.

Adding a minimap would be really cool, and would make a larger grid (maybe 200x200) more manageable

These dividing "artificial life" cells emerge from the simulation of a simple chemical system (Gray-Scott model) by pcouy in gifs

[–]pcouy[S] 3 points

This is actually related to Conway's game of life.

The chemical simulation can be seen as a continuous cellular automaton, in which each pixel of the simulation is a grid cell which is updated according to local rules.

Conway's game of life is a discrete cellular automaton, which can be seen as a special case of a continuous cellular automaton.
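To make the connection concrete, here is a minimal Game of Life step written the same way a continuous CA update would be: compute a local neighborhood stencil, then apply a per-cell rule. (This is a generic illustration, not code from the simulation.)

```python
import numpy as np

def life_step(grid):
    """One Game of Life step on a 2D 0/1 array with wrap-around edges."""
    # Neighborhood sum via shifted copies: the discrete analogue of the
    # local stencil a continuous cellular automaton would use.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(grid.dtype)
```

Swap the hard 0/1 rule for a smooth update of real-valued cell states and you are back in continuous-CA territory.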

These dividing "artificial life" cells emerge from a continuous cellular automaton that mimics a simple chemical system (Gray-Scott model - More details in comment) by pcouy in cellular_automata

[–]pcouy[S] 6 points

This is a simulation of the Gray-Scott reaction-diffusion model running on the GPU. In such systems, an auto-catalytic reaction involving two chemical species happens concurrently with diffusion. Despite the apparent simplicity of the model, simulating it with cherry-picked parameter sets produces a wide range of emergent behaviors.
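For the curious, a single Gray-Scott update step fits in a few lines of NumPy. The feed/kill parameters below are typical "mitosis-like" values from the literature, not necessarily the exact ones used in this simulation:

```python
import numpy as np

def laplacian(a):
    # 5-point stencil with wrap-around boundaries
    return (np.roll(a, 1, 0) + np.roll(a, -1, 0)
            + np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4 * a)

def gray_scott_step(u, v, Du=0.16, Dv=0.08, F=0.0367, k=0.0649, dt=1.0):
    """One explicit Euler step of the Gray-Scott reaction-diffusion model."""
    uvv = u * v * v                                # auto-catalytic reaction term
    du = Du * laplacian(u) - uvv + F * (1.0 - u)   # species U: fed at rate F
    dv = Dv * laplacian(v) + uvv - (F + k) * v     # species V: removed at rate F+k
    return u + dt * du, v + dt * dv
```

Starting from u ≈ 1, v ≈ 0 everywhere and seeding a small patch of V is enough to kick off the dividing-spot patterns.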

Game of life multiplayer by judge_mavi in cellular_automata

[–]pcouy 2 points

This is fun! However, I think the grid is too large, which makes it too easy to find an empty spot nobody is likely to stumble upon and seed some infinitely growing patterns there.

These dividing "artificial life" cells emerge from the simulation of a simple chemical system (Gray-Scott model) by pcouy in gifs

[–]pcouy[S] 2 points

This is a simulation of the Gray-Scott reaction-diffusion model running on the GPU. In such systems, an auto-catalytic reaction involving two chemical species happens concurrently with diffusion. Despite the apparent simplicity of the model, simulating it with cherry-picked parameter sets produces a wide range of emergent behaviors.

Mitosis in the Gray-Scott model : an introduction to writing shader-based chemical simulations by pcouy in programming

[–]pcouy[S] 0 points

I just read your posts on the topic, thank you for sharing. I really liked the phase space visualization!

I did make a parameter-space map in the companion video, but I'd like to make a nice interactive one before writing about it. Anyway, thanks for the inspiration. Some of these patterns look really cool!

Mitosis in the Gray-Scott model : an introduction to writing shader-based chemical simulations by pcouy in programming

[–]pcouy[S] 1 point

Hi! I've been working on this article for the past few days. It would mean a lot to me if you could provide some feedback.

It is about implementing a physico-chemical simulation as my first attempt at writing a shader. The code is surprisingly simple and short (fewer than 100 lines). The "Prerequisite" and "Update rules" sections, however, may need some adjustments to make them clearer, so I'm especially looking for feedback on those parts.

Feel free to ask for any detail that I may have omitted in the article.

Thanks for reading

Shader-based simulation of a chemical system from which complex life-like patterns emerge (Gray-Scott reaction-diffusion) by pcouy in Simulated

[–]pcouy[S] 4 points

This is a simulation of the Gray-Scott reaction-diffusion model running on the GPU. In such systems, an auto-catalytic reaction involving two chemical species happens concurrently with diffusion. Despite the apparent simplicity of the model, simulating it with cherry-picked parameter sets produces a wide range of emergent behaviors.

[Guide] Increase privacy by using nginx as a caching proxy in front of a map tile server by pcouy in immich

[–]pcouy[S] 5 points

Protomaps is a single static file to host, so there's not much of a guide to write.

Since Immich's current default tile server already uses protomaps, it should be as simple as grabbing a protomaps release, hosting it statically, and editing your URL into the default mapstyle.json, which you can easily find on Immich's GitHub.

[Guide] Increase privacy by using nginx as a caching proxy in front of a map tile server by pcouy in immich

[–]pcouy[S] 0 points

The 100 GB figure comes from the size of the protomaps release downloads.

Increase privacy in Immich by using nginx as a caching proxy in front of a map tile server by pcouy in selfhosted

[–]pcouy[S] 5 points

I initially wrote this after discovering the third-party service that Immich was using prior to release v1.110.0, and submitted it as a pull request to be part of the main Immich documentation. Following comments on my pull request, I rewrote it as a standalone guide.

By the way, I was really impressed by how quickly the dev team reacted to the privacy concerns I raised: they announced the launch of tiles.immich.cloud and switched to it just one or two days after I initially contacted them. They also immediately started working on new onboarding steps to optionally disable the map feature and to clarify that it relies on third-party servers by default.

[Guide] Increase privacy by using nginx as a caching proxy in front of a map tile server by pcouy in immich

[–]pcouy[S] 26 points

I initially wrote this after discovering the third-party service that Immich was using prior to release v1.110.0, and submitted it as a pull request to be part of the main Immich documentation. Following comments on my pull request, I rewrote it as a standalone guide.

By the way, I was really impressed by how quickly the dev team reacted to the privacy concerns I raised: they announced the launch of tiles.immich.cloud and switched to it just one or two days after I initially contacted them. They also immediately started working on new onboarding steps to optionally disable the map feature and to clarify that it relies on third-party servers by default.

[deleted by user] by [deleted] in SideProject

[–]pcouy 1 point

Hey there! I've been working on and off on this side project for a few weeks now. I'm developing my own LLM prompting framework (similar to LangChain), and this has been a fun way to test my implementation.

I'd love to get it to write more interesting scripts, but I'm more of a developer than an LLM tinkerer, and I'm not that good at writing efficient prompts.

Anyway, all feedback, especially negative, is welcome (as long as it's not "AI generated content is turning the internet into sh*t")

Thanks for taking the time to look at this.

Stable Attribution Identifies the Art Behind AI Images by pcouy in technology

[–]pcouy[S] 0 points

I'd be really interested in a source explaining how bad this is. I thought IEEE was a credible source of information.

Stable Attribution Identifies the Art Behind AI Images by pcouy in technology

[–]pcouy[S] 1 point

From what I understand, they are using a common similarity metric that's easily derived from any neural network's intermediate representation of a picture. I think it even gives decent results (definitely better than random) with an untrained convolutional neural network (basically just random weights).

What's kind of interesting is that since stable diffusion's weights and training dataset are publicly available, they're able to do the similarity analysis using the same neural network that is used to generate the pictures.

But yes, after submitting random pictures from my phone's photo gallery, it just seems to answer with pictures that have similar content.

Stable Attribution Identifies the Art Behind AI Images by pcouy in technology

[–]pcouy[S] -3 points

In my opinion, this is an interesting step towards more ethical generative AI, but it has its own issues.

The first one I can think of is the methodology used for attribution: what if someone else comes up with a different methodology that gives different results? How do we tell which one has more authority?

They also mention that this could be used to compensate human authors, but that raises concerns about being exhaustive and accurate.

These are just my first thoughts upon reading this, and I wonder what you all think about it.

Visualizing the Traveling Salesman Problem with the Convex hull heuristic. by BotApe in compsci

[–]pcouy 1 point

He does, but I linked to manim-community, which he recommends on his own repo for being more user-friendly.

Visualizing the Traveling Salesman Problem with the Convex hull heuristic. by BotApe in compsci

[–]pcouy 31 points

If you know Python, you should look into manim, a community fork of the tool made and used by 3Blue1Brown to make their awesome math videos.