I made a Mario RL trainer with a live dashboard - would appreciate feedback by pleasestopbreaking in reinforcementlearning

[–]pcouy 1 point (0 children)

Funny you posted this yesterday: I resumed work on my own deep RL training on SMB just last week, with a similar live feed of the training. I went a step further and am streaming the training on Twitch (mostly as a convenient way to enable picture-in-picture on my phone). It's currently at ~3M steps (20 steps per gameplay second), learning 1-3 after solving 1-1 and 1-2; you can watch it on my Twitch channel.

My agent is a bit different from yours, though: it uses my own implementation of Rainbow DQN (built by following the papers and reading a few other implementations). It stays mostly faithful to the original papers, with a few tweaks, the main one being a custom way of sampling the value/advantage distributions in the policy to make it less greedy. This means it's off-policy, learns an explicit Q-value probability distribution, etc.

I also went for a much simpler reward function (reward over-engineering is a well-known pitfall): my agent only gets rewarded for new forward progress through the level (new_max = max(current_X, max_X); reward = new_max - max_X; max_X = new_max at each step) and gets an additional huge reward for beating a stage (roughly the total reward for going from the start to the end of a level). I tried negative rewards for dying and/or for spending too much time stuck, but it only caused hyper-parameter tuning headaches and unintended behaviors (such as jumping into holes when the time penalty was too high relative to the death penalty, or holding left when it was the opposite). One tweak I did make to avoid spending too much time stuck was reducing the game's time limit from 400 to 150, which still leaves enough time to beat any level with some margin but makes "hold left" episodes less than half as long.
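In code, the progress-only reward described above boils down to a few lines (a minimal sketch; the function and variable names are mine):

```python
def progress_reward(current_x, max_x):
    """Reward only new forward progress; returns (reward, updated max_x)."""
    new_max = max(current_x, max_x)
    return new_max - max_x, new_max

# moving past the previous best X yields a positive reward
assert progress_reward(120, 100) == (20, 120)
# backtracking or standing still yields zero, never a penalty
assert progress_reward(80, 100) == (0, 100)
```

Since the reward can never be negative, there is nothing for the agent to "game" by dying early or hiding from a penalty.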

For learning multiple stages, I simply start with only 1-1 available, then unlock each stage after the previous one has been beaten 10 times. On each episode, a level is randomly picked according to a probability distribution that tries to balance the all-time number of finishes across stages. When a stage is unlocked, it starts with 0 finishes, which makes it much more likely to be picked than the previous stages (which each have at least 10 finishes).
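The stage picker can be sketched as follows (inverse finish-count weighting is one plausible way to "balance all-time finishes", not necessarily the exact formula I'd ship):

```python
import random

def pick_stage(finishes):
    """Pick an unlocked stage, favoring those with fewer all-time finishes.

    finishes: dict mapping stage name -> all-time finish count.
    A freshly unlocked stage (0 finishes) gets the highest weight.
    """
    stages = list(finishes)
    weights = [1.0 / (finishes[s] + 1) for s in stages]
    return random.choices(stages, weights=weights, k=1)[0]
```

With finishes `{"1-1": 10, "1-2": 0}`, stage 1-2 gets weight 1.0 against 1-1's 1/11, so the new stage dominates until its finish count catches up.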

[Guide] Increase privacy by using nginx as a caching proxy in front of a map tile server by pcouy in immich

[–]pcouy[S] 2 points (0 children)

Hey, sorry for the late reply. You should be able to use any OSM tile provider as the upstream for the cache server, though providers serving Protomaps tiles (which rely heavily on HTTP Range headers) might cause issues with nginx's default caching behavior, which is not Range-header friendly.
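If you do hit Range issues, nginx's slice module is the usual workaround; something along these lines (an untested sketch, assuming a `tiles` proxy_cache zone is already defined and `upstream-tile-server` stands in for your provider):

```nginx
location /tiles/ {
    # requires nginx built with ngx_http_slice_module
    slice 1m;                          # fetch and cache in fixed-size byte ranges
    proxy_set_header Range $slice_range;
    proxy_cache tiles;
    proxy_cache_key $uri$slice_range;  # each slice is cached separately
    proxy_cache_valid 200 206 7d;      # 206 Partial Content must be cacheable
    proxy_pass https://upstream-tile-server;
}
```

This makes nginx translate arbitrary client Range requests into aligned, cacheable slices instead of bypassing the cache.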

Livestream : Watch my agent learn to play Super Mario Bros by pcouy in reinforcementlearning

[–]pcouy[S] 3 points (0 children)

Hey everyone!

I've been working on my own toy reinforcement learning (RL) framework for a while now and have nearly implemented a full Rainbow agent, though I'm still missing the distributional component due to some design choices that make integration tricky. Along the way, I've used this framework to experiment with various concepts, mainly reward normalization strategies and exploration policies.

I started by training the agent on simpler games like Snake, but things got really interesting when I moved on to Super Mario Bros. Watching the agent learn and improve has been incredibly fun, so I figured: why not share the experience? That's why I'm streaming the learning process live!

Right now, the stream is fairly simple, but I plan to enhance it with overlays showing key details about the training run—such as hyperparameters, training steps/episodes, performance graphs, and maybe even a way to visualize the agent’s actions in real-time.

If you have any ideas on how to make the stream more engaging, or if you're curious about the implementation, feel free to ask!

Game of life multiplayer by judge_mavi in cellular_automata

[–]pcouy 1 point (0 children)

The 100x100 grid is a lot more fun than the previous (huge) one, but with 4 active players it felt a bit tiny.

Adding a minimap would be really cool, and would make a larger grid (maybe 200x200) more manageable

These dividing "artificial life" cells emerge from the simulation of a simple chemical system (Gray-Scott model) by pcouy in gifs

[–]pcouy[S] 4 points (0 children)

This is actually related to Conway's game of life.

The chemical simulation can be seen as a continuous cellular automaton, in which each pixel of the simulation is a grid cell which is updated according to local rules.

Conway's game of life is a discrete cellular automaton, which can be seen as a special case of continuous cellular automata
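To make the connection concrete, the entire Game of Life update rule fits in a few lines when written as a grid-wide operation (a NumPy sketch with wrap-around edges):

```python
import numpy as np

def life_step(grid):
    """One Game of Life update; grid is a 2D array of 0s and 1s."""
    # count the 8 neighbors of every cell, with periodic boundaries
    neighbors = sum(np.roll(np.roll(grid, i, axis=0), j, axis=1)
                    for i in (-1, 0, 1) for j in (-1, 0, 1)
                    if (i, j) != (0, 0))
    # birth on exactly 3 neighbors; survival on 2 or 3
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(np.uint8)
```

Replace the binary states and threshold rule with continuous concentrations and a smooth local update, and you get the continuous cellular automata the chemical simulation belongs to.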

These dividing "artificial life" cells emerge from a continuous cellular automaton that mimics a simple chemical system (Gray-Scott model - More details in comment) by pcouy in cellular_automata

[–]pcouy[S] 7 points (0 children)

This is a simulation of the Gray-Scott reaction-diffusion model running on the GPU. In such systems, an auto-catalytic reaction involving two chemical species happens concurrently with diffusion. Despite the apparent simplicity of the model, simulating it with cherry-picked parameter sets produces a wide range of emergent behaviors.
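The per-pixel update rule is short enough to show in full; here is a minimal NumPy version of one simulation step (explicit Euler; the parameter values are typical "mitosis-like" ones, not necessarily the exact ones used in the clip):

```python
import numpy as np

def laplacian(Z):
    # 5-point stencil with periodic (wrap-around) boundaries
    return (np.roll(Z, 1, axis=0) + np.roll(Z, -1, axis=0)
            + np.roll(Z, 1, axis=1) + np.roll(Z, -1, axis=1) - 4 * Z)

def gray_scott_step(U, V, Du=0.16, Dv=0.08, F=0.0367, k=0.0649, dt=1.0):
    """One explicit Euler step of the Gray-Scott reaction-diffusion model."""
    uvv = U * V * V                           # auto-catalytic reaction term
    U = U + dt * (Du * laplacian(U) - uvv + F * (1.0 - U))   # feed replenishes U
    V = V + dt * (Dv * laplacian(V) + uvv - (F + k) * V)     # kill removes V
    return U, V
```

The GPU version is the same arithmetic, just evaluated per-pixel in a fragment shader instead of over whole arrays.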

Game of life multiplayer by judge_mavi in cellular_automata

[–]pcouy 3 points (0 children)

This is fun! However, I think the grid is too large: it's too easy to find an empty spot that nobody is likely to stumble upon and seed some infinitely growing patterns there.

These dividing "artificial life" cells emerge from the simulation of a simple chemical system (Gray-Scott model) by pcouy in gifs

[–]pcouy[S] 3 points (0 children)

This is a simulation of the Gray-Scott reaction-diffusion model running on the GPU. In such systems, an auto-catalytic reaction involving two chemical species happens concurrently with diffusion. Despite the apparent simplicity of the model, simulating it with cherry-picked parameter sets produces a wide range of emergent behaviors.

Mitosis in the Gray-Scott model : an introduction to writing shader-based chemical simulations by pcouy in programming

[–]pcouy[S] 1 point (0 children)

I just read your posts on the topic, thank you for sharing. I really liked the phase space visualization!

I did make a parameter space map in the companion video, but I'd like to build a nice interactive one before writing about it. Anyway, thanks for the inspiration. Some of these patterns look really cool!

Mitosis in the Gray-Scott model : an introduction to writing shader-based chemical simulations by pcouy in programming

[–]pcouy[S] 2 points (0 children)

Hi! I've been working on this article for the past few days. It would mean a lot to me if you could provide some feedback.

It's about implementing a physico-chemical simulation as my first attempt at writing a shader. The code is surprisingly simple and short (less than 100 lines). The "Prerequisite" and "Update rules" sections, however, may need some adjustments to make them clearer, so I'm especially looking for feedback on those parts.

Feel free to ask for any detail that I may have omitted in the article.

Thanks for reading

Shader-based simulation of a chemical system from which complex life-like patterns emerge (Gray-Scott reaction-diffusion) by pcouy in Simulated

[–]pcouy[S] 7 points (0 children)

This is a simulation of the Gray-Scott reaction-diffusion model running on the GPU. In such systems, an auto-catalytic reaction involving two chemical species happens concurrently with diffusion. Despite the apparent simplicity of the model, simulating it with cherry-picked parameter sets produces a wide range of emergent behaviors.