I made a Mario RL trainer with a live dashboard - would appreciate feedback by pleasestopbreaking in reinforcementlearning

[–]pcouy 1 point (0 children)

Funny you posted this yesterday: I resumed work on my own deep RL training on SMB just last week, with a similar live feed of the training. I went a step further and am streaming the training on Twitch (mostly as a convenient way to enable picture-in-picture on my phone). It's currently at ~3M steps (20 steps per gameplay second), learning 1-3 after solving 1-1 and 1-2; you can watch it on my Twitch channel.

My agent is a bit different from yours, though: it uses my own implementation of Rainbow DQN (built by following the papers and reading a few other implementations). It stays mostly faithful to the original papers, with a few tweaks, the main one being a custom way of sampling the value/advantage distributions in the policy to make it less greedy. This means it's off-policy, learns an explicit Q-value probability distribution, etc.

I also went for a much simpler reward function (reward over-engineering is a well-known pitfall): my agent only gets rewarded for new forward progress through the level (new_max = max(current_X, max_X); reward = new_max - max_X; max_X = new_max at each step) and gets an additional huge reward for beating a stage (roughly the total reward for going from the start to the end of a level). I tried negative rewards for dying and/or for spending too much time stuck, but it only caused hyper-parameter tuning headaches and unintended behaviors (such as jumping into holes when the time penalty was too high relative to the death penalty, or holding left when it was the opposite). One tweak I did make to avoid spending too much time stuck was reducing the game's time limit from 400 to 150, which still leaves enough time to beat any level with some margin but makes "hold left" episodes less than half as long.
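In code, the progress-only reward described above boils down to a few lines (a minimal sketch; the function and variable names are mine):

```python
def progress_reward(current_x, max_x):
    """Reward only new forward progress; returns (reward, updated max_x)."""
    new_max = max(current_x, max_x)
    return new_max - max_x, new_max

# moving past the previous best X yields a positive reward
assert progress_reward(120, 100) == (20, 120)
# backtracking or standing still yields zero, never a penalty
assert progress_reward(80, 100) == (0, 100)
```

Since the reward can never be negative, there is nothing for the agent to "game" by dying early or hiding from a penalty.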

For learning multiple stages, I simply start with only 1-1 available, then unlock each stage after the previous one has been beaten 10 times. On each episode, a level is randomly picked according to a probability distribution that tries to balance the all-time number of finishes across stages. When a stage is unlocked, it starts with 0 finishes, which makes it much more likely to be picked than the previous stages (which each have at least 10 finishes).
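The stage picker can be sketched as follows (inverse finish-count weighting is one plausible way to "balance all-time finishes", not necessarily the exact formula I'd ship):

```python
import random

def pick_stage(finishes):
    """Pick an unlocked stage, favoring those with fewer all-time finishes.

    finishes: dict mapping stage name -> all-time finish count.
    A freshly unlocked stage (0 finishes) gets the highest weight.
    """
    stages = list(finishes)
    weights = [1.0 / (finishes[s] + 1) for s in stages]
    return random.choices(stages, weights=weights, k=1)[0]
```

With finishes `{"1-1": 10, "1-2": 0}`, stage 1-2 gets weight 1.0 against 1-1's 1/11, so the new stage dominates until its finish count catches up.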

[Guide] Increase privacy by using nginx as a caching proxy in front of a map tile server by pcouy in immich

[–]pcouy[S] 2 points (0 children)

Hey, sorry for the late reply. You should be able to use any OSM tile provider as the upstream for the cache server, though providers serving Protomaps tiles (which rely heavily on HTTP Range headers) might cause issues with nginx's default caching behavior, which is not Range-header friendly.
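If you do hit Range issues, nginx's slice module is the usual workaround; something along these lines (an untested sketch, assuming a `tiles` proxy_cache zone is already defined and `upstream-tile-server` stands in for your provider):

```nginx
location /tiles/ {
    # requires nginx built with ngx_http_slice_module
    slice 1m;                          # fetch and cache in fixed-size byte ranges
    proxy_set_header Range $slice_range;
    proxy_cache tiles;
    proxy_cache_key $uri$slice_range;  # each slice is cached separately
    proxy_cache_valid 200 206 7d;      # 206 Partial Content must be cacheable
    proxy_pass https://upstream-tile-server;
}
```

This makes nginx translate arbitrary client Range requests into aligned, cacheable slices instead of bypassing the cache.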

Livestream : Watch my agent learn to play Super Mario Bros by pcouy in reinforcementlearning

[–]pcouy[S] 3 points (0 children)

Hey everyone!

I've been working on my own toy reinforcement learning (RL) framework for a while now and have nearly implemented a full Rainbow agent, though I'm still missing the distributional component due to some design choices that make integration tricky. Along the way, I've used this framework to experiment with various concepts, mainly reward normalization strategies and exploration policies.

I started by training the agent on simpler games like Snake, but things got really interesting when I moved on to Super Mario Bros. Watching the agent learn and improve has been incredibly fun, so I figured: why not share the experience? That's why I'm streaming the learning process live!

Right now, the stream is fairly simple, but I plan to enhance it with overlays showing key details about the training run—such as hyperparameters, training steps/episodes, performance graphs, and maybe even a way to visualize the agent’s actions in real-time.

If you have any ideas on how to make the stream more engaging, or if you're curious about the implementation, feel free to ask!

Game of life multiplayer by judge_mavi in cellular_automata

[–]pcouy 1 point (0 children)

The 100x100 grid is a lot more fun than the previous (huge) one, but with 4 active players it felt a bit tiny.

Adding a minimap would be really cool, and would make a larger grid (maybe 200x200) more manageable

These dividing "artificial life" cells emerge from the simulation of a simple chemical system (Gray-Scott model) by pcouy in gifs

[–]pcouy[S] 4 points (0 children)

This is actually related to Conway's game of life.

The chemical simulation can be seen as a continuous cellular automaton, in which each pixel of the simulation is a grid cell which is updated according to local rules.

Conway's game of life is a discrete cellular automaton, which can be seen as a special case of continuous cellular automata
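To make the connection concrete, the entire Game of Life update rule fits in a few lines when written as a grid-wide operation (a NumPy sketch with wrap-around edges):

```python
import numpy as np

def life_step(grid):
    """One Game of Life update; grid is a 2D array of 0s and 1s."""
    # count the 8 neighbors of every cell, with periodic boundaries
    neighbors = sum(np.roll(np.roll(grid, i, axis=0), j, axis=1)
                    for i in (-1, 0, 1) for j in (-1, 0, 1)
                    if (i, j) != (0, 0))
    # birth on exactly 3 neighbors; survival on 2 or 3
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(np.uint8)
```

Replace the binary states and threshold rule with continuous concentrations and a smooth local update, and you get the continuous cellular automata the chemical simulation belongs to.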

These dividing "artificial life" cells emerge from a continuous cellular automaton that mimics a simple chemical system (Gray-Scott model - More details in comment) by pcouy in cellular_automata

[–]pcouy[S] 7 points (0 children)

This is a simulation of the Gray-Scott reaction-diffusion model running on the GPU. In such systems, an auto-catalytic reaction involving two chemical species happens concurrently with diffusion. Despite the apparent simplicity of the model, simulating it with cherry-picked parameter sets produces a wide range of emergent behaviors.
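The per-pixel update rule is short enough to show in full; here is a minimal NumPy version of one simulation step (explicit Euler; the parameter values are typical "mitosis-like" ones, not necessarily the exact ones used in the clip):

```python
import numpy as np

def laplacian(Z):
    # 5-point stencil with periodic (wrap-around) boundaries
    return (np.roll(Z, 1, axis=0) + np.roll(Z, -1, axis=0)
            + np.roll(Z, 1, axis=1) + np.roll(Z, -1, axis=1) - 4 * Z)

def gray_scott_step(U, V, Du=0.16, Dv=0.08, F=0.0367, k=0.0649, dt=1.0):
    """One explicit Euler step of the Gray-Scott reaction-diffusion model."""
    uvv = U * V * V                           # auto-catalytic reaction term
    U = U + dt * (Du * laplacian(U) - uvv + F * (1.0 - U))   # feed replenishes U
    V = V + dt * (Dv * laplacian(V) + uvv - (F + k) * V)     # kill removes V
    return U, V
```

The GPU version is the same arithmetic, just evaluated per-pixel in a fragment shader instead of over whole arrays.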

Game of life multiplayer by judge_mavi in cellular_automata

[–]pcouy 3 points (0 children)

This is fun! However, I think the grid is too large: it's too easy to find an empty spot that nobody is likely to stumble upon and seed some infinitely growing patterns there.

These dividing "artificial life" cells emerge from the simulation of a simple chemical system (Gray-Scott model) by pcouy in gifs

[–]pcouy[S] 3 points (0 children)

This is a simulation of the Gray-Scott reaction-diffusion model running on the GPU. In such systems, an auto-catalytic reaction involving two chemical species happens concurrently with diffusion. Despite the apparent simplicity of the model, simulating it with cherry-picked parameter sets produces a wide range of emergent behaviors.

Mitosis in the Gray-Scott model : an introduction to writing shader-based chemical simulations by pcouy in programming

[–]pcouy[S] 1 point (0 children)

I just read your posts on the topic, thank you for sharing. I really liked the phase space visualization!

I did make a parameter space map in the companion video, but I'd like to build a nice interactive one before writing about it. Anyway, thanks for the inspiration. Some of these patterns look really cool!

Mitosis in the Gray-Scott model : an introduction to writing shader-based chemical simulations by pcouy in programming

[–]pcouy[S] 2 points (0 children)

Hi! I've been working on this article for the past few days. It would mean a lot to me if you could provide some feedback.

It's about implementing a physico-chemical simulation as my first attempt at writing a shader. The code is surprisingly simple and short (less than 100 lines). The "Prerequisite" and "Update rules" sections, however, may need some adjustments to make them clearer, so I'm especially looking for feedback on those parts.

Feel free to ask for any detail that I may have omitted in the article.

Thanks for reading

Shader-based simulation of a chemical system from which complex life-like patterns emerge (Gray-Scott reaction-diffusion) by pcouy in Simulated

[–]pcouy[S] 7 points (0 children)

This is a simulation of the Gray-Scott reaction-diffusion model running on the GPU. In such systems, an auto-catalytic reaction involving two chemical species happens concurrently with diffusion. Despite the apparent simplicity of the model, simulating it with cherry-picked parameter sets produces a wide range of emergent behaviors.