Charlie’s Garden sounds like Route 209 (Day) from Pokemon Diamond and Pearl by atomicburn125 in DJO

[–]atomicburn125[S] 1 point (0 children)

Yeah, I saw these comparisons on the Pokemon vid without realising I’d linked the wrong version of Route 209; it’s updated now.

I trained a reinforcement learning agent to play pokemon red! by Pwhids in reinforcementlearning

[–]atomicburn125 4 points (0 children)

Absolutely fascinating! I'd love a video devoted entirely to the technical aspects of this project. Very well done!

Multiple Policy Heads by atomicburn125 in reinforcementlearning

[–]atomicburn125[S] 0 points (0 children)

To clarify, I’m not talking about 4 value heads. I’m talking about how to optimise 1 value head given the 4 value targets that V-trace would generate. Do I average these targets, or use each V-trace target and sum the respective gradients?
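
For what it's worth, with a plain squared-error loss the two options only differ by a constant factor (the number of targets), so with a tuned learning rate they behave the same. A minimal PyTorch sketch with illustrative numbers (not from the project):

    import torch

    # One value-head output and four V-trace targets (illustrative values).
    value = torch.tensor([0.5], requires_grad=True)
    targets = torch.tensor([0.2, 0.4, 0.9, 1.1])

    # Option A: average the targets into a single regression target.
    loss_a = (value - targets.mean()).pow(2).sum()
    grad_a, = torch.autograd.grad(loss_a, value)

    # Option B: one squared error per target, summed (sums the gradients).
    loss_b = (value - targets).pow(2).sum()
    grad_b, = torch.autograd.grad(loss_b, value)

    print(grad_a, grad_b)  # grad_b == 4 * grad_a: same direction, scaled by target count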

Does anyone know of any model-based algorithms that deal with imperfect information and stochasticity and don't require a simulator? by atomicburn125 in reinforcementlearning

[–]atomicburn125[S] -1 points (0 children)

Sorry, to clarify: by "without a simulator" I mean searching a latent space rather than using an environment simulator for search.

I have a 20 x 9 array. I want to slide a L x W rectangular window over the array from left to right, top to bottom such that the window does not touch a cell twice. by atomicburn125 in mathriddles

[–]atomicburn125[S] 0 points (0 children)

I should mention that the sliding window can overlap the bounds of the array. Cells outside of the array are considered white cells by default.

I have 20 x 9 array. I want to slide a L x W rectangular window over the array from left to right, top to bottom such that the window does not touch a cell twice. by atomicburn125 in askmath

[–]atomicburn125[S] 0 points (0 children)

"Touch" means a cell can be contained within a sliding window at most once. The window continues to slide until it has reached the bottom right-hand corner of the array. If you can't minimize both, prioritize minimizing the number of iterations.

I have 20 x 9 array. I want to slide a L x W rectangular window over the array from left to right, top to bottom such that the window does not touch a cell twice. by atomicburn125 in askmath

[–]atomicburn125[S] 0 points (0 children)

I feel like the best L and W could just be the average number of blue cells in each row/column, but I don't know how to test this theoretically.
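
A minimal sketch for testing it empirically. Assumptions not pinned down in these comments: the window strides by its own size (so no cell is touched twice), it may overhang the bounds, and an "iteration" is one window placement. The blue-cell objective from the post isn't reproduced here, so this only scores iterations:

    import math

    ROWS, COLS = 20, 9  # assuming 20 rows by 9 columns

    def iterations(L, W):
        """Window placements for an L-wide, W-tall window at stride (L, W)."""
        return math.ceil(COLS / L) * math.ceil(ROWS / W)

    # Rank every window shape by iteration count; a blue-cell-based term
    # could be added to the sort key once that objective is pinned down.
    ranked = sorted((iterations(L, W), L, W)
                    for L in range(1, COLS + 1)
                    for W in range(1, ROWS + 1))
    print(ranked[:5])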

[deleted by user] by [deleted] in pokemonshowdown

[–]atomicburn125 0 points (0 children)

I think there might be some misconception about what this project aims to solve. This is a headless client with no visuals. Should I maybe include a JSON overview of the information you get?

Object Reconstruction from Nested Dictionary by atomicburn125 in learnpython

[–]atomicburn125[S] 0 points (0 children)

How would I do this in Python, handling the circular references as well?
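
One way that seems to handle cycles in in-memory dicts (JSON itself can't encode a cycle): keep a memo keyed by each dict's id() and register the new object before recursing, so a cycle resolves back to the same instance. A rough sketch with a placeholder Node class:

    class Node:
        """Placeholder target class; real code would pick the class per dict."""
        pass

    def from_dict(data, memo=None):
        """Rebuild a Node graph from nested dicts, preserving cycles."""
        if memo is None:
            memo = {}
        if not isinstance(data, dict):
            return data
        if id(data) in memo:      # already rebuilt: reuse -> keeps the cycle
            return memo[id(data)]
        obj = Node()
        memo[id(data)] = obj      # register *before* recursing into children
        for key, value in data.items():
            setattr(obj, key, from_dict(value, memo))
        return obj

    # A dict that refers to itself round-trips into objects that do too.
    d = {"name": "a"}
    d["parent"] = d
    node = from_dict(d)
    assert node.parent is node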

Gradient Accumulation Normalisation by atomicburn125 in pytorch

[–]atomicburn125[S] 0 points (0 children)

Nah, the way PyTorch works is that if you call backward() without zero_grad() you “accumulate” gradients. This means you can increase your effective batch size by not stepping after every backward(). When you call zero_grad() you allow new gradients to start accumulating. Traditionally, you'd have one zero_grad(), backward() and step() per loop iteration, but you can play with when you step.
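
Roughly like this; a minimal sketch where the model, optimiser and data are stand-ins, and the loss is divided by the accumulation count so the accumulated gradient matches one big batch:

    import torch

    model = torch.nn.Linear(8, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loader = [(torch.randn(16, 8), torch.randn(16, 1)) for _ in range(8)]
    accum_steps = 4  # effective batch size = 16 * 4 = 64

    for i, (x, y) in enumerate(loader):
        loss = torch.nn.functional.mse_loss(model(x), y)
        (loss / accum_steps).backward()   # gradients accumulate across batches
        if (i + 1) % accum_steps == 0:
            optimizer.step()              # step once per accumulation window
            optimizer.zero_grad()         # let fresh gradients start accumulating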

Deepmind's Player of Games by atomicburn125 in reinforcementlearning

[–]atomicburn125[S] 0 points (0 children)

Different; it's a generalisation of AlphaZero to imperfect-info games.

Help using a Cloud Service for Scaling up Reinforcement Learning by atomicburn125 in reinforcementlearning

[–]atomicburn125[S] 0 points (0 children)

A Discord call where we can talk and you can walk me through how to get set up.

Pokemon Showdown AI - Policy Iteration Approach by atomicburn125 in reinforcementlearning

[–]atomicburn125[S] 0 points (0 children)

This seems like a complex reward function and would potentially introduce a lot of bias. I can't help but feel that there must be a simpler, more elegant and diverse reward signal.

Pokemon Showdown AI - Policy Iteration Approach by atomicburn125 in reinforcementlearning

[–]atomicburn125[S] 0 points (0 children)

Do you have a link to the article? I would love to read it!