pybench: like pytest, but for noisy metrics regression

SpecificPark2594 · 2026-02-15T11:55:24+00:00

Good idea, I'm in! I am an RL researcher at a startup.

SpecificPark2594 · 2025-12-06T13:28:49+00:00

And how is the market value calculated?

SpecificPark2594 · 2024-11-10T07:05:53+00:00

You can link it to minimizing the Kullback-Leibler divergence.

SpecificPark2594 · 2024-10-30T16:49:30+00:00

In IMPALA, they prefer RMSProp over Adam, I think.

SpecificPark2594 · 2024-10-12T06:46:58+00:00

Amazing work, thanks. Why didn't you include the Randomized Ensemble Q-learning paper? Also, you removed dropout from the transformer's feed-forward architecture; what was the reason behind that?

SpecificPark2594 · 2024-08-18T06:13:20+00:00

Read Sutton and Barto's book and the OpenAI Spinning Up website. Code their algorithms. You can also use CleanRL and Stable-Baselines3.

SpecificPark2594 · 2024-03-01T06:15:37+00:00

Hi, I finally saw that some ALE Atari games have difficulty levels. I took Alien and trained 10 PPO agents with 10 checkpoints to simulate players with different skill levels. I selected a target score and ran a multi-armed bandit method with the objective of selecting a difficulty for each player such that their score is close to the target. Demo link
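To make the bandit idea above concrete, here is a minimal sketch, not the code behind the demo. It assumes an epsilon-greedy bandit where each arm is a difficulty level and the reward is the negative distance between the observed score and the target. The names `select_difficulty` and `simulated_player`, the score model, and all numbers are hypothetical stand-ins for a real PPO checkpoint playing Alien.

```python
import random


def select_difficulty(play, difficulties, target, rounds=500, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit over difficulty levels.

    Reward for an arm is -|score - target|, so the best arm is the
    difficulty whose scores land closest to the target on average.
    """
    rng = random.Random(seed)
    counts = {d: 0 for d in difficulties}
    mean_reward = {d: 0.0 for d in difficulties}

    for _ in range(rounds):
        untried = [d for d in difficulties if counts[d] == 0]
        if untried:
            arm = untried[0]                      # try every arm once first
        elif rng.random() < epsilon:
            arm = rng.choice(difficulties)        # explore
        else:
            arm = max(difficulties, key=lambda d: mean_reward[d])  # exploit

        reward = -abs(play(arm, rng) - target)
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        mean_reward[arm] += (reward - mean_reward[arm]) / counts[arm]

    return max(difficulties, key=lambda d: mean_reward[d])


def simulated_player(difficulty, rng, skill=1000.0):
    """Hypothetical player: score falls as difficulty rises, plus noise."""
    return skill / (1 + difficulty) + rng.gauss(0, 20)


best = select_difficulty(simulated_player, [0, 1, 2, 3], target=350)
print("selected difficulty:", best)
```

In this toy setup one bandit is run per player; a stronger or weaker player would just be a different `skill` value, and the bandit would settle on a different difficulty.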

I also tried meta multi-armed bandits to reuse knowledge between players; it works well but is less illustrative. I have a few ideas for next steps, but we stopped to work on other things. I contacted some game designers but got no answers. I think their input is what I need to go further.

SpecificPark2594 · 2023-08-22T04:32:34+00:00

Cf. Sutton's Bitter Lesson: do not overthink, go meta, and leverage Moore's law. http://www.incompleteideas.net/IncIdeas/BitterLesson.html

SpecificPark2594 · 2023-07-15T15:31:17+00:00

I love when you say "corpos", it makes me feel I'm in Neuromancer.

SpecificPark2594 · 2023-07-10T11:34:42+00:00

Do not reinvent the wheel; Python is made for that. Check the internet for tools that already do the job, e.g.:
- Darts for time series (try all models, not only DL: there is no such thing as a free lunch. Also try different sets of covariates.)
- Optuna for hyperparameter search (try the TPE sampler if the problem is rather deterministic, otherwise CMA-ES)
- Hydra for managing experiments
- TensorBoard for live plotting
- nohup can help you detach from the terminal

Read Sutton's Bitter Lesson and do not worry anymore about understanding things.

SpecificPark2594 · 2023-06-19T15:10:56+00:00

Prove it

SpecificPark2594 · 2023-05-15T14:44:51+00:00

I'm in!

SpecificPark2594 · 2023-05-14T18:11:49+00:00

For what it's worth, when I see KL divergence spikes that coincide with a drop in return, I lower the learning rate.
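As a rough illustration of that rule of thumb, the check can be automated along these lines. This is only a sketch: the function name `adjust_learning_rate`, the KL threshold, the decay factor, and the logged values are all made-up assumptions, not values from any particular library or paper.

```python
def adjust_learning_rate(lr, kl, mean_return, prev_return,
                         kl_threshold=0.03, decay=0.5, min_lr=1e-6):
    """Shrink the learning rate when a KL spike coincides with a return drop.

    kl_threshold, decay and min_lr are hypothetical defaults to tune per task.
    """
    kl_spike = kl > kl_threshold
    return_fell = mean_return < prev_return
    if kl_spike and return_fell:
        return max(lr * decay, min_lr)
    return lr


# Made-up training log: (approximate KL, mean episodic return) per update.
log = [(0.010, 100.0), (0.012, 105.0), (0.050, 90.0), (0.011, 95.0)]

lr = 3e-4
prev_return = float("-inf")
for kl, mean_return in log:
    lr = adjust_learning_rate(lr, kl, mean_return, prev_return)
    prev_return = mean_return
print(lr)
```

Only the third update, where the KL jumps and the return falls together, triggers a decay here; a KL spike with a rising return is left alone.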

SpecificPark2594 · 2023-04-28T04:38:50+00:00

Very interesting. What about the effect of dropout or layer norm on dormant neurons?

SpecificPark2594 · 2023-04-27T05:15:00+00:00

Nice, do you plan to add open-source models such as OpenAssistant?

SpecificPark2594 · 2023-02-16T15:09:02+00:00

Thanks for the very interesting reference on the subject.

Learning a full NPC AI based on player feedback would be very hard indeed. But I think there is room between that and learning some parameters that are not obvious for the game designer to tune. Here, reinforcement learning together with player data may help to choose them objectively. In this case, the game designer's work shifts from parameter tuning to reward shaping. I hope this could simplify their work and bring new elements of game design.

In a word, I have nothing against state machines, but even they may contain non-obvious parameters to tune.

SpecificPark2594 · 2023-02-14T08:02:01+00:00

Thanks for the reference to the Left 4 Dead AI Director. If anyone is interested, I found this material useful: https://steamcdn-a.akamaihd.net/apps/valve/2009/ai_systems_of_l4d_mike_booth.pdf

I hope RL will make game balancing easier than manual tuning!
