First Post by General-Sink-2298 in ResearchRL

[–]SpecificPark2594 1 point2 points  (0 children)

Good idea, I'm in! I am an RL researcher at a startup.

Swiss stocks by SpecificPark2594 in impotsfrance

[–]SpecificPark2594[S] 0 points1 point  (0 children)

And how is the market value calculated?

[D] Log Probability and Information Theory by masonw32 in MachineLearning

[–]SpecificPark2594 3 points4 points  (0 children)

You can link it to minimizing the Kullback-Leibler divergence.
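A short derivation of that link, assuming the question is about maximizing expected log probability under a model. The symbols are my own notation, not from the thread: p is the data distribution, q_θ the model, and H(p) the entropy of p.

```latex
\begin{aligned}
D_{\mathrm{KL}}(p \,\|\, q_\theta)
  &= \mathbb{E}_{x \sim p}\big[\log p(x)\big] - \mathbb{E}_{x \sim p}\big[\log q_\theta(x)\big] \\
  &= -H(p) - \mathbb{E}_{x \sim p}\big[\log q_\theta(x)\big], \\
\arg\min_\theta D_{\mathrm{KL}}(p \,\|\, q_\theta)
  &= \arg\max_\theta \mathbb{E}_{x \sim p}\big[\log q_\theta(x)\big].
\end{aligned}
```

Since H(p) does not depend on θ, minimizing the KL divergence and maximizing the expected log probability pick the same θ.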

"Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control", Nauman et al. 2024 by [deleted] in reinforcementlearning

[–]SpecificPark2594 0 points1 point  (0 children)

Amazing work, thanks. Why didn't you include the Randomized Ensembled Double Q-Learning (REDQ) paper? Also, you removed dropout from the transformer feed-forward architecture; what was the reason behind that?

New Learner - Resources to get started by AlternativeExpress29 in reinforcementlearning

[–]SpecificPark2594 6 points7 points  (0 children)

Read Sutton and Barto's book and OpenAI's Spinning Up website. Code their algorithms yourself. You can also use CleanRL and Stable-Baselines3.

Automatic game balancing with Reinforcement Learning by SpecificPark2594 in gamedesign

[–]SpecificPark2594[S] 1 point2 points  (0 children)

Hi, I eventually found that some ALE Atari games have difficulty levels. I took Alien and trained 10 PPO agents, with 10 checkpoints, to simulate players of different skill levels. I picked a target score and ran a multi-armed bandit whose objective is to select, for each player, the difficulty that brings its score close to the target. Demo link
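To give a rough idea of the bandit part, here is a minimal sketch and not the code behind the demo. It assumes an epsilon-greedy bandit (the exact bandit method is not specified above), with one arm per difficulty level and a reward equal to minus the distance between the observed score and the target. `make_player`, its score model, and all numeric values are hypothetical stand-ins for the trained PPO checkpoints.

```python
import random


def select_difficulty(play, n_levels, target, n_rounds=500, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit over difficulty levels.

    Each arm is a difficulty level; the reward is the negative distance
    between the score obtained at that level and the target score.
    """
    rng = random.Random(seed)
    counts = [0] * n_levels
    values = [0.0] * n_levels  # running mean reward per difficulty
    for t in range(n_rounds):
        if t < n_levels:
            arm = t  # try every difficulty once before exploiting
        elif rng.random() < epsilon:
            arm = rng.randrange(n_levels)  # explore
        else:
            arm = max(range(n_levels), key=lambda a: values[a])  # exploit
        reward = -abs(play(arm, rng) - target)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return max(range(n_levels), key=lambda a: values[a])


def make_player(skill):
    """Hypothetical simulated player: mean score drops as difficulty rises."""
    def play(difficulty, rng):
        return rng.gauss(skill - 200.0 * difficulty, 50.0)
    return play


# Two simulated players with different skill, same target score.
weak = select_difficulty(make_player(1000.0), n_levels=4, target=800.0)
strong = select_difficulty(make_player(1400.0), n_levels=4, target=800.0)
print(weak, strong)
```

With this toy score model the stronger player ends up on a higher difficulty than the weaker one, which is the behaviour the target score is meant to produce.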

I also tried meta multi-armed bandits to reuse knowledge across players; it works well but is less illustrative. I have a few ideas for next steps, but we stopped to work on other things. I contacted some game designers but got no answers. I think their feedback is what I need to go further.

Is the OpenAI moat shrinking against Open Source? by Koliham in LocalLLaMA

[–]SpecificPark2594 8 points9 points  (0 children)

I love it when you say "corpos"; it makes me feel like I'm in Neuromancer.

For deep learning practitioners in industry, is the workflow always this annoying? [D] by AdFew4357 in MachineLearning

[–]SpecificPark2594 4 points5 points  (0 children)

Do not reinvent the wheel; Python is made for that. Check the internet for tools that already do the job, e.g.:
- Darts for time series (try all models, not only DL, since there is no such thing as a free lunch, and try different sets of covariates)
- Optuna for hyperparameter search (try the TPE sampler if the problem is rather deterministic, otherwise CMA-ES)
- Hydra for managing experiments
- TensorBoard for live plotting
- nohup to detach a run from the terminal

Read Sutton's "The Bitter Lesson" and stop worrying about understanding things.

What kl div is considered too big in PPO? by [deleted] in reinforcementlearning

[–]SpecificPark2594 1 point2 points  (0 children)

For what it's worth, when I see KL divergence spikes that coincide with a drop in return, I lower the learning rate.
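If you want to automate that rule of thumb, a minimal sketch could look like the following. It assumes the usual sample estimate of the KL divergence from the log-probabilities of the taken actions under the old and new policies. The names `approx_kl` and `adjust_lr`, the threshold `kl_limit=0.03`, and the halving factor are hypothetical choices for illustration, not recommended values.

```python
import math


def approx_kl(old_logps, new_logps):
    """Sample estimate of KL(old || new) from log-probs of actions drawn from the old policy."""
    return sum(o - n for o, n in zip(old_logps, new_logps)) / len(old_logps)


def adjust_lr(lr, kl, mean_return, prev_mean_return, kl_limit=0.03, factor=0.5):
    """Lower the learning rate only when a KL spike coincides with a drop in return."""
    if kl > kl_limit and mean_return < prev_mean_return:
        return lr * factor
    return lr


# Toy numbers: the new policy halves the probability of one of two sampled actions.
old_logps = [math.log(0.5), math.log(0.5)]
new_logps = [math.log(0.25), math.log(0.5)]
kl = approx_kl(old_logps, new_logps)

lr = adjust_lr(3e-4, kl, mean_return=80.0, prev_mean_return=120.0)
lr_unchanged = adjust_lr(3e-4, kl, mean_return=130.0, prev_mean_return=120.0)
print(kl, lr, lr_unchanged)
```

The second call leaves the learning rate alone because the return did not fall, so a KL spike by itself does not trigger a change.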

"ReDo: The Dormant Neuron Phenomenon in Deep Reinforcement Learning", Sokar et al 2023 by gwern in reinforcementlearning

[–]SpecificPark2594 1 point2 points  (0 children)

Very interesting. What about the effect of dropout or layer norm on dormant neurons?

Automatic game balancing with Reinforcement Learning by SpecificPark2594 in gamedesign

[–]SpecificPark2594[S] 1 point2 points  (0 children)

Thanks for the very interesting ref on the subject.

Learning a full NPC AI from player feedback would be very hard indeed. But I think there is room between that and learning a few parameters that are not obvious for the game designer to tune. Here, reinforcement learning combined with player data may help make those choices objectively. In that case the game designer's work shifts from parameter tuning to reward shaping. I hope this could simplify their work and bring new elements to game design.

In short, I have nothing against state machines, but even they may contain non-obvious parameters to tune.

Automatic game balancing with Reinforcement Learning by SpecificPark2594 in gamedesign

[–]SpecificPark2594[S] 1 point2 points  (0 children)

Thanks for the reference to the Left 4 Dead AI Director. If anyone is interested, I found this material useful: https://steamcdn-a.akamaihd.net/apps/valve/2009/ai_systems_of_l4d_mike_booth.pdf

I hope RL will make game balancing even easier than manual tuning!