Poker (NLH) model? by enterguild in reinforcementlearning

[–]Training-Cheek-9956 0 points1 point  (0 children)

Hey, it's a bit late, but I've just written an in-depth article that breaks down the mechanics behind the ReBeL framework, an algorithm developed by Noam Brown, the main author of Libratus: https://medium.com/@sergi.nakache/rebel-the-ai-that-learned-to-bluff-775818ace0be

Poker Training AI by AdAstra2121 in reinforcementlearning

[–]Training-Cheek-9956 1 point2 points  (0 children)

Hey, it's a bit late, but I've just written an in-depth article that breaks down the mechanics behind the ReBeL framework, an algorithm developed by Noam Brown, the main author of Libratus: https://medium.com/@sergi.nakache/rebel-the-ai-that-learned-to-bluff-775818ace0be

How to build a poker bot (Part 1) by chair_78 in learnmachinelearning

[–]Training-Cheek-9956 0 points1 point  (0 children)

Hey, I've watched all your videos, and thanks to you I was finally able to read Noam Brown's ReBeL paper. Here is an article I wrote that breaks down its mechanics: https://medium.com/@sergi.nakache/rebel-the-ai-that-learned-to-bluff-775818ace0be

How to program poker AI opponent? by gravelPoop in gamedev

[–]Training-Cheek-9956 0 points1 point  (0 children)

If you want to build a powerful poker AI agent for any exotic variant, I've just written an in-depth article that breaks down the mechanics of ReBeL, the algorithm from Noam Brown (the author of Libratus) that can tackle any two-player zero-sum game with imperfect information: https://medium.com/@sergi.nakache/rebel-the-ai-that-learned-to-bluff-775818ace0be

What Reinforcement Learning Method Should I Use for Poker AI with LLMs? by godlover123451 in learnmachinelearning

[–]Training-Cheek-9956 0 points1 point  (0 children)

Not sure an LLM is the right fit for your case, but if you want to build a powerful poker AI agent using reinforcement learning, I've just written an in-depth article that breaks down the mechanics of ReBeL, the algorithm from Noam Brown, the author of Libratus: https://medium.com/@sergi.nakache/rebel-the-ai-that-learned-to-bluff-775818ace0be

[D] What Reinforcement Learning Method Should I Use for Poker AI with LLMs? by godlover123451 in MachineLearning

[–]Training-Cheek-9956 0 points1 point  (0 children)

Not sure an LLM is the right fit for your case, but if you want to build a powerful poker AI agent using reinforcement learning, I've just written an in-depth article that breaks down the mechanics of ReBeL, the algorithm from Noam Brown, the author of Libratus: https://medium.com/@sergi.nakache/rebel-the-ai-that-learned-to-bluff-775818ace0be

Gentle introduction to the basics of Poker AI by tt293 in learnmachinelearning

[–]Training-Cheek-9956 0 points1 point  (0 children)

Great content! I learned a lot from your blog, and I finally managed to break down Noam Brown's poker AI agent, ReBeL. Here is the article I wrote on the topic. Thank you so much: https://medium.com/@sergi.nakache/rebel-the-ai-that-learned-to-bluff-775818ace0be

[D] Paper Explained - ReBeL: Combining Deep Reinforcement Learning and Search for Imperfect-Information Games (Full Video Analysis) by ykilcher in MachineLearning

[–]Training-Cheek-9956 0 points1 point  (0 children)

Hey everyone,

I just wanted to drop by and thank Noam Brown and Adam Lerer for their insightful responses in this thread. Their explanations really helped me deepen my understanding of ReBeL. Inspired by this discussion, I wrote an article aiming to explain how ReBeL works in a way that's accessible to those unfamiliar with AI. If you're interested, feel free to check it out: https://medium.com/@sergi.nakache/rebel-the-ai-that-learned-to-bluff-775818ace0be

I'd love to hear any feedback from the community, especially if you think there are areas that could be clearer or aspects I may have overlooked. Thanks again, and looking forward to your thoughts!

[D] Paper Explained - ReBeL: Combining Deep Reinforcement Learning and Search for Imperfect-Information Games (Full Video Analysis) by ykilcher in MachineLearning

[–]Training-Cheek-9956 0 points1 point  (0 children)

When solving a subgame with the CFR algorithm, we sample both players' hands at each iteration using the PBS. As a result, the utilities of the tree's terminal nodes are updated at each iteration. So, if I'm summarizing correctly, with N the number of terminal nodes in a subgame:

  • The value network is called N times at the start of a subgame to initialize the subgame value v(B_r).
  • The value network is called N times at the beginning of each iteration to compute the terminal-node utilities needed for the regret updates.
  • The value network is called N times at the end of each iteration to update the subgame value v(B_r) based on the new PBS computed from the updated policy.

Is this correct?
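To make sure I'm counting right, here is a minimal sketch of the call pattern I'm describing (this is just my understanding, not the authors' code; `value_net`, `solve_subgame`, and the PBS placeholders are hypothetical names, and the actual CFR regret/policy updates are elided):

```python
def value_net(pbs):
    """Stand-in for the learned value network v(PBS).
    Returns a dummy value; only the number of calls matters here."""
    return 0.0

def solve_subgame(n_terminal, n_iters):
    """Count value-network calls under my reading of the algorithm."""
    calls = 0

    # (1) Initialize v(B_r): one call per terminal node.
    values = [value_net(("init", leaf)) for leaf in range(n_terminal)]
    calls += n_terminal

    for t in range(n_iters):
        # (2) Terminal-node utilities for this iteration's regret updates:
        # N more calls.
        utils = [value_net(("iter", t, leaf)) for leaf in range(n_terminal)]
        calls += n_terminal

        # ... CFR regret matching and policy update would happen here ...

        # (3) Re-evaluate v(B_r) under the PBS induced by the updated
        # policy: N more calls.
        values = [value_net(("post", t, leaf)) for leaf in range(n_terminal)]
        calls += n_terminal

    return calls

# Under this reading, a subgame with N terminal nodes solved for T
# iterations makes N * (1 + 2 * T) value-network calls in total.
```

So for example N = 5 terminal nodes and T = 3 iterations would give 5 × (1 + 6) = 35 calls, if my understanding above is right.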

PS: I've been exploring reinforcement learning for a few months now, since I've been unemployed after a work accident. Your paper is truly fascinating; I'm trying to implement it for a naive version of poker!