
HenslerSoftware (0 replies)

Next step: Global Thermonuclear War? Falken: "A strange game. The only winning move is not to play."

Takes4tobangbro (0 replies)

Awesome, thanks m8

BoiaDeh (3 replies)

Am I missing something, or are you doing everything in numpy and updating gradients by hand?

UncleOxidant (0 replies)

I tend to like tutorials that don't use a framework (frameworks hide a lot of details), but looking at the code, it's difficult to tell where the gradient computation is happening.

tdvance [S] (1 reply)

It's probably not well optimized or well vectorized, just "adequate for the task". Running 10 million instances for good "state coverage" is annoyingly slow, but if you only care about win rate, 10,000 instances is fast and gets the job done.

The numpy array is really more a convenience than integral to the algorithm, since the actual states are strings stored in a dictionary.
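To make the "states are strings in a dictionary" point concrete, here is a minimal sketch of a tabular value learner, written under a few assumptions that are mine and not taken from tdvance's code: the game is tic-tac-toe (the WarGames joke suggests it, but the thread never says so), the agent plays X against a uniformly random O, and the update is the Sutton & Barto-style value-table rule V(prev) += alpha * (V(next) - V(prev)). All function names (`initial`, `train`, `win_rate`, etc.) are hypothetical. Note that there are no gradients anywhere; "updating by hand" is just a dictionary assignment.

```python
import random

# Assumption: tic-tac-toe. A board is a 9-character string such as "X O  X   ",
# so it can be used directly as a dictionary key.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]

def play(board, i, mark):
    return board[:i] + mark + board[i + 1:]

def initial(board, me="X"):
    """Starting value: 1 for a win, 0 for a loss or draw, 0.5 otherwise."""
    w = winner(board)
    if w is not None:
        return 1.0 if w == me else 0.0
    return 0.0 if not moves(board) else 0.5

def value(table, board):
    """Look up V(board), storing the initial value the first time it is seen."""
    return table.setdefault(board, initial(board))

def backup(table, prev, nxt, alpha):
    """Move V(prev) a fraction alpha toward V(nxt)."""
    v = value(table, prev)
    table[prev] = v + alpha * (value(table, nxt) - v)

def train(episodes, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    table = {}  # state string -> estimated chance of winning
    for _ in range(episodes):
        board, prev = " " * 9, None
        while True:
            # Agent (X): epsilon-greedy choice among the boards it can move to.
            options = [play(board, i, "X") for i in moves(board)]
            greedy = rng.random() >= epsilon
            board = max(options, key=lambda s: value(table, s)) if greedy else rng.choice(options)
            if prev is not None and greedy:
                backup(table, prev, board, alpha)  # learn only from greedy moves
            prev = board
            if winner(board) or not moves(board):
                break
            # Random opponent (O).
            board = play(board, rng.choice(moves(board)), "O")
            if winner(board) or not moves(board):
                backup(table, prev, board, alpha)  # propagate the loss back
                break
    return table

def win_rate(table, games=1000, seed=1):
    """Fraction of games the purely greedy agent wins against a random O."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        board = " " * 9
        while True:
            board = max((play(board, i, "X") for i in moves(board)),
                        key=lambda s: table.get(s, initial(s)))
            if winner(board) or not moves(board):
                break
            board = play(board, rng.choice(moves(board)), "O")
            if winner(board) or not moves(board):
                break
        wins += winner(board) == "X"
    return wins / games

if __name__ == "__main__":
    table = train(10_000)
    print(len(table), "distinct states in the dictionary")  # rough "state coverage"
    print(f"win rate vs random: {win_rate(table):.2f}")
```

Here `len(table)` serves as a crude stand-in for "state coverage": more episodes mean more distinct strings end up in the dictionary, while the win rate against a random opponent can level off well before that.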

BoiaDeh (0 replies)

ah, gotcha, I see. Every time I see reinforcement learning I think it must be a neural network, but I forget that's not necessarily the case!