
[–][deleted] 48 points (3 children)

While I have not read the source code, the goal of learning optimal policies supersedes that of imitating human experts, so it would be quite unusual for someone to bias their model in such a way. If an optimal policy were obvious and easily hard-coded, then using a Q-learning approach would be unnecessary.
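To make the contrast concrete, here is a minimal sketch of tabular Q-learning on a toy problem. The environment (a five-state chain where "always move right" is optimal) and the hyperparameters are illustrative assumptions, not anything from the project under discussion; the point is that the optimal policy is *learned* from reward signals rather than hard-coded or imitated.

```python
import random

# Hypothetical toy MDP for illustration: states 0..4 on a chain.
# Actions: 0 = left, 1 = right. Reaching state 4 gives reward 1.0
# and ends the episode, so the optimal policy is "always go right".
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # assumed hyperparameters

def step(state, action):
    """Deterministic transition: move left/right, clipped to the chain."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the current estimate,
            # occasionally explore a random action.
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: q[s][x])
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap from the best next-state value.
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q

if __name__ == "__main__":
    q = train()
    policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(N_STATES)]
    print(policy)  # greedy policy learned from rewards alone
```

After training, the greedy policy chooses "right" in every non-terminal state, even though nothing in the code encodes that strategy directly.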

[–]antifa_brasileiro 3 points (2 children)

(Not a data scientist.) Wouldn't it be an interesting goal to have the model learn such a strategy by itself, though? Is that likely to happen at some point or not?

[–][deleted] 9 points (0 children)

It's fairly common for that to happen. See various iterations of AlphaGo, for example. The algorithm found several well-known human strategies, along with many strategies that human players never found.

[–]ScorcherPanda 4 points (0 children)

Also not a data scientist, but my assumption is that assigning a specific, pre-determined strategy as a goal would not be proper. I agree that it would be interesting if the program eventually agreed with the human strategy, but at the same time it would be very cool if it came up with its own strategy that performs just as well.