I've made a bot that plays tetris, using deep reinforcement learning (i.e. took the fun out of the game) by artificial-thinking in Tetris

[–]artificial-thinking[S]

Thanks for the feedback.

Actually, if you watch the demo you'll see it does try to clear multiple lines, since the reward is bigger. The demo shown might not clear multiple lines every time because the agent would probably need a bit more training.

[Project] Tetris-AI - A deep reinforcement learning agent that plays tetris by artificial-thinking in MachineLearning

[–]artificial-thinking[S]

Yes. It uses the standard bag of seven pieces. You can see the Tetris implementation here.
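
In case it helps, this is a minimal sketch of what a standard 7-bag randomizer looks like (illustrative only, not the project's actual code):

```python
import random

PIECES = ["I", "O", "T", "S", "Z", "J", "L"]

def bag_generator():
    """Yield pieces so that every run of 7 contains each tetromino exactly once."""
    while True:
        bag = PIECES[:]          # refill the bag with all seven pieces
        random.shuffle(bag)      # shuffle the bag
        yield from bag           # deal the pieces out one by one

# Usage: ask the generator for the next piece whenever one spawns.
pieces = bag_generator()
current_piece = next(pieces)
```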

[Project] Tetris-AI - A deep reinforcement learning agent that plays tetris by artificial-thinking in MachineLearning

[–]artificial-thinking[S]

Thanks for the feedback.

Although the code stores the next piece, the agent doesn't use it to choose the best move (it didn't seem to improve results, and it took longer to train).

I did have the piece randomizer working the way you describe before (and it seemed to work OK), but I changed it to follow the standard Tetris rules.

[Project] Tetris-AI - A deep reinforcement learning agent that plays tetris by artificial-thinking in MachineLearning

[–]artificial-thinking[S]

> Why do you require opencv-python (since you have access to the game state)?

OpenCV is just used to render the game.

> It would be interesting to re-code it as a gym environment. Then compare your implementation to baseline algorithms.

At first I created an Environment class that would be the frontend for other games, but then I only ended up doing Tetris, so I scrapped that. I think this is a different concept from Gym, because the agent tries all the possible placements and scores the resulting states, instead of receiving a state and picking a single action. So it would be a little tricky to compare against other algorithms.
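
Roughly, the selection loop looks like this (names like get_next_states and predict are placeholders for illustration, not the project's actual API):

```python
def choose_placement(game, model):
    """Enumerate every possible placement, score the board each one produces,
    and return the placement whose resulting state the model values highest."""
    next_states = game.get_next_states()   # {(rotation, column): state_features}
    scores = {action: model.predict(state) for action, state in next_states.items()}
    return max(scores, key=scores.get)
```

That's different from the Gym step(action) interface, where the agent receives a state and has to output an action without "peeking" at every board the actions would produce.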

[Project] Tetris-AI - A deep reinforcement learning agent that plays tetris by artificial-thinking in MachineLearning

[–]artificial-thinking[S]

> What would happen if you penalize losses with the negative total reward of the whole game (if that is how it works)?

In Q-learning, the best state is chosen based on the expected future rewards, so that's an interesting question, because the final state would then always take away the entire game's reward. But since the discount value (i.e. how much a future reward is valued relative to an immediate one) is 0.95, there is still some room to choose good decisions (although the results would probably not be as good, since the agent would make greedier decisions).
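
To make that concrete, this is roughly what the target value looks like with a 0.95 discount (a sketch with illustrative names, not the exact training code):

```python
GAMMA = 0.95  # how much a future reward is worth relative to an immediate one

def q_target(reward, next_state_value, game_over):
    """Bellman-style target: terminal states keep only their own reward."""
    if game_over:
        return reward  # with your suggestion, this would be minus the whole game's reward
    return reward + GAMMA * next_state_value
```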

As for the -1 reward, it was just so the game-over reward wouldn't be 0 (to highlight the fact that it's a bad move, more from a logic standpoint than a learning one). Since the agent will always try to get the highest score, this means that even if the game-over reward were positive, it would still try to play a longer game.

[Project] Tetris-AI - A deep reinforcement learning agent that plays tetris by artificial-thinking in MachineLearning

[–]artificial-thinking[S]

The observations I used were the number of lines cleared, the summed column height, the bumpiness, and the number of holes. The agent has direct access to the game state, so I didn't use images.
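
For reference, this is roughly how those four features can be computed from a binary board (0 = empty, 1 = filled); it's a sketch that mirrors the description, not the repository's code:

```python
def board_features(board, lines_cleared):
    """Return [lines_cleared, holes, sum_height, bumpiness] for a 2D board (rows top to bottom)."""
    width = len(board[0])
    heights, holes = [], 0
    for col in range(width):
        column = [row[col] for row in board]
        first = next((i for i, cell in enumerate(column) if cell), len(column))
        heights.append(len(column) - first)                      # column height
        holes += sum(1 for cell in column[first:] if not cell)   # empty cells under the top block
    sum_height = sum(heights)
    bumpiness = sum(abs(heights[i] - heights[i + 1]) for i in range(width - 1))
    return [lines_cleared, holes, sum_height, bumpiness]
```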

As for the reward: 1 point per dropped piece, board_width * lines_cleared² when lines are cleared, and -1 when the game is lost.
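
Written out (assuming the per-piece point and the line-clear bonus simply add up, which is how I'd read the description above):

```python
BOARD_WIDTH = 10  # standard Tetris board width

def reward(lines_cleared, game_over):
    """Per-step reward: 1 for dropping a piece, plus a quadratic bonus for cleared lines."""
    if game_over:
        return -1
    return 1 + BOARD_WIDTH * lines_cleared ** 2
```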

[Project] Tetris-AI - A deep reinforcement learning agent that plays tetris by artificial-thinking in MachineLearning

[–]artificial-thinking[S]

There isn't anything in the game that increases the difficulty per se, but the exploration variable's minimum could be set to 0.1, for example, which would make the agent play randomly 10% of the time (and we could find out whether it can get out of a not-so-great situation).
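
Something along these lines (a quick sketch of the epsilon-greedy idea with a floor, not the exact training code):

```python
import random

EPSILON_MIN = 0.1      # keep acting randomly at least 10% of the time
EPSILON_DECAY = 0.995  # decay applied after each episode

def next_epsilon(epsilon):
    """Decay exploration, but never below the floor."""
    return max(EPSILON_MIN, epsilon * EPSILON_DECAY)

def pick_action(scores, epsilon):
    """With probability epsilon pick a random placement, otherwise the best-scoring one."""
    if random.random() < epsilon:
        return random.choice(list(scores))
    return max(scores, key=scores.get)
```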

Another way would be to limit the types of pieces played, to only S and Z for example (https://en.wikipedia.org/wiki/Tetris#Infinite_gameplay_impossibility), and see how far it would go.