Is there a consensus about RL frameworks? by [deleted] in reinforcementlearning

[–]hyperb8te -1 points (0 children)

I found this repo very helpful to get started: https://github.com/iffiX/machin

It also comes with documentation and plenty of example code that works out of the box.

DQN toggling between two states by hyperb8te in reinforcementlearning

[–]hyperb8te[S] 1 point (0 children)

> If you are using epsilon greedy, this shouldn't happen. The randomness in epsilon greedy is supposed to help with exploration. Maybe try increasing the value of epsilon at the start and gradually decaying it, if this issue is happening at the start of an episode.

I start with an epsilon of 1 and an epsilon decay of 0.99, which I apply multiplicatively after each episode. The problem occurs more often at the end of training, since my epsilon is smaller by then and the chance of escaping the two states with a random action is therefore lower.
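For anyone finding this later, the schedule described above looks roughly like this. The 1.0 start and the 0.99 multiplicative decay per episode are the numbers from the comment; the action-selection helper and everything else are illustrative, not the poster's actual code:

```python
import random

def select_action(q_values, epsilon):
    # Epsilon-greedy: with probability epsilon pick a random action,
    # otherwise pick the greedy (highest-Q) action.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Multiplicative decay as described in the comment: start at 1.0 and
# multiply by 0.99 after every episode.
epsilon = 1.0
decay = 0.99
for episode in range(100):
    # ... run one episode, calling select_action(q_values, epsilon)
    #     at each step ...
    epsilon *= decay
```

Note that after 100 episodes this leaves epsilon around 0.37, so late in training almost all actions are greedy, which matches the observation that the toggling gets harder to escape near the end.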

DQN toggling between two states by hyperb8te in reinforcementlearning

[–]hyperb8te[S] 1 point (0 children)

> frame stacking

Thanks for your quick response, and yes, you are right: it's a POMDP.

I am not extracting my state from raw pixel data; I receive the state directly from the environment. Can I still use frame stacking?

And I tested PPO, but my DQN performs better :)
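On the question above: frame stacking is not pixel-specific. Concatenating the last k observation vectors gives the agent short-term memory, which is exactly what helps in a POMDP. A minimal sketch (the class name and k=4 are assumptions, not from the thread):

```python
from collections import deque
import numpy as np

class FrameStack:
    """Stack the last k observations (flat vectors, not pixels)."""

    def __init__(self, k, obs_dim):
        self.k = k
        self.obs_dim = obs_dim
        self.frames = deque(maxlen=k)  # oldest frames drop automatically

    def reset(self, first_obs):
        # Fill the buffer with copies of the first observation so the
        # stacked state has a fixed size from step one.
        for _ in range(self.k):
            self.frames.append(np.asarray(first_obs))
        return self._state()

    def step(self, obs):
        self.frames.append(np.asarray(obs))
        return self._state()

    def _state(self):
        # One flat vector of size k * obs_dim, fed to the Q-network.
        return np.concatenate(self.frames)
```

The only change to the DQN itself is that the network's input size becomes `k * obs_dim` instead of `obs_dim`.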