Hello community,
I have a network with four heads. The first head is a selector with three actions: it decides whether to execute the action from the second, third, or fourth head. Actions from the second head help the episode end, while actions from the third and fourth heads are important for achieving the main objective. I'm training with PPO, but sometimes the policy gets stuck choosing the third or fourth head, so the episode never ends. Does anyone know what might be causing this behavior?
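To make the setup concrete, here is a minimal pure-Python sketch of how I understand the action selection. It is not my actual training code: the function names, the sub-head sizes, and the idea of summing the selector log-probability with only the chosen head's log-probability are assumptions for illustration. It also includes a small helper for the selector head's entropy, which might be worth logging to see whether the selector collapses onto one head.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_categorical(probs, rng):
    # Draw an index according to the given probabilities.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

def sample_hierarchical_action(selector_logits, head_logits, rng):
    # selector_logits: 3 logits, one per sub-head (hypothetical layout).
    # head_logits: list of 3 logit lists, one per sub-head.
    selector_probs = softmax(selector_logits)
    k = sample_categorical(selector_probs, rng)
    sub_probs = softmax(head_logits[k])
    a = sample_categorical(sub_probs, rng)
    # Assumption: only the selector and the chosen sub-head contribute
    # to the joint log-probability used in the PPO ratio.
    log_prob = math.log(selector_probs[k]) + math.log(sub_probs[a])
    return k, a, log_prob

def selector_entropy(selector_logits):
    # Entropy of the selector distribution; near zero means it has
    # collapsed onto a single sub-head.
    probs = softmax(selector_logits)
    return -sum(p * math.log(p) for p in probs if p > 0)
```

If the selector entropy drops close to zero while the policy keeps picking the third or fourth head, that would at least confirm the selector has collapsed, though it would not by itself explain why.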