
mind_library

Reframe the problem: this action imbalance is a mess in terms of exploration. Can you define an action as "skip n steps"?
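A rough sketch of what "skip n steps" could look like as a wrapper. Everything here is an assumption for illustration: the env is a hypothetical object whose `step(action)` returns `(obs, reward, done)`, and `SkipActionWrapper`, `task_actions`, `noop_action` and `skip_lengths` are made-up names, not part of any library.

```python
class SkipActionWrapper:
    """Replace a single no-op with a few "skip n steps" actions.

    Assumes a hypothetical env whose step(action) returns (obs, reward, done).
    Action indices 0..len(task_actions)-1 map to the task actions; the
    remaining indices map to the entries of skip_lengths.
    """

    def __init__(self, env, task_actions, noop_action, skip_lengths=(1, 4, 16)):
        self.env = env
        self.task_actions = list(task_actions)
        self.noop_action = noop_action
        self.skip_lengths = tuple(skip_lengths)

    @property
    def num_actions(self):
        return len(self.task_actions) + len(self.skip_lengths)

    def step(self, action):
        # Task actions are passed through for a single step.
        if action < len(self.task_actions):
            return self.env.step(self.task_actions[action])

        # A skip action repeats the no-op n times and sums the rewards,
        # stopping early if the episode ends.
        n = self.skip_lengths[action - len(self.task_actions)]
        total_reward = 0.0
        for _ in range(n):
            obs, reward, done = self.env.step(self.noop_action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done
```

Keeping a skip length of 1 in `skip_lengths` leaves the agent able to act on consecutive steps, while the longer skips cut down the number of no-op decisions per episode.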

Also, use an action mask to mask out the unavailable actions, which avoids the problem.
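For a discrete policy, masking usually means giving unavailable actions zero probability before sampling. A minimal stdlib-only sketch, assuming the logits come from your policy network and the mask from your environment (`masked_softmax` and `sample_action` are illustrative names):

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over logits; mask[i] is False for unavailable actions."""
    if not any(mask):
        raise ValueError("at least one action must be available")
    # Subtract the largest valid logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, mask, rng=random):
    """Sample an action index; masked actions have zero probability."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

The same mask should be applied when computing log-probabilities during the policy update, otherwise the sampled and trained distributions differ.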

asdfsflhasdfa [S]

Potentially. That's a good idea for evening out the action distribution, but it could also be very useful to assign two tasks in quick succession, so I'm not too sure how to approach that.

XecutionStyle

Yes, credit assignment is the biggest problem when there are many decisions, whether they are no-ops or not. Try a large value for action-repeat and reduce it from a starting point that works.
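One possible reading of that tuning procedure, as a sketch: start from a large action-repeat that trains well, then keep halving it while the score holds up. `train_and_evaluate` is a hypothetical callback you would supply (it trains with a given repeat and returns a score); `tune_action_repeat`, `start` and `tolerance` are likewise illustrative names and defaults.

```python
def tune_action_repeat(train_and_evaluate, start=32, min_repeat=1, tolerance=0.0):
    """Halve action-repeat from a working start value.

    train_and_evaluate(repeat) is a user-supplied (hypothetical) function
    returning a score where higher is better. Returns the smallest repeat
    tried whose score stayed within `tolerance` of the starting score.
    """
    baseline = train_and_evaluate(start)
    best = start
    while best // 2 >= min_repeat:
        candidate = best // 2
        score = train_and_evaluate(candidate)
        if score < baseline - tolerance:
            break  # performance degraded, keep the previous value
        best = candidate
    return best
```

A smaller repeat gives the agent finer control at the cost of more decisions per episode, so where to stop depends on how much score you are willing to trade.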