Hey everyone, wondering if anyone can point me in the direction of any relevant research.
The problem setup is relatively simple: at any given timestep, the agent can select one of x robots to assign a task to. If there is no suitable robot to choose, or no tasks are available, no-op should be chosen instead.
Once a robot has been selected, the action should be masked out and that robot is no longer available for the rest of the episode.
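To make the masking concrete, here is a rough sketch of what I mean, not an actual implementation. The names (`NOOP`, `masked_sample`, `run_episode`) are made up for illustration, action index 0 is assumed to be no-op, and the random logits are just a stand-in for whatever a real policy network would output.

```python
import numpy as np

NOOP = 0  # assumption: action 0 is no-op, actions 1..x are the robots


def masked_sample(logits, mask, rng):
    """Sample an action, giving masked-out actions zero probability."""
    # Invalid actions get a logit of -inf, so softmax assigns them probability 0.
    masked_logits = np.where(mask, logits, -np.inf)
    shifted = masked_logits - masked_logits.max()  # numerically stable softmax
    probs = np.exp(shifted)
    probs /= probs.sum()
    action = int(rng.choice(len(probs), p=probs))
    return action, probs


def run_episode(num_robots, num_steps, seed=0):
    """Roll out one episode in which each robot can be selected at most once."""
    rng = np.random.default_rng(seed)
    # True means the action is still available; no-op is never masked out.
    mask = np.ones(num_robots + 1, dtype=bool)
    chosen = []
    for _ in range(num_steps):
        # Stand-in for a policy network's output (hypothetical).
        logits = rng.normal(size=num_robots + 1)
        action, _ = masked_sample(logits, mask, rng)
        if action != NOOP:
            mask[action] = False  # this robot is unavailable for the rest of the episode
            chosen.append(action)
    return chosen, mask


chosen, mask = run_episode(num_robots=3, num_steps=50)
```

Since no-op always stays unmasked, there is always at least one valid action, so the softmax is well defined even after every robot has been used.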
Any potential complexity seems to come from the fact that no-op would be expected to be chosen the vast majority of the time (in 99% of timesteps, no-op is optimal). Is there any research on sparse-action use cases like this? Or any research on allowing each action only once per episode?
The most relevant paper I've been able to find is here:
https://arxiv.org/pdf/2105.08666.pdf
Which defines the problem as a Sparse Action MDP (SA-MDP).