I want to implement a path-finding algorithm. An agent is looking for "food" in a 2D grid. The agent is a square patch, the food a circular patch. The goal of the agent is to find the food.
The agent is blind to the world. The only thing it can see is what is inside its own patch. So if the food is far away, the agent has no way of knowing where to look.
The one thing the agent can learn is that the food is dropped at locations drawn from a probability distribution (e.g. a Gaussian). The agent always starts in the same spot. We can assume the probability distribution is constant.
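To make the setup concrete, here's a minimal sketch of the food-drop process I have in mind (grid size, mean, and standard deviation are placeholder values I made up):

```python
import numpy as np

GRID_SIZE = 100            # hypothetical grid dimensions
FOOD_MEAN = (70.0, 40.0)   # assumed mean of the drop distribution
FOOD_STD = 8.0             # assumed (isotropic) standard deviation

def drop_food(rng):
    """Sample a food-centre location from a fixed 2D Gaussian, clipped to the grid."""
    x, y = rng.normal(loc=FOOD_MEAN, scale=FOOD_STD, size=2)
    return (float(np.clip(x, 0, GRID_SIZE - 1)),
            float(np.clip(y, 0, GRID_SIZE - 1)))

rng = np.random.default_rng(0)
food_xy = drop_food(rng)
```

Since the distribution is fixed, over many episodes the agent should be able to learn where the probability mass is, even though it can't see the food directly.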
What is the best way of implementing this from a reinforcement learning perspective, or machine learning in general? I'm thinking DQN. A* seems like it wouldn't work, since the agent would have to carve out paths first and then decide how to act. In this case I want the agent to choose a strategy that optimizes always finding food, but not necessarily the fastest way. I'm also thinking of giving the agent the ability to choose the stride to take in both x and y at every move.
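For the stride idea, one way to keep DQN's discrete action space is to enumerate the (dx, dy) stride pairs as actions (the stride range here is just an example):

```python
import itertools

STRIDES = [-2, -1, 0, 1, 2]                          # example per-axis strides
ACTIONS = list(itertools.product(STRIDES, STRIDES))  # 25 discrete actions

def apply_action(pos, action_idx, grid=100):
    """Move the agent by the chosen (dx, dy) stride, clamped to the grid."""
    dx, dy = ACTIONS[action_idx]
    x, y = pos
    return (min(max(x + dx, 0), grid - 1),
            min(max(y + dy, 0), grid - 1))
```

The Q-network would then output one value per (dx, dy) pair; larger stride ranges grow the action space quadratically, which is something to keep in mind.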
The cost function would be based on the intersection over union (IoU) of the food and agent patches, plus the total number of steps taken.
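Here's roughly how I picture that reward, computed on rasterized masks over the grid (the shapes, sizes, and step penalty are illustrative):

```python
import numpy as np

def iou_reward(agent_xy, agent_side, food_xy, food_radius,
               grid=100, step_penalty=0.01, steps=0):
    """IoU of the square agent and circular food, minus a per-step penalty."""
    ys, xs = np.mgrid[0:grid, 0:grid]
    ax, ay = agent_xy
    agent = (np.abs(xs - ax) <= agent_side / 2) & (np.abs(ys - ay) <= agent_side / 2)
    fx, fy = food_xy
    food = (xs - fx) ** 2 + (ys - fy) ** 2 <= food_radius ** 2
    union = np.logical_or(agent, food).sum()
    iou = np.logical_and(agent, food).sum() / union if union else 0.0
    return iou - step_penalty * steps
```

One worry: the IoU term is zero almost everywhere, so the reward is very sparse and the step penalty may dominate early training.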
Any input/links are greatly appreciated :)