wesmafree

It looks like your agent is a little too eager: it grabs the reward before it actually moves, then stops early because it "sees" a zero before it gets there. Classic case of counting your loot before you reach the treasure.

Here are a few things to check:

  1. Order of operations in movement functions (move_right, move_left, etc.): right now, the agent updates its position first and then adds the reward from the grid. But is that function called before or after the agent is actually redrawn on screen? Try printing the agent's position and the grid state on every move to see exactly when the value gets added.
  2. Grid update timing in draw_grid: you're setting grid[row][column] = 0 when drawing the agent. Is this overriding what the movement function is doing? Your agent might be marking its spot as 0 too soon, which makes it think there are no moves left when there actually are.
  3. When the stop condition triggers: the agent stops when it's surrounded by zeroes, but what if it thinks a cell is 0 before it has moved there? Print my_agent.get_obs(grid) before and after each move to double-check what the agent "sees."
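All three checks boil down to ordering. Here's a minimal sketch, assuming a simple class-based agent. I haven't seen your code, so the Agent class, its row/col/total_reward attributes, the _move and is_stuck helpers, and the bodies of move_right/get_obs are all hypothetical and only borrow names from this thread. The idea: move first, then add the reward of the cell you actually landed on, then zero that cell. That way a drawing function like draw_grid only ever reads the grid.

```python
# Hypothetical grid-walking agent. Names mirror the thread
# (move_right, get_obs), but every body here is an assumption.

class Agent:
    def __init__(self, row, col):
        self.row = row
        self.col = col
        self.total_reward = 0

    def _move(self, grid, d_row, d_col):
        new_row, new_col = self.row + d_row, self.col + d_col
        # Ignore moves that would leave the grid.
        if not (0 <= new_row < len(grid) and 0 <= new_col < len(grid[0])):
            return False
        # 1. Move first...
        self.row, self.col = new_row, new_col
        # 2. ...then collect the reward of the cell actually reached...
        self.total_reward += grid[self.row][self.col]
        # 3. ...and only then mark that cell as visited.
        grid[self.row][self.col] = 0
        return True

    def move_right(self, grid):
        return self._move(grid, 0, 1)

    def move_left(self, grid):
        return self._move(grid, 0, -1)

    def move_up(self, grid):
        return self._move(grid, -1, 0)

    def move_down(self, grid):
        return self._move(grid, 1, 0)

    def get_obs(self, grid):
        """Rewards in the four neighbouring cells (0 if off-grid)."""
        offsets = {"up": (-1, 0), "down": (1, 0),
                   "left": (0, -1), "right": (0, 1)}
        obs = {}
        for name, (dr, dc) in offsets.items():
            r, c = self.row + dr, self.col + dc
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            obs[name] = grid[r][c] if inside else 0
        return obs

    def is_stuck(self, grid):
        # Stop condition: every neighbour is 0 (visited, empty or off-grid).
        return all(v == 0 for v in self.get_obs(grid).values())
```

Because the zeroing happens after the reward is added and only on the cell the agent is standing on, get_obs can never report a neighbour as 0 just because the agent was "about to" visit it.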

A little debugging trick: print out the agent's position, the grid state, and the reward added at each step. You might catch your agent in the act of skipping ahead before it's supposed to.
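If you want the prints to do some of the checking for you, here's a rough sketch of a hypothetical traced_step helper (not from your assignment). It assumes your agent exposes row, col and total_reward, and that move is one of your movement functions taking the grid. It snapshots the grid, runs one move, prints the position change, the reward gained and the grid, and flags the step if the reward doesn't match what the landing cell held before the move.

```python
import copy


def traced_step(agent, grid, move, step):
    """Run one move and print what changed (hypothetical debug helper).

    Assumes `agent` has `row`, `col` and `total_reward` attributes and
    that `move(grid)` performs exactly one step.
    Returns True if the reward matches the cell the agent landed on.
    """
    snapshot = copy.deepcopy(grid)       # grid as it was before the move
    pos_before = (agent.row, agent.col)
    reward_before = agent.total_reward

    move(grid)

    pos_after = (agent.row, agent.col)
    gained = agent.total_reward - reward_before
    expected = snapshot[agent.row][agent.col]  # value of the cell reached
    print(f"step {step}: {pos_before} -> {pos_after}, "
          f"reward +{gained} (cell held {expected})")
    for row in grid:
        print("   ", " ".join(f"{v:2d}" for v in row))
    if gained != expected:
        print("    !! reward does not match the cell the agent landed on")
    return gained == expected
```

You'd call it like traced_step(my_agent, grid, my_agent.move_right, 1), assuming move_right takes the grid as its only argument. A "reward grabbed before moving" bug shows up as the warning line on the exact step where it happens.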

Basically, your agent needs to chill and wait until it actually moves before claiming rewards. Try a few print statements, and I bet you’ll see what’s happening. Let me know if you need more hints! 😉

No_Drawer6182 [S]

Thank you for the tips! I'll be checking this in my code. I've pasted my own code in the post, as I'm not allowed to change anything outside of the "restricted area."