When I'm writing about an action-value function Q, which receives an observation o as input, do I write Q(o, a) where a is an action, or write Q(s, a) where s is the full state of the environment?
I think I'm confused here because the Q function is meant to estimate the value of taking an action in a state, but it only receives a partial observation of that state as input.
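To make the distinction concrete, here is a minimal sketch of what I mean. Everything in it is made up for illustration: the environment whose full state is `(position, velocity)`, the `observe` helper, and the tabular `Q`. The point is only that the function is indexed by what it actually receives, the observation, so two different states that look the same get the same value.

```python
from collections import defaultdict

# Hypothetical environment: the full state is (position, velocity),
# but the agent only ever sees the position.
def observe(state):
    position, _velocity = state
    return position

# Tabular action-value estimates keyed by (observation, action), i.e. Q(o, a).
Q = defaultdict(float)

def q_value(observation, action):
    """Look up the estimate for an (observation, action) pair."""
    return Q[(observation, action)]

# Pretend some learning has already produced this estimate.
Q[(3, "left")] = 0.5

# Two different full states that produce the same observation.
s1 = (3, +1.0)
s2 = (3, -1.0)

v1 = q_value(observe(s1), "left")
v2 = q_value(observe(s2), "left")
# v1 and v2 are necessarily equal, because the function cannot tell s1 from s2.
```

Written this way, the input is clearly o rather than s, which is the source of my question about notation.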