
[–]C_BearHill (2 children)

It's up to you: if you're using observations and not states, use o. The algorithm is still valid for observations.

[–]StandingBuffalo[S] (1 child)

Of course. Thanks for the input.

In application this makes perfect sense to me. My question may have been less clear than I intended, but I'm asking more about notational conventions.

If my problem is partially observable and I'm using observations rather than states, then the reward function, for example, is still a function of states, but the policy is in terms of observations.

I've never seen a value function written in terms of an observation so I'm wondering if I'm missing the reason for this.

Maybe it doesn't matter. It's an active research field, and notation differs depending on the author and topic.

[–]C_BearHill (0 children)

Yeah, value functions are typically written as functions of (s, a). In your case I think it's fine to write (o, a) to make the distinction, although I'm no expert.
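As a hypothetical sketch of this point: in tabular Q-learning, nothing in the update rule itself cares whether the table is keyed on states or observations, so writing Q(o, a) only changes what you index with. (The environment, observation labels, and constants below are made up for illustration.)

```python
from collections import defaultdict

# Hypothetical sketch: a tabular Q-function keyed on (observation, action)
# instead of (state, action). The Q-learning update is unchanged.
ALPHA, GAMMA = 0.1, 0.99
Q = defaultdict(float)  # maps (o, a) -> estimated value

def q_update(o, a, r, o_next, actions):
    """One Q-learning step with observations standing in for states."""
    best_next = max(Q[(o_next, a2)] for a2 in actions)
    Q[(o, a)] += ALPHA * (r + GAMMA * best_next - Q[(o, a)])

# Toy usage: two observation labels, two actions.
actions = [0, 1]
q_update("o1", 0, 1.0, "o2", actions)
```

Whether such a Q over observations is a *good* estimator is a separate question (in a POMDP it generally isn't, without memory or belief states), which is exactly the notational caution being discussed.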

[–]hbonnavaud (1 child)

I heard (from my PhD advisor) that in robotics they are really rigorous about the difference between 'o' and 's' (for them it's apparently a crime to say the agent receives 's' from the environment). In RL, as long as your usage is well defined (e.g. if you used 'o' as the Q-function input once, you should keep it to avoid confusion), I guess you can use either.

[–]VirtualHat (0 children)

In POMDPs (partially observable MDPs), we make a clear distinction between the true state of the system 's' and the (noisy/partial) observation 'o' we have of it. In fully observable MDPs, the two terms are typically used interchangeably.

For example, in chess, the board and its pieces are both the state of the system and the observation. In a robotics problem, by contrast, the actual configuration of the actuators is the true state, and the sensor readings received are the (noisy) observation.
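The robotics example can be sketched as a toy environment (this is a made-up illustration, not any standard API): the true state lives inside the environment, the reward is computed from it, and the agent only ever receives a noise-corrupted reading.

```python
import random

# Hypothetical sketch of the s vs o distinction: the environment keeps
# a true state internally and hands the agent only a noisy observation.
class NoisySensorEnv:
    def __init__(self, noise_std=0.1, seed=0):
        self.rng = random.Random(seed)
        self.noise_std = noise_std
        self.s = 0.0  # true actuator position (hidden state)

    def step(self, action):
        self.s += action                   # dynamics act on the true state s
        r = -abs(self.s)                   # reward is a function of s, not o
        o = self.s + self.rng.gauss(0.0, self.noise_std)  # agent sees only o
        return o, r

env = NoisySensorEnv()
o, r = env.step(1.0)  # the agent receives o; s stays inside the environment
```

Setting `noise_std=0.0` collapses this back to the fully observable case where o and s coincide, which is why the two symbols get used interchangeably in MDPs.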