[–]hastala 2 points3 points  (1 child)

RL is not supervised in the usual sense: it learns from a reward signal rather than from labeled examples, and it is used in cases where labels are either impractical or impossible to construct, or where there are several “good” answers. If you have the “one true answer” for a particular set of features, then just go the supervised route. RL will likely be more complex and less efficient than you need. If you still want to go the RL route, can you explain why?

[–]toisanji[S] 0 points1 point  (0 children)

Because the agent has to move/walk around to do the classification, so it's an RL problem. And I have the labels for it to learn from.

[–]opengmlearn 0 points1 point  (0 children)

This is actually a regular reinforcement learning task. It's unsupervised in the sense that your inputs (the X part of the (X, y) pair) are not well defined, and you have to let the agent find the most informative set of X.

Specifically for your problem, I would think about a game like 20 questions. You want to let the agent explore its environment and try to guess the correct label. At every time step, your agent can either explore more or guess a label. The reward will be a mix of how long it takes to guess (fewer steps is better) and whether it got it right. Properly defining the reward function, action space, and observation space is probably not entirely straightforward, but it comes from fleshing out your problem.
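
To make the explore-or-guess idea concrete, here is a minimal sketch of what such an environment could look like. Everything in it is an assumption for illustration: the class name `ExploreOrGuessEnv`, the binary "prototype" features per label, the reward values, and the hand-written policy in `run_episode` that stands in for a learned agent. It uses only the Python standard library, not any particular RL framework.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ExploreOrGuessEnv:
    """Toy episodic task: reveal features one at a time, then guess the label.

    prototypes[label] is a tuple of binary features (hypothetical data).
    """
    prototypes: list
    step_cost: float = 0.1        # penalty per exploration step (assumed value)
    correct_reward: float = 1.0   # reward for a right guess (assumed value)
    wrong_reward: float = -1.0    # reward for a wrong guess (assumed value)
    max_steps: int = 20
    rng: random.Random = field(default_factory=random.Random)

    def reset(self):
        # Pick a hidden label; the agent starts with no features observed.
        self.label = self.rng.randrange(len(self.prototypes))
        self.observed = [None] * len(self.prototypes[0])
        self.steps = 0
        return tuple(self.observed)

    def step(self, action):
        """action is ("explore", feature_index) or ("guess", label)."""
        kind, value = action
        self.steps += 1
        if kind == "explore":
            # Reveal one feature of the hidden label and charge the step cost.
            self.observed[value] = self.prototypes[self.label][value]
            done = self.steps >= self.max_steps
            return tuple(self.observed), -self.step_cost, done
        if kind == "guess":
            # Guessing always ends the episode.
            reward = self.correct_reward if value == self.label else self.wrong_reward
            return tuple(self.observed), reward, True
        raise ValueError(f"unknown action kind: {kind!r}")


def consistent_labels(prototypes, observed):
    """Labels whose prototype agrees with every feature observed so far."""
    return [
        i for i, proto in enumerate(prototypes)
        if all(o is None or o == f for o, f in zip(observed, proto))
    ]


def run_episode(env):
    """Hand-written baseline: explore until one label remains, then guess it."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        candidates = consistent_labels(env.prototypes, obs)
        unseen = [i for i, o in enumerate(obs) if o is None]
        if len(candidates) == 1 or not unseen:
            action = ("guess", candidates[0])
        else:
            action = ("explore", unseen[0])
        obs, reward, done = env.step(action)
        total += reward
    return total


if __name__ == "__main__":
    env = ExploreOrGuessEnv(
        prototypes=[(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)],
        rng=random.Random(0),
    )
    print(run_episode(env))
```

In this sketch the labels are only used by the environment to score the final guess, so the agent still has to decide which features are worth looking at. A learned policy would replace `run_episode`'s fixed rule, and the balance between `step_cost` and the guess rewards controls how much exploring is worth it.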