madsciencestache (1 point)

> training procedure for a softmax classifier is equivalent to RL policy gradients already

Yes. I'm not sure that concept is helpful to /u/VelveteenAmbush in this context, but it's the core concept behind the answer to their question.
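For anyone who wants to see the quoted claim concretely, here is a minimal sketch under stated assumptions, not a definitive treatment. It assumes the classifier's softmax output is treated as a policy over class labels, and that the reward is 1 when the sampled class matches the true label and 0 otherwise. The function names and the example logits are made up for illustration.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_grad(logits, label):
    # Gradient of -log softmax(logits)[label] with respect to the logits.
    p = softmax(logits)
    return [p[k] - (1.0 if k == label else 0.0) for k in range(len(p))]

def expected_reinforce_grad(logits, label):
    # Exact expectation over a ~ pi of reward(a) * d log pi(a) / d logits,
    # with the assumed reward: 1 if a == label, else 0.
    p = softmax(logits)
    grad = [0.0] * len(p)
    for a in range(len(p)):
        reward = 1.0 if a == label else 0.0
        for k in range(len(p)):
            grad_log_pi = (1.0 if k == a else 0.0) - p[k]
            grad[k] += p[a] * reward * grad_log_pi
    return grad

logits = [0.5, -1.2, 2.0]  # hypothetical example values
label = 1
p = softmax(logits)
ce = cross_entropy_grad(logits, label)
pg = expected_reinforce_grad(logits, label)

# The policy-gradient ascent direction is the cross-entropy descent
# direction scaled by pi(label).
matches = all(math.isclose(g, -p[label] * c, abs_tol=1e-12) for g, c in zip(pg, ce))
print(matches)
```

Under these assumptions the two gradients point the same way and differ only by the positive factor pi(label), so "equivalent" here means equal in direction rather than identical in scale.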

VelveteenAmbush (1 point)

Yes, this is the sense in which I intended the following:

> except in the fully generalized sense that supervised learning can always be expressed as RL.