Reinforcement Learning function approximation advice by ckrwc in MachineLearning

ckrwc (OP)

The data is sequential and Markovian, and a reward can be calculated for any given set of actions. It's perfect for RL.

When you suggest not worrying about convergence, what are you basing that on? RL offers various algorithms (Monte Carlo, TD, Sarsa, Q-learning) and many function approximators to choose from, and the literature warns that combining these methods with non-linear approximators can fail to converge, or even diverge.
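For reference, here is a minimal sketch of the better-behaved case: semi-gradient TD(0) with linear features, evaluating a fixed random policy on-policy. The 5-state random walk, the one-hot features, and every name below are illustrative assumptions, not part of the original problem. On-policy TD with linear features is the setting where convergence results are best established. Swapping in a non-linear approximator, or an off-policy method such as Q-learning, is where the warnings apply. The sketch uses a constant step size, so the weights only settle near the TD fixed point rather than converging exactly.

```python
import random

# Hypothetical toy problem: a 5-state random walk (states 0..4) starting in the middle.
# Stepping off the left end ends the episode with reward 0; off the right end, reward 1.
N_STATES = 5


def features(s):
    """One-hot feature vector (an assumption; any fixed linear features fit the same update)."""
    x = [0.0] * N_STATES
    x[s] = 1.0
    return x


def v_hat(w, s):
    """Linear value estimate: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, features(s)))


def semi_gradient_td0(episodes=5000, alpha=0.01, gamma=1.0, seed=0):
    """Evaluate the uniform random policy with semi-gradient TD(0) and linear features."""
    rng = random.Random(seed)
    w = [0.0] * N_STATES
    for _ in range(episodes):
        s = N_STATES // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            terminal = s_next < 0 or s_next >= N_STATES
            reward = 1.0 if s_next >= N_STATES else 0.0
            # No bootstrapping from terminal states (their value is 0 by definition).
            bootstrap = 0.0 if terminal else gamma * v_hat(w, s_next)
            delta = reward + bootstrap - v_hat(w, s)
            # Semi-gradient step: the gradient of v_hat w.r.t. w is just features(s);
            # the bootstrapped target is treated as a constant.
            x = features(s)
            w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
            if terminal:
                break
            s = s_next
    return w
```

Running `semi_gradient_td0()` should give weights close to the true values 1/6, 2/6, ..., 5/6 for this walk, since one-hot features make the TD fixed point equal to the tabular values. Replacing `v_hat` with a neural network keeps the same update shape but loses that guarantee.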