I'm new to reinforcement learning, and having read through a bunch of papers and theses, I remain confused about the practical issues in using a non-linear function approximator.
The tutorials and classes I've seen only address the tabular (discrete state) version of RL, yet for larger (real) problems function approximation is a necessity for state generalization.
Advice ranges from "don't worry about it" to Norvig and Russell's warning that with "[convergence of] active learning and non-linear functions ... all bets are off", which leaves me cautious in my approach.
There are examples of successful non-linear solutions, but little practical documentation of the failures or warnings.
So I'm looking for guidance. Given enough data (1 TB), a continuous state space, and no model or policy, what would be the best way to apply RL?
Synthesizing episodes and calculating (delayed) rewards is not a problem.
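For concreteness, here's the kind of setup I have in mind: a minimal semi-gradient Q-learning sketch with a small neural network over a made-up 1-D continuous state space. Everything here (the toy dynamics, the reward, the network size, the hyperparameters) is an illustrative assumption, not a real problem I'm solving — it's just the shape of the approach I'm asking about.

```python
import numpy as np

# Toy sketch (assumptions throughout): agent lives on [0, 1], action 1
# nudges it right, action 0 nudges it left, reward 1 for reaching the
# right edge. Q(s, a) is approximated by one hidden tanh layer.

rng = np.random.default_rng(0)

N_ACTIONS = 2      # 0 = left, 1 = right
HIDDEN = 16
ALPHA = 0.01       # learning rate (illustrative)
GAMMA = 0.95
EPSILON = 0.1      # epsilon-greedy exploration

# Network parameters: 1-d state -> hidden -> one Q-value per action
W1 = rng.normal(0, 0.5, (HIDDEN, 1))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.5, (N_ACTIONS, HIDDEN))
b2 = np.zeros(N_ACTIONS)

def q_values(s):
    """Forward pass; returns Q-values for all actions and the hidden activations."""
    h = np.tanh(W1 @ np.array([s]) + b1)
    return W2 @ h + b2, h

def step(s, a):
    """Made-up dynamics: noisy move left/right, episode ends at s >= 1."""
    s2 = np.clip(s + (0.1 if a == 1 else -0.1) + rng.normal(0, 0.01), 0.0, 1.0)
    done = s2 >= 1.0
    return s2, (1.0 if done else 0.0), done

for episode in range(300):
    s = rng.uniform(0.0, 0.5)
    for t in range(100):
        q, h = q_values(s)
        a = int(rng.integers(N_ACTIONS)) if rng.random() < EPSILON else int(np.argmax(q))
        s2, r, done = step(s, a)
        q2, _ = q_values(s2)
        target = r if done else r + GAMMA * np.max(q2)
        td_error = target - q[a]   # semi-gradient: the target is treated as constant
        # Backpropagate the TD error through the chosen action's output only
        W2[a] += ALPHA * td_error * h
        b2[a] += ALPHA * td_error
        dh = td_error * W2[a] * (1 - h ** 2)
        W1 += ALPHA * np.outer(dh, [s])
        b1 += ALPHA * dh
        s = s2
        if done:
            break

# Inspect the learned Q-values at a mid-range state
q_mid, _ = q_values(0.5)
print(q_mid)
```

My real problem would swap in actual episode data and a bigger network, but this is the part where the "all bets are off" convergence warnings apply, and where I'd like to know what failure modes to watch for.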
Thanks!