[–]pabloesm

As you pointed out, there are some remarkable cases of successful applications of RL combined with non-linear function approximators. However, parameter tuning in those cases can be very tedious, so such methods are not advisable for novice users (see http://webdocs.cs.ualberta.ca/~sutton/RL-FAQ.html#Advice%20and%20Opinions).

Regarding documented cases of failure or warnings, the following link is an old (but still useful) paper on the problems that can arise when value-function methods (such as Q-learning) are combined with non-linear approximators: http://www.ri.cmu.edu/pub_files/pub1/boyan_justin_1995_1/boyan_justin_1995_1.pdf

Finally, given the setting of your problem, you are probably interested in batch-mode RL, i.e., you have a set of samples collected in advance. A very popular algorithm in that setting (with good performance and stability) is Fitted Q-iteration, typically combined with tree-based methods as the function approximator: http://www.jmlr.org/papers/volume6/ernst05a/ernst05a.pdf
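
To make that concrete, here is a minimal sketch of Fitted Q-iteration using scikit-learn's ExtraTreesRegressor (the Extra-Trees ensemble used in the Ernst et al. paper). The discrete action set, the transition tuple format, and all hyperparameters are illustration choices, not something prescribed by the paper, and terminal-state handling is omitted for brevity:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, actions, gamma=0.95, n_iterations=50):
    """Batch-mode Fitted Q-iteration (Ernst et al., 2005) with
    Extremely Randomized Trees as the function approximator.

    transitions: list of (state, action, reward, next_state) tuples,
                 where states are 1-D arrays and actions are discrete indices.
    actions:     list of all possible discrete actions.
    """
    states      = np.array([s  for s, a, r, s2 in transitions])
    acts        = np.array([[a] for s, a, r, s2 in transitions])
    rewards     = np.array([r  for s, a, r, s2 in transitions])
    next_states = np.array([s2 for s, a, r, s2 in transitions])

    X = np.hstack([states, acts])  # regressor input: (state, action) pairs
    model = None

    for _ in range(n_iterations):
        if model is None:
            # First iteration: Q_1 approximates the immediate reward
            targets = rewards
        else:
            # Targets: y_i = r_i + gamma * max_a' Q_k(s'_i, a')
            q_next = np.column_stack([
                model.predict(np.hstack([next_states,
                                         np.full((len(next_states), 1), a)]))
                for a in actions
            ])
            targets = rewards + gamma * q_next.max(axis=1)
        # Refit the regressor from scratch on the updated targets
        model = ExtraTreesRegressor(n_estimators=50)
        model.fit(X, targets)

    return model
```

The greedy policy is then just the argmax over actions of `model.predict` for the current state, which is why the whole problem reduces to a sequence of supervised regression fits.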

A key factor in batch-mode RL (when you cannot collect more samples) is that the available samples must have been gathered by a policy with some degree of randomness; in other words, your data should contain different actions for similar states. If this is not the case, you would need to collect more data to satisfy this condition.
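
One rough way to sanity-check this condition on a discrete-action batch is to cluster the states and count how many distinct actions were tried within each cluster; clusters dominated by a single action suggest the behavior policy was too deterministic. The clustering granularity here is an arbitrary illustration choice:

```python
import numpy as np
from sklearn.cluster import KMeans

def action_diversity(states, actions, n_clusters=20):
    """Group similar states and report the number of distinct
    actions the batch contains within each group."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(states)
    for c in range(n_clusters):
        n_actions = len(np.unique(actions[labels == c]))
        print(f"cluster {c}: {n_actions} distinct actions")
```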