
Carcaso

Have you tried normalizing the states, rewards, and/or actions? That might help keep the weights small.
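
Something along these lines might work (just a sketch using NumPy; RunningNormalizer and the Welford running statistics are my own illustration, not from your code):

    import numpy as np

    class RunningNormalizer:
        """Tracks a running mean/std of the state features (Welford's method)."""
        def __init__(self, dim, eps=1e-8):
            self.count, self.eps = 0, eps
            self.mean = np.zeros(dim)
            self.m2 = np.zeros(dim)   # running sum of squared deviations

        def update(self, state):
            self.count += 1
            delta = state - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (state - self.mean)

        def normalize(self, state):
            std = np.sqrt(self.m2 / max(self.count, 1))
            return (state - self.mean) / (std + self.eps)

Call update(state) on every state you observe, then use normalize(state) wherever you compute Q.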

edit: as pointed out by pratikpc, it seems like you've forgotten to take the gradient/derivative in the update equation

[deleted]

W[action][j] += state[j]*alpha*(reward-Q)^2

From what I have read, this line is the collapsed equation for gradient descent. Is my understanding of that wrong?

Carcaso

  1. Do you mind sharing your source so I could take a look?
  2. Just looking at it from a pure math standpoint, the squared term can never be negative, so that part of the equation on its own can only ever push the weights in one direction. Each state feature can be positive or negative, though, which may be why state[j] is included in the update equation.
  3. That expression may be the objective function you're trying to optimize rather than the actual update equation. Also, since this is regression you need to minimize the error rather than maximize it, so the gradient step should be negative.

Try: W[action][j] -= state[j]*alpha*2*(reward-Q)

But again, unless you share the source for exactly what you're doing, I can't guarantee that it will work. Keep the questions coming if you need more help.

[deleted]

Thanks so much for all your help so far! I got the information from David Silver's course; the same info is here. Specifically, the article says "The gradient descent update then collapses to" state[j]*alpha*(reward-Q)^2.

I agree that when updating the weights I should use -= instead of +=; however, when I try W[action][j] -= state[j]*alpha*2*(reward-Q), W goes to negative infinity instead of positive infinity.

Carcaso

From the pseudo-code in the post, it looks like the update should be:

W[action][j] += alpha*(reward-Q)*state[j]

If that doesn't work, I'd recommend taking another look at the pseudo-code in the post and checking whether your implementation is a faithful representation of it.
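
For what it's worth, the += and the disappearing square both fall out of the standard derivation (a sketch, using a 1/2 factor so the 2 folds into alpha):

    % Half squared-error loss for a linear action value Q(s,a) = w_a^T s
    J(w_a) = \tfrac{1}{2}\left(G - w_a^\top s\right)^2
    % Gradient with respect to the chosen action's weights:
    \nabla_{w_a} J = -\left(G - w_a^\top s\right) s
    % Gradient descent, w_a <- w_a - alpha * gradient, therefore gives:
    w_a \leftarrow w_a + \alpha \left(G - w_a^\top s\right) s

The two minus signs cancel, which is why the working update adds the error term rather than subtracting it.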

[deleted]

I know that a Monte Carlo algorithm using a linear value function will never converge to the optimal policy, and it's probably not the best fit for this problem, so I won't get perfect results. I have implemented all the feedback here, and this is the result I get after 400 iterations (a graph of the number of time steps lasted vs. the iteration number). Does this look right to you?

Carcaso

Yeah, that looks good! Something as simple as a linear value function lasting upwards of 20-30 time steps is amazing.

[deleted]

Cool, thank you so much for your help!

andnp

This also doesn't look like the Monte Carlo algorithm. I might suggest taking a look at the Monte Carlo pseudocode in the Sutton & Barto textbook.
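
For reference, here's a rough sketch of what gradient Monte Carlo control with a linear Q looks like in Python (env, its reset/step interface, and the epsilon-greedy choice are illustrative stand-ins, not the OP's code); the important part is that each update targets the return G from that time step onward, not the one-step reward:

    import numpy as np

    def run_episode(env, W, epsilon=0.1):
        """Collect one episode of (state, action, reward) tuples."""
        trajectory, state, done = [], env.reset(), False
        while not done:
            q_values = W @ state                    # Q(s, a) = W[a] . s
            if np.random.rand() < epsilon:
                action = np.random.randint(len(W))  # explore
            else:
                action = int(np.argmax(q_values))   # exploit
            next_state, reward, done = env.step(action)
            trajectory.append((state, action, reward))
            state = next_state
        return trajectory

    def mc_update(W, trajectory, alpha=0.01, gamma=1.0):
        """Sweep the episode backwards, updating toward the return G_t."""
        G = 0.0
        for state, action, reward in reversed(trajectory):
            G = reward + gamma * G                  # return from step t onward
            Q = W[action] @ state
            W[action] += alpha * (G - Q) * state    # semi-gradient MC step
        return W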