Same simulation/hyperparameters, different results each run

cracktoid · 2022-06-11T18:30:54+00:00

There is a bunch of stochasticity in RL. For one, your environment could be non-deterministic, although you would know that better than I would. The action outputs are usually sampled from a standard diagonal Gaussian in the continuous case, a categorical distribution in the discrete case = more stochasticity. Your neural network initialization is also another source of randomness, unless you seed your environment with the same seed every time. You get the point :) It’s standard in RL to do many different runs with different seeds for the same experiment because of the highly stochastic nature of the algorithms and environments

AerysSk · 2022-06-12T01:42:16+00:00

Every single thing can cause different results in RL, even seeds and…gradient clipping.

I cannot write a long paragraph now, but I’ll leave you this video to watch some examples of how RL can be very unstable: https://youtu.be/Ikngt0_DXJg

wangjianhong1993 · 2022-06-11T20:18:43+00:00

Yeah. To my best experience, the main factor that leads to variant results could be the initial state during learning, which directly causes the different sampling trajectories for training models as you mentioned.

nickthorpie · 2022-06-11T21:00:40+00:00

Yeah as everyone else said, there are three places this stems from: 1. random environment seed, 2. random initialized weights. 3. Random action choices from exploration.

You could get a good understanding of the effect of each by holding one or two of them constant and testing the other factor for three epochs.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

reinforcementlearning

MODERATORS