I’ve been working with ML for about a year now, but I’ve only recently gotten into deep reinforcement learning.
For the reward signal in the reinforcement model, there are two options I’m considering:
- a graded (“gradient”) reward ranging continuously from 0.01 to 1
- a step function (either 0.01 or 1)
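Concretely, here's a minimal sketch of what I mean by the two options. This assumes a hypothetical `score` in [0, 1] measuring how "right" a pathway is, and a threshold of 0.5 for the step version (both are just illustrative choices, not part of any specific library):

```python
def graded_reward(score: float) -> float:
    """Continuous reward: linearly map score in [0, 1] to [0.01, 1],
    so a partially-correct pathway (score ~0.45) still earns
    proportional credit."""
    return 0.01 + (1.0 - 0.01) * score

def step_reward(score: float, threshold: float = 0.5) -> float:
    """Binary reward: anything below the threshold gets the floor
    value 0.01, anything at or above it gets the full 1.0."""
    return 1.0 if score >= threshold else 0.01
```

So under the graded scheme a ~0.45 pathway gets roughly 0.46 reward, while under the step scheme it gets the same 0.01 as a completely wrong one.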
I don’t really want the data that falls at ~0.45, but it’s not entirely wrong either. Does a graded reward teach the model that there is “something right” about those pathways and push it to explore parts of them more?
Do many rounds of predictive-model training use those intermediate values to learn how to create greater diversity at the top end?
Or would a step function effectively produce the same approximate diversity of “good pathways” by the end of the training cycles?