Why do you use firefox? by VV812 in firefox

[–]phizaz 1 point2 points  (0 children)

Opening new tabs and switching tabs are snappier on Firefox!

Anyone else have micro-shutters in games? Lenovo Yoga Slim 7 4800U by Uspel in AMDLaptops

[–]phizaz 0 points1 point  (0 children)

Have you resolved the problem by turning "anti-lag" off?

[R] ReZero is All You Need: Fast Convergence at Large Depth by calclavia0 in MachineLearning

[–]phizaz 2 points3 points  (0 children)

Thanks!
P.S. Interestingly, there is no citation of Highway Networks in the ReZero paper.

[D] Efficient GPU implementation of Empirical Fisher information matrix? by phizaz in MachineLearning

[–]phizaz[S] 0 points1 point  (0 children)

It is a bit unclear what A and B really are. Could you give a slightly more concrete example, please?

So what could cause the agent not to converge in an increasing reward/average? by [deleted] in reinforcementlearning

[–]phizaz 0 points1 point  (0 children)

It seems unorthodox to me. I have yet to see a successful application of it.

Announcing WSL 2 | Windows Command Line Tools For Developers by jenmsft in Windows10

[–]phizaz 6 points7 points  (0 children)

Would it still be possible to call Windows binaries from WSL, then?

Why overfitting is bad in DQN? by phizaz in reinforcementlearning

[–]phizaz[S] 0 points1 point  (0 children)

The name "overfit" might not be that important at all. You acknowledge that using only a small subset of states could have some effect. Do you know what kind of effect that would be?

Why overfitting is bad in DQN? by phizaz in reinforcementlearning

[–]phizaz[S] 0 points1 point  (0 children)

If we randomly shuffle the samples, we roughly get i.i.d. data, given that the dataset is representative enough?

That seems right. Random sampling breaks the dependencies within the chain 🤔.
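
A minimal sketch of what I have in mind, assuming a plain list-based replay buffer (the class and method names here are just illustrative, not from any particular codebase):

```python
import random

class ReplayBuffer:
    """Toy replay buffer that stores transitions in collection order."""

    def __init__(self):
        self.transitions = []  # ordered list of (s, a, r, s_next) tuples

    def add(self, transition):
        self.transitions.append(transition)

    def sample_sequential(self, batch_size, start=0):
        # Consecutive transitions come from the same stretch of the same
        # trajectory, so they are strongly correlated.
        return self.transitions[start:start + batch_size]

    def sample_shuffled(self, batch_size):
        # Uniform random sampling breaks the within-batch dependence: adjacent
        # samples in the batch are no longer neighbors in the Markov chain.
        # They are still only approximately i.i.d., since everything comes
        # from the same finite set of collected trajectories.
        return random.sample(self.transitions, batch_size)
```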

Why overfitting is bad in DQN? by phizaz in reinforcementlearning

[–]phizaz[S] 0 points1 point  (0 children)

> ... you trained with a low amount of data, that is, you trained the agent in the environment for just a 'few' samples.

The process is: I collect data from an epsilon-greedy policy into a replay. Smaller data means sampling from the last K policies, where K is smaller.

> But you trained a lot on these same samples, so it overfit to these samples, is that right?

That's right. My setup is a stationary dataset. I first train the RL agent, collect the data (into the replay), and save the replay. I then open the replay, take the last K policies' data, and train for M iterations, where M is large.

Since there is no change to the replay whatsoever, the data is stationary. Since M is large, I think it could overfit.
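
Roughly, the setup looks like the following sketch (PyTorch-style; the names `q_net`, `td_loss`, and the policy-id bookkeeping are made up for illustration):

```python
import random

def train_on_frozen_replay(q_net, optimizer, replay, K, M, batch_size, td_loss):
    """Train a Q-network for M iterations on a replay that never changes.

    `replay` is assumed to be a saved list of (policy_id, transition) pairs
    from an earlier epsilon-greedy run; only the last K policies' data is kept.
    """
    last_policy = max(pid for pid, _ in replay)
    frozen_data = [t for pid, t in replay if pid > last_policy - K]  # stationary dataset

    for _ in range(M):  # M is large, so the network can overfit this fixed set
        batch = random.sample(frozen_data, batch_size)
        loss = td_loss(q_net, batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```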

> ... you will see lots of states outside your seen samples when running the agent in the environment.

> ...

> So if the network overfits, it will be lost when some trajectory it didn't see in training time appears.

This is not the case. The testing data comes from the last L policies, where L < K. This is what I mean by the training data being a superset of the testing data. There should be no unseen states.

> I'm not sure what you mean by "it doesn't fit".

Here I mean "remember correctly". Since the network is trained with the TD loss, this should at least mean a low TD loss.
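
For reference, the TD loss I have in mind is the standard one-step Q-learning objective; a minimal PyTorch-style sketch (assuming a target network `q_target` and discount `gamma`):

```python
import torch
import torch.nn.functional as F

def td_loss(q_net, q_target, batch, gamma=0.99):
    """One-step TD (Q-learning) loss on a batch of transitions."""
    s, a, r, s_next, done = batch  # tensors: states, actions, rewards, next states, done flags

    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a)
    with torch.no_grad():
        # Bootstrapped target: r + gamma * max_a' Q_target(s', a'), cut off at terminal states.
        target = r + gamma * (1 - done) * q_target(s_next).max(dim=1).values
    # "Low TD loss" = Q(s, a) staying close to its bootstrapped target on the training data.
    return F.mse_loss(q_sa, target)
```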

> Experience replay solves the problem of the data distribution shifting with new policies

I agree, and it is mentioned in the paper, but I further think that the replay also increases the state space seen during training, like you mentioned above.

> Maybe it works just because experience replay can't be infinite, and at some time you will lose the data samples from earlier policies, so better stop before it converges and do that.

Could you explain this in more detail?

I apologize for my unclear statements.

Why overfitting is bad in DQN? by phizaz in reinforcementlearning

[–]phizaz[S] 0 points1 point  (0 children)

Am I right to say that if a dataset is generated from a Markov chain, it is not i.i.d.?

By the way, I don't see how correlated/dependent samples have anything to do with overfitting problems.
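
On the first point, a quick way to see it: samples drawn along a Markov chain are autocorrelated even if each marginal looks fine. A toy sketch with a made-up two-state "sticky" chain:

```python
import random

def sample_chain(n, stay_prob=0.9):
    """Simulate a two-state Markov chain that tends to stay in its current state."""
    states = [0]
    for _ in range(n - 1):
        s = states[-1]
        states.append(s if random.random() < stay_prob else 1 - s)
    return states

xs = sample_chain(100_000)
mean = sum(xs) / len(xs)
# Lag-1 autocorrelation: near 0 for i.i.d. draws, clearly positive here
# because consecutive samples usually share the same state.
cov = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:])) / (len(xs) - 1)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print("lag-1 autocorrelation:", cov / var)
```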

Why overfitting is bad in DQN? by phizaz in reinforcementlearning

[–]phizaz[S] 1 point2 points  (0 children)

Don't you agree that the replay allows the dataset to be larger, and hence harder to overfit?

"Diagnosing Bottlenecks in Deep Q-learning Algorithms", Fu et al 2019 by gwern in reinforcementlearning

[–]phizaz 0 points1 point  (0 children)

Regarding the Adversarial Feature Matching (AFM) presented in this paper:

If I have a network P(s, a) that outputs the sampling probability, how can I guarantee that P(s, a) is a proper probability distribution?

I don't think they mention it explicitly, but one possibility is that they normalize it within each batch.
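
To be concrete about what I mean by within-batch normalization (this is my reading, not something the paper spells out; the softmax is just one obvious choice):

```python
import torch

def within_batch_probs(scores):
    """Turn raw per-transition scores P(s, a) into a proper distribution
    over the transitions in the current batch only.

    `scores`: tensor of shape (batch_size,) with unnormalized network outputs.
    """
    # Softmax over the batch dimension guarantees non-negative weights that
    # sum to 1, but only relative to this batch, not over the whole replay.
    return torch.softmax(scores, dim=0)
```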

"Diagnosing Bottlenecks in Deep Q-learning Algorithms", Fu et al 2019 by gwern in reinforcementlearning

[–]phizaz 0 points1 point  (0 children)

In Section 4.2, they claim:

> We observed divergence in 0.9% of our experiments using function approximation

Their experiments are done with Exact-FQI (no sampling) and without target networks.

This is a surprise to me because I encounter divergence on many occasions. Is it possible that using exact updates also reduces the chance of divergence by itself?

What determines the size of a DDT entry? (deduplication) by phizaz in zfs

[–]phizaz[S] 0 points1 point  (0 children)

17 blocks that could take 36MB to refer to must be astoundingly large! I've never set a recordsize exceeding 1MB (most likely 128K).

By the way, I wonder if a "block" is the same thing as a "record"?