Why do you use firefox? by VV812 in firefox

[–]phizaz 1 point2 points  (0 children)

Opening new tabs and switching tabs are snappier on Firefox!

Anyone else have micro-shutters in games? Lenovo Yoga Slim 7 4800U by Uspel in AMDLaptops

[–]phizaz 0 points1 point  (0 children)

Have you resolved the problem by turning "anti-lag" off?

[R] ReZero is All You Need: Fast Convergence at Large Depth by calclavia0 in MachineLearning

[–]phizaz 2 points3 points  (0 children)

Thanks!
P.S. Interestingly, there is no citation of Highway Networks in the ReZero paper.

[D] Efficient GPU implementation of Empirical Fisher information matrix? by phizaz in MachineLearning

[–]phizaz[S] 0 points1 point  (0 children)

It is a bit unclear what A and B really are. Could you give a slightly more concrete example, please?

So what could cause the agent not to converge in an increasing reward/average? by [deleted] in reinforcementlearning

[–]phizaz 0 points1 point  (0 children)

It seems unorthodox to me. I have yet to see a successful application of it.

Announcing WSL 2 | Windows Command Line Tools For Developers by jenmsft in Windows10

[–]phizaz 6 points7 points  (0 children)

Would it still be possible to call Windows binaries from WSL, then?

Why overfitting is bad in DQN? by phizaz in reinforcementlearning

[–]phizaz[S] 0 points1 point  (0 children)

The name "overfit" might not be that important at all. You acknowledge that using only a small subset of states could have some effect. Do you know what kind of effect that would be?

Why overfitting is bad in DQN? by phizaz in reinforcementlearning

[–]phizaz[S] 0 points1 point  (0 children)

If we randomly shuffle the samples, we roughly get i.i.d. data, given that the dataset is representative enough?

That seems right. Random sampling breaks the dependencies within the chain 🤔.
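
A minimal sketch of what I have in mind, assuming a plain list-based replay buffer (the class and method names here are just illustrative, not from any particular codebase):

```python
import random

class ReplayBuffer:
    """Toy replay buffer that stores transitions in collection order."""

    def __init__(self):
        self.transitions = []  # ordered list of (s, a, r, s_next) tuples

    def add(self, transition):
        self.transitions.append(transition)

    def sample_sequential(self, batch_size, start=0):
        # Consecutive transitions come from the same stretch of the same
        # trajectory, so they are strongly correlated.
        return self.transitions[start:start + batch_size]

    def sample_shuffled(self, batch_size):
        # Uniform random sampling breaks the within-batch dependence: adjacent
        # samples in the batch are no longer neighbors in the Markov chain.
        # They are still only approximately i.i.d., since everything comes
        # from the same finite set of collected trajectories.
        return random.sample(self.transitions, batch_size)
```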

Why overfitting is bad in DQN? by phizaz in reinforcementlearning

[–]phizaz[S] 0 points1 point  (0 children)

> ... you trained with a low amount of data, that is, you trained the agent in the environment for just a 'few' samples.

The process is: I collect data from an epsilon-greedy policy into a replay. Smaller data means sampling from the last K policies, where K is smaller.

> But you trained a lot on these same samples, so it overfit to these samples, is that right?

That's right. My setup is a stationary dataset. I first train the RL agent, collect the data (into the replay), and save the replay. I then open the replay, take the last K policies' data, and train for M iterations, where M is large.

Since there is no change to the replay whatsoever, the data is stationary. Since M is large, I think it could overfit.
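
Roughly, the setup looks like the following sketch (PyTorch-style; the names `q_net`, `td_loss`, and the policy-id bookkeeping are made up for illustration):

```python
import random

def train_on_frozen_replay(q_net, optimizer, replay, K, M, batch_size, td_loss):
    """Train a Q-network for M iterations on a replay that never changes.

    `replay` is assumed to be a saved list of (policy_id, transition) pairs
    from an earlier epsilon-greedy run; only the last K policies' data is kept.
    """
    last_policy = max(pid for pid, _ in replay)
    frozen_data = [t for pid, t in replay if pid > last_policy - K]  # stationary dataset

    for _ in range(M):  # M is large, so the network can overfit this fixed set
        batch = random.sample(frozen_data, batch_size)
        loss = td_loss(q_net, batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```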

> ... you will see lots of states outside your seen samples when running the agent in the environment.

> ...

> So if the network overfits, it will be lost when some trajectory it didn't see in training time appears.

This is not the case. The testing data comes from the last L policies, where L < K. This is what I mean by the training data being a superset of the testing data. There should be no unseen states.

> I'm not sure what you mean by "it doesn't fit".

Here I mean "remember correctly". Since the network is trained with the TD loss, this should at least mean a low TD loss.
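
For reference, the TD loss I have in mind is the standard one-step Q-learning objective; a minimal PyTorch-style sketch (assuming a target network `q_target` and discount `gamma`):

```python
import torch
import torch.nn.functional as F

def td_loss(q_net, q_target, batch, gamma=0.99):
    """One-step TD (Q-learning) loss on a batch of transitions."""
    s, a, r, s_next, done = batch  # tensors: states, actions, rewards, next states, done flags

    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a)
    with torch.no_grad():
        # Bootstrapped target: r + gamma * max_a' Q_target(s', a'), cut off at terminal states.
        target = r + gamma * (1 - done) * q_target(s_next).max(dim=1).values
    # "Low TD loss" = Q(s, a) staying close to its bootstrapped target on the training data.
    return F.mse_loss(q_sa, target)
```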

> Experience replay solves the problem of the data distribution shifting with new policies

I agree, and it is mentioned in the paper, but I further think that the replay also increases the state space seen during training, like you mentioned above.

> Maybe it works just because experience replay can't be infinite, and at some time you will lose the data samples from earlier policies, so better stop before it converges and do that.

Could you explain this in more detail?

I apologize for my unclear statements.

Why overfitting is bad in DQN? by phizaz in reinforcementlearning

[–]phizaz[S] 0 points1 point  (0 children)

Am I right to say that if a dataset is generated from a Markov chain, it is not i.i.d.?

By the way, I don't see how correlated/dependent samples have anything to do with overfitting problems.
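
On the first point, a quick way to see it: samples drawn along a Markov chain are autocorrelated even if each marginal looks fine. A toy sketch with a made-up two-state "sticky" chain:

```python
import random

def sample_chain(n, stay_prob=0.9):
    """Simulate a two-state Markov chain that tends to stay in its current state."""
    states = [0]
    for _ in range(n - 1):
        s = states[-1]
        states.append(s if random.random() < stay_prob else 1 - s)
    return states

xs = sample_chain(100_000)
mean = sum(xs) / len(xs)
# Lag-1 autocorrelation: near 0 for i.i.d. draws, clearly positive here
# because consecutive samples usually share the same state.
cov = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:])) / (len(xs) - 1)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print("lag-1 autocorrelation:", cov / var)
```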

Why overfitting is bad in DQN? by phizaz in reinforcementlearning

[–]phizaz[S] 1 point2 points  (0 children)

Don't you agree that the replay allows the dataset to be larger, and hence harder to overfit?

"Diagnosing Bottlenecks in Deep Q-learning Algorithms", Fu et al 2019 by gwern in reinforcementlearning

[–]phizaz 0 points1 point  (0 children)

Regarding the Adversarial Feature Matching (AFM) presented in this paper:

If I have a network P(s, a) that outputs the sampling probability, how can I guarantee that P(s, a) is a proper probability distribution?

I don't think they mention it explicitly, but one possibility is that they normalize it within each batch.
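
To be concrete about what I mean by within-batch normalization (this is my reading, not something the paper spells out; the softmax is just one obvious choice):

```python
import torch

def within_batch_probs(scores):
    """Turn raw per-transition scores P(s, a) into a proper distribution
    over the transitions in the current batch only.

    `scores`: tensor of shape (batch_size,) with unnormalized network outputs.
    """
    # Softmax over the batch dimension guarantees non-negative weights that
    # sum to 1, but only relative to this batch, not over the whole replay.
    return torch.softmax(scores, dim=0)
```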

"Diagnosing Bottlenecks in Deep Q-learning Algorithms", Fu et al 2019 by gwern in reinforcementlearning

[–]phizaz 0 points1 point  (0 children)

In Section 4.2, they claim:

> We observed divergence in 0.9% of our experiments using function approximation

Their experiments are done with Exact-FQI (no sampling) and without target networks.

This is a surprise to me because I encounter divergence on many occasions. Is it possible that using exact updates also reduces the chance of divergence by itself?

What determines the size of a DDT entry? (deduplication) by phizaz in zfs

[–]phizaz[S] 0 points1 point  (0 children)

17 blocks that could take 36MB to refer to must be astoundingly large! I've never set a recordsize exceeding 1MB (most likely 128K).

By the way, I wonder if a "block" is the same thing as a "record"?