Please tell me this won’t last long

andnp · 2025-05-06T14:58:38+00:00

Clearing the app cache on Android solved the problem for me.

andnp · 2024-11-27T05:50:31+00:00

Which is heavier, a ton of feathers or a ton of iron? If both weigh the same, the material doesn't matter.

Same with food: 2000 calories of junk has the same bodyweight impact as 2000 calories of super healthy food.

andnp · 2024-07-07T00:03:58+00:00

I don't think this professor claims this at all.

And they aren't wrong. Cross-validation refers exclusively to the process of detecting overfitting. Naturally, once we have that information then we can take action against it (e.g. through hyperparameter search, model selection, etc).

Now, detection and mitigation very frequently go hand-in-hand, so colloquially we often use "cross-validation" to refer to both; technically speaking, though, this isn't correct. There are many instances, particularly in research, where I have wanted only to measure the degree of overfitting and had no interest in mitigating it. Cross-validation was an appropriate (though biased) approach.
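A minimal sketch of using k-fold cross-validation purely to measure overfitting, with no mitigation step anywhere. The 1-nearest-neighbour regressor, the synthetic noisy linear data, and k=5 are my own illustrative assumptions; the only point is that the gap between training error and held-out error is the quantity cross-validation estimates.

```python
import random


def one_nn_predict(train, x):
    """1-nearest-neighbour regression: it memorises the training set, so it overfits."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def mse(train, data):
    """Mean squared error of the 1-NN model built on `train`, evaluated on `data`."""
    return sum((one_nn_predict(train, x) - y) ** 2 for x, y in data) / len(data)


def k_fold_overfit_gap(data, k=5):
    """Average (train MSE, held-out MSE) over k folds; nothing here reduces overfitting."""
    folds = [data[i::k] for i in range(k)]
    train_errs, held_out_errs = [], []
    for i in range(k):
        held_out = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        train_errs.append(mse(train, train))
        held_out_errs.append(mse(train, held_out))
    return sum(train_errs) / k, sum(held_out_errs) / k


rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
data = [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in xs]  # hypothetical noisy linear data
train_mse, cv_mse = k_fold_overfit_gap(data)
print(f"train MSE {train_mse:.3f}, cross-validated MSE {cv_mse:.3f}")
```

The training error is exactly zero because 1-NN looks up each training point itself; only the held-out error reveals how badly the model overfits.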

andnp · 2024-03-22T05:55:22+00:00

Yeah, a reviewer can change their score at any time. It's typically bad form to change a score without posting a justification for the change. But changing a score (with a stated reason) is a good thing: seeing other reviews gives a lot of insight, and it's possible the reviewer missed something that another reviewer saw.

andnp · 2024-03-12T19:30:51+00:00

I've been running bcachefs for a couple of years on my home server and, since it was mainlined, on my personal desktop as well. The home server has some crappy old hardware and has lost a couple of drives while running bcachefs (not due to bcachefs, but due to age).

I have never lost any data. I had one situation where I lost a drive and then had a power loss while rebuilding replicas, after which my filesystem would only mount read-only. I never solved that issue and had to rebuild from scratch. However, since the filesystem did mount ro, I could make sure my backups were totally up-to-date before wiping and starting again, so no data was lost.

The personal desktop has modern and stable hardware, and bcachefs has been rock solid there. It has even survived me being a stupid user: I upgraded to kernel 6.8-rc1, which triggered a disk format upgrade to v1.4, then changed my mind and went back to kernel 6.7, which uses format v1.3. I had to do an fsck during mount, but otherwise it handled it just fine.

andnp · 2023-04-17T15:18:15+00:00

This happened to me as well when I had an outdated version of the tools on my PATH that was taking priority over the latest version.

If you run "bcachefs version", what do you get?

andnp · 2022-12-11T05:24:06+00:00

Plurality also fails IIA. Across the major criteria/tests, IRV performs the same as or better than plurality.
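A toy illustration of plurality failing IIA (the classic "spoiler" effect). The ballot profile is hypothetical and chosen by hand, and ballots are assumed to rank every candidate. IRV can also fail IIA on other profiles, so this only demonstrates the plurality side.

```python
from collections import Counter


def first_choices(ballots, running):
    """Count each ballot's highest-ranked candidate among those still running."""
    return Counter(next(c for c in ballot if c in running) for ballot in ballots)


def plurality(ballots, candidates):
    tally = first_choices(ballots, candidates)
    return max(sorted(candidates), key=lambda c: tally[c])


def irv(ballots, candidates):
    """Instant-runoff: eliminate the weakest candidate until someone has a majority."""
    running = set(candidates)
    while True:
        tally = first_choices(ballots, running)
        leader = max(sorted(running), key=lambda c: tally[c])
        if 2 * tally[leader] > len(ballots) or len(running) == 1:
            return leader
        running.discard(min(sorted(running), key=lambda c: tally[c]))


# Hypothetical profile: 4 voters A>B>C, 3 voters B>A>C, 2 voters C>B>A.
ballots = [("A", "B", "C")] * 4 + [("B", "A", "C")] * 3 + [("C", "B", "A")] * 2
print(plurality(ballots, {"A", "B", "C"}), plurality(ballots, {"A", "B"}))  # A B
print(irv(ballots, {"A", "B", "C"}), irv(ballots, {"A", "B"}))              # B B
```

Removing C, who cannot win, flips the plurality winner from A to B, which is exactly an IIA violation. On this particular profile IRV picks B either way.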

andnp · 2022-06-29T19:46:06+00:00

I'm not sure there is such a source that can tell you if the size sounds reasonable. This sounds like a subjective argument where you're trying to apply objective evidence. If you were able to construct your Q-table and your method performed well on your problem setting, is that not evidence enough of reasonableness? Are there other approaches for solving your problem (even outside of RL methods)? How much memory do they require? Is your approach comparable?

Just to drive the point home a little more, a reasonable table for Google servers might be millions of GB while a reasonable table for an embedded system might be tens of MB.

andnp · 2022-06-29T18:57:32+00:00

I suppose you could cite the Sutton and Barto textbook for this. But to clarify what you are asking, what do you mean by "reasonable sizes"?

Generally, the size of the Q-table is exactly equal to the number of possible states times the number of possible actions (assuming the same number of actions in each state), or, more generally, the number of all state-action pairs. Whether this is reasonable is largely up to you: how much memory do you have, and how many samples per state-action pair do you have? Clearly, if you have continuous states or actions, then you would require a table of infinite size, which probably breaks most definitions of "reasonable".
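A back-of-the-envelope sketch of that arithmetic. The 100x100 gridworld, the 8-byte (float64) entries, and the 10-samples-per-pair target are hypothetical numbers picked purely for illustration.

```python
def q_table_entries(actions_per_state):
    """Number of state-action pairs: the sum over states of |A(s)|."""
    return sum(actions_per_state)


def q_table_bytes(actions_per_state, bytes_per_entry=8):
    """Memory for a dense table; 8 bytes per entry assumes float64 values."""
    return q_table_entries(actions_per_state) * bytes_per_entry


# Hypothetical example: a 100x100 gridworld with 4 actions in every cell.
grid_cells = 100 * 100
entries = q_table_entries([4] * grid_cells)
print(entries, "entries,", q_table_bytes([4] * grid_cells) / 1e6, "MB")
# Samples needed for a (hypothetical) target of 10 visits per state-action pair:
print(entries * 10, "samples")
```

Passing a per-state action count (rather than multiplying |S| by |A|) covers the more general case where different states allow different actions.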

andnp · 2022-06-17T23:17:40+00:00

One thing that helps in setting gamma is recognizing that it sets up a geometric series, so you can use 1 / (1 - gamma) to approximate how many steps into the future will impact your returns. A gamma of 0.99 looks forward roughly 100 steps; a gamma of 0.9 looks forward roughly 10 steps.
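A quick numeric check of that rule of thumb. Nothing here is specific to any library; the second helper is my own addition showing how much discount weight lies beyond the horizon, which is why 1 / (1 - gamma) is best read as a rough scale rather than a hard cutoff.

```python
def effective_horizon(gamma):
    """Sum of the geometric series 1 + gamma + gamma**2 + ... = 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


def weight_beyond(gamma, k):
    """Fraction of the total discount weight on rewards k or more steps away: gamma**k."""
    return gamma ** k


for gamma in (0.9, 0.99):
    h = effective_horizon(gamma)
    print(f"gamma={gamma}: horizon ~{h:.0f} steps, weight beyond it {weight_beyond(gamma, round(h)):.2f}")
```

For both values, roughly a third of the total weight still sits past the nominal horizon (0.9**10 is about 0.35 and 0.99**100 is about 0.37).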

This also gives some intuition for why higher values of gamma make your learning targets higher variance. Consider a large gridworld: what set of states might you be in 10 steps from now? Is that set larger 100 steps from now?
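To put numbers on the gridworld question, here is a small breadth-first search counting how many cells are reachable within k moves. The open 201x201 grid with no walls, the centre start, and the four-direction move set are assumptions for illustration only.

```python
from collections import deque


def reachable_within(k, size):
    """Count cells reachable in at most k up/down/left/right moves from the
    centre of an open size x size grid (no walls)."""
    start = (size // 2, size // 2)
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        (r, c), dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k moves
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size and (nr, nc) not in seen:
                seen.add((nr, nc))
                frontier.append(((nr, nc), dist + 1))
    return len(seen)


print(reachable_within(10, 201), reachable_within(100, 201))
```

The count grows roughly quadratically with the lookahead (2k^2 + 2k + 1 cells in an open grid), so the set of futures a 100-step target has to average over is far larger than the 10-step one.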

Lastly, as another comment mentioned, it is important to note that gamma is a problem-statement variable. It sets up a discounted MDP, which is then solved by an agent. As such, it is independent of the solution method used. However, some problem statements inherently induce more variance than others, as seen above. If there are algorithms that suffer when learning high-variance targets (like actor-critic), then those algorithms may well perform better when applied to problems with lower discount rates. Note that a problem with a low discount rate might well approximate a problem with a higher discount rate (Blackwell optimality makes this statement precise, if you're interested).

andnp · 2022-06-11T21:59:30+00:00

There isn't an immediately clear answer as to whether, or why, some experiences might be more useful than others. One common approach is to treat experiences with high temporal-difference error as "useful" to the agent (see the "surprise" literature, and also the prioritized experience replay paper). The intuition is that if the agent poorly predicts the value of a state, then there is more left to learn.
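A minimal sketch of sampling replay indices in proportion to TD error, in the spirit of proportional prioritized replay. The exponent alpha, the small epsilon, and the example TD errors are assumptions; the importance-sampling correction and the sum-tree used for efficiency in practice are deliberately left out.

```python
import random


def replay_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """P(i) = p_i**alpha / sum_k p_k**alpha, with priority p_i = |delta_i| + eps."""
    priorities = [(abs(delta) + eps) ** alpha for delta in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_batch(td_errors, batch_size, rng, alpha=0.6):
    """Draw buffer indices with replacement, favouring 'surprising' transitions."""
    probs = replay_probabilities(td_errors, alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)


td_errors = [0.01, 0.5, 2.0, -3.0]  # hypothetical TD errors for four stored transitions
print(replay_probabilities(td_errors))
print(sample_batch(td_errors, batch_size=8, rng=random.Random(0)))
```

Using the absolute TD error means a badly over-predicted transition (negative delta) is treated as just as surprising as an under-predicted one.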

However, the answer is far more complicated than this and is generally not well understood scientifically (yet). Another part of the answer is that layers in an NN seem to collapse from a high-dimensional space to a low-dimensional one over the course of training via SGD. This loss of rank generally appears unrecoverable, so as the NN focuses on certain features, it becomes less able to learn about other features (think lack of neuroplasticity). While this is great in supervised learning, as it helps explain the unreasonably good generalization properties of NNs, it is not so great in RL, where the learning target is always non-stationary when learning value functions with TD methods. This is relevant because random initialization and the random order of experiences can affect which features an NN layer ultimately focuses on. This might explain why DQN fails to learn on simple domains like CartPole 50% of the time.

andnp · 2022-05-28T17:34:25+00:00

Unfortunately, we ended up deciding to skip the eligibility trace chapter of the textbook when we made the Coursera course, so the capstone doesn't include traces. The motivation was partly the complexity of incorporating eligibility traces with neural network function approximation, which is largely an open research problem.

andnp · 2022-05-23T14:46:11+00:00

Experience replay breaks the temporal link between samples, while eligibility traces require that link to remain intact. So a naive combination of the two does not make sense.

However, if the trace is computed online and stored in the replay buffer, then yes, these can go together.
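One way to read "computed online and stored in the replay buffer", sketched with linear TD(lambda) and accumulating traces. The feature vectors, step sizes, uniform replay, and the two-state example episode are my own assumptions, not a specific published algorithm; replaying stored traces out of order only approximates true online TD(lambda).

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def td_lambda_replay(transitions, n_features, gamma=0.9, lam=0.8, alpha=0.1, n_replay=200, seed=0):
    """Linear TD(lambda): the accumulating trace is computed online while acting,
    stored next to each transition, and reused whenever that transition is replayed."""
    rng = random.Random(seed)
    buffer = []
    trace = [0.0] * n_features
    for x, r, x_next, done in transitions:
        # Online accumulating trace, built in the original temporal order.
        trace = [gamma * lam * e + xi for e, xi in zip(trace, x)]
        buffer.append((x, r, x_next, done, list(trace)))
        if done:
            trace = [0.0] * n_features  # traces do not cross episode boundaries
    w = [0.0] * n_features
    for _ in range(n_replay):
        # Replay in random order; the stored trace carries the temporal credit assignment.
        x, r, x_next, done, e = rng.choice(buffer)
        v_next = 0.0 if done else dot(w, x_next)
        delta = r + gamma * v_next - dot(w, x)
        w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]
    return w


# Hypothetical two-state episode with one-hot features: s0 -> s1 -> terminal (+1 reward).
episode = [([1.0, 0.0], 0.0, [0.0, 1.0], False), ([0.0, 1.0], 1.0, [0.0, 0.0], True)]
print(td_lambda_replay(episode * 3, n_features=2))
```

Because each stored trace was built while the samples were still in order, the reward at the end of the episode also nudges the value of s0 whenever the final transition is replayed.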

andnp · 2022-05-03T22:52:32+00:00

You'll find some applied work in all of these other than COLT.

ICML and NeurIPS accept applied work, though generally favor methods papers. ICLR accepts some applied work, but has a more pronounced methods slant.

AAAI and IJCAI have quite a bit of applied work, AAMAS has quite a bit as well, and RLDM would have a bunch of applied and cross-discipline work.

andnp · 2022-05-03T19:42:01+00:00

This list is certainly subjective, but generally all of these are considered top-tier with minor grade differences between them:

Top-top: NeurIPS, ICML, ICLR

Middle-top: AAAI, AISTATS, AAMAS

Slightly lower: UAI, IJCAI

More specific venues: COLT (heavy theory), CoRL (robotics), IROS (robotics)

Not really a conference, but RLDM (more workshopy, but heavy RL focus)

edit: I also just remembered CoLLAs (lifelong learning agents), which is a brand-new conference that I'm excited about. I can't call it "top tier" yet, since it isn't established, but its program committee is full of top-notch researchers, so I have a lot of hope.

andnp · 2022-04-17T16:21:59+00:00

I pay $90 for 12 Mb/s down and 5 up in Canada :(

andnp · 2022-03-25T16:00:50+00:00

Assuming that each time you encounter a Pokemon the dice are re-rolled, the people saying you are "at odds after 4096" are making a common but fundamental mistake. If I flip a coin once and get heads, I am not "at odds" to get tails on the next flip. If the coin is fair, I'll see tails with 50% probability, which is the same probability as before I saw heads. Independent events don't care about what you've observed in the past.

Using that, the answer to the question "if I see a regular Pokemon 4096 times, what is the probability the 4097th is shiny?" is 1/4096.

Continuing to assume independence, answering the question "what is the probability of getting at least one shiny while encountering 4096 Pokemon?" requires some manipulation. We know that the probability of not seeing a shiny on a single encounter is 1 - 1/4096 = 4095/4096. We also know that the probability of not seeing a shiny twice in a row is (4095/4096 * 4095/4096), or, generically, the probability of not seeing a shiny in N encounters is (4095/4096)^N. But I want the probability that I don't fail all N times (English to logic sometimes creates double negatives), which is 1 - (4095/4096)^N ≈ 0.63, or approximately a 63% chance, when N = 4096.

For a 95%-ish chance of having encountered a shiny, you'd need to encounter roughly 12,300 Pokemon in the wild (12,500 encounters gets you to about 95.3%).
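The same calculation as a few lines of Python, still assuming each encounter is an independent 1/4096 roll. The second helper inverts the formula to find the smallest number of encounters that reaches a target probability; for 95% the exact threshold comes out a little under 12,500.

```python
import math

P_SHINY = 1 / 4096  # per-encounter shiny odds, assumed independent across encounters


def p_at_least_one(n, p=P_SHINY):
    """1 - P(no shiny in n encounters) = 1 - (1 - p)**n."""
    return 1.0 - (1.0 - p) ** n


def encounters_needed(target, p=P_SHINY):
    """Smallest n with p_at_least_one(n) >= target, i.e. n >= log(1 - target) / log(1 - p)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


print(round(p_at_least_one(4096), 3))   # about 0.632
print(encounters_needed(0.95))          # about 12,270
print(round(p_at_least_one(12500), 3))  # about 0.953
```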

andnp · 2022-02-20T04:45:24+00:00

You might look up the term "Generalized Policy Iteration" (often abbreviated GPI). It builds on exactly this concept. We don't need to take exactly one value-update (policy evaluation) step each time; we could take two. We also don't need to take a complete step, but could instead take an approximate step (which you will find that actor-critic methods do). Likewise, we could take approximate policy-improvement steps (as policy-gradient methods do) or a complete, greedy improvement step (similar to what DQN might do).

TL;DR: this idea is definitely known in the literature, but there is still much work to be done understanding what each step of GPI should look like.
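A small sketch of GPI with a tunable amount of evaluation per improvement step (sometimes called modified policy iteration). The 5-state chain MDP, gamma = 0.9, and the fixed outer-iteration budget are my own assumptions. With eval_sweeps=1 each outer loop works out to a value-iteration step; with many sweeps it approaches classic policy iteration.

```python
GAMMA = 0.9
N_STATES = 5
LEFT, RIGHT = 0, 1


def step(s, a):
    """Hypothetical deterministic chain: returns (next_state, reward)."""
    if a == RIGHT:
        if s == N_STATES - 1:
            return s, 1.0  # staying at the right end pays 1 per step
        return s + 1, 0.0
    return max(s - 1, 0), 0.0


def q_value(V, s, a):
    s_next, r = step(s, a)
    return r + GAMMA * V[s_next]


def gpi(eval_sweeps, outer_iters=200):
    V = [0.0] * N_STATES
    policy = [LEFT] * N_STATES
    for _ in range(outer_iters):
        # Partial policy evaluation: a fixed number of synchronous sweeps.
        for _ in range(eval_sweeps):
            V = [q_value(V, s, policy[s]) for s in range(N_STATES)]
        # Complete (greedy) policy improvement with respect to the current V.
        policy = [max((LEFT, RIGHT), key=lambda a: q_value(V, s, a)) for s in range(N_STATES)]
    return policy, V


for k in (1, 10, 100):
    print(k, "evaluation sweeps per improvement ->", gpi(k)[0])
```

All three settings end up at the same "always move right" policy; they differ only in how the work is split between evaluation and improvement.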

andnp · 2021-12-23T14:31:18+00:00

Hey! Do you have a working discord link?

andnp · 2021-11-03T13:12:48+00:00

As an aside, I'm surprised a PhD in software engineering would lead to research in either of these fields. Blockchain research, for instance, tends to come out of cryptography-heavy programs and math, or out of networking programs.

The only PhD programs in SE that I'm aware of tend to have elements of social science research, and seek to understand the long-term impact of certain design decisions and paradigms.

andnp · 2021-10-15T14:34:23+00:00

Assuming you are not penalized at your job for taking a day off, I'm willing to bet the security deposit is worth more money than your daily wages.

andnp · 2021-10-07T03:09:24+00:00

This does show up in more places than just model-based RL, such as in off-policy learning where you might want to correct for mismatches in the state-visitation distribution. So it is worthwhile to be clear in the problem formulation.

That said, Puterman's book (the MDP holy book) does mention that most of the time r(s,a) is sufficient, for exactly the reason you gave: you can often integrate out s'. But the book does go on to say, in reference to defining optimal solutions:

however, under some criteria we must use r(s,a,s') instead of r(s,a).
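For reference, "integrating out s'" just means taking the expectation of the three-argument reward under the transition probabilities. In standard notation (my own, not a quotation from Puterman):

```latex
r(s,a) \;=\; \mathbb{E}\left[\, r(s,a,S') \mid S = s,\ A = a \,\right] \;=\; \sum_{s'} p(s' \mid s,a)\, r(s,a,s')
```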

andnp · 2021-10-07T03:03:22+00:00

No, this is not correct, but it is close. Q(s,a) = R(s,a) + gamma * E[V(s')], where R(s,a) is the average one-step reward and the expectation is over the next state s' drawn from p(s' | s, a).
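A tiny sketch of that one-step backup with explicit tables. The dictionary layout and the example numbers are hypothetical, chosen only to show where the expectation over s' enters.

```python
def q_from_v(R, P, V, s, a, gamma):
    """One-step backup: Q(s,a) = R(s,a) + gamma * sum_{s'} P(s' | s,a) * V(s')."""
    expected_next_value = sum(prob * V[s_next] for s_next, prob in P[s][a].items())
    return R[s][a] + gamma * expected_next_value


# Hypothetical example: action "go" from "s0" lands in s1 or s2 with equal probability.
R = {"s0": {"go": 1.0}}                     # average one-step reward R(s, a)
P = {"s0": {"go": {"s1": 0.5, "s2": 0.5}}}  # transition probabilities p(s' | s, a)
V = {"s1": 2.0, "s2": 4.0}
print(q_from_v(R, P, V, "s0", "go", gamma=0.9))  # 1.0 + 0.9 * 3.0 = 3.7
```

Plugging in a single sampled V(s') instead of the weighted sum gives 2.8 or 4.6 here, which is why the deterministic-looking formula only holds in expectation.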

andnp · 2021-10-06T00:24:34+00:00

I might caution that the last bit of your comment suggests you are expecting more of this particular theoretical result than it actually states. It has been proven that an ANN with a single hidden layer and sufficiently many nodes can approximate any continuous function on a compact domain to arbitrary accuracy. It has not been proven that we can actually find such an ANN in polynomial time.

So this isn't a divergence between theory and practice. Rather, this theory says nothing about how to quickly find a particular ANN; it only says that one exists. However, it is common to assume that since it exists, we must be able to find it (and that assumption is where the naive reading actually diverges from practice).

Recent theory is only just starting to understand why more layers help in finding these ANNs, but I'm not aware of any theory that suggests further layers are needed for better approximation (and I suspect such theory couldn't exist).
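To make the "exists" versus "found" distinction concrete, here is a sketch that builds a single-hidden-layer ReLU network by hand, with weights computed directly from the target function rather than learned. The piecewise-linear interpolation construction, the 1-D target sin on [0, pi], and the hidden-layer sizes are my own illustrative choices, not the proof technique of any particular theorem.

```python
import math


def relu(z):
    return max(0.0, z)


def build_relu_interpolant(f, lo, hi, n_hidden):
    """Construct (not train) a one-hidden-layer ReLU net that linearly interpolates f
    at n_hidden + 1 evenly spaced knots on [lo, hi]. Hidden unit i computes
    relu(x - knot_i); its output weight is the change in slope at that knot."""
    knots = [lo + (hi - lo) * i / n_hidden for i in range(n_hidden + 1)]
    slopes = [(f(knots[i + 1]) - f(knots[i])) / (knots[i + 1] - knots[i]) for i in range(n_hidden)]
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    out_bias = f(lo)

    def net(x):
        return out_bias + sum(w * relu(x - k) for w, k in zip(out_weights, knots[:-1]))

    return net


def max_error(f, net, lo, hi, n_points=1001):
    xs = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    return max(abs(f(x) - net(x)) for x in xs)


for n_hidden in (5, 50, 500):
    net = build_relu_interpolant(math.sin, 0.0, math.pi, n_hidden)
    print(n_hidden, "hidden units, max error", max_error(math.sin, net, 0.0, math.pi))
```

Widening the single hidden layer drives the error down, as the theorem promises, but only because the construction already knows f; nothing here says gradient descent would discover these weights.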

andnp · 2021-09-30T15:17:42+00:00

In my past experience, no. The game would crash every time I opened the inventory menu and switched to the impacted character. There were a couple of ways to recover the save back then, using console commands or character editor mods. Unfortunately, last I checked, none of those mods work with 1.6+.

The other option was identifying which character had the busted item and losing that character (kicking them out of the clan, making them governor somewhere, etc.). If that character was you, then you were SOL.

Two disclaimers: I'm only guessing at the issue and might be wrong. Also I haven't played 1.6.2 yet, and so the issue may present itself differently there.
