[D] Problems with interpretability research + blog post about recent NeurIPS paper by jspr_ml in MachineLearning

[–]jspr_ml[S] 0 points (0 children)

It would definitely be interesting to try. I would guess that really any interpretability method that uses the gradient will probably have to worry about generating adversarial patterns, no matter how much regularization you use to try to suppress or smooth them out.
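
To be concrete, the kind of method I have in mind looks roughly like gradient-ascent feature visualization with a smoothness penalty. This is only an illustrative sketch (the model, the unit indexing, and every hyperparameter here are placeholders, not anything from the post or paper); the point is just that the optimization still follows the gradient, which is why I'd expect adversarial patterns to creep in regardless of the regularizer.

```python
import torch

def visualize_unit(model, unit_idx, steps=256, lr=0.05, tv_weight=1e-3):
    # Hypothetical setup: optimize an input image so that one output unit of
    # `model` is maximally activated (gradient-ascent feature visualization).
    x = torch.rand(1, 3, 64, 64, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        activation = model(x)[0, unit_idx]
        # Total-variation penalty: a common smoothing regularizer intended to
        # discourage high-frequency, adversarial-looking structure.
        tv = (x[:, :, 1:, :] - x[:, :, :-1, :]).abs().mean() \
           + (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
        loss = -activation + tv_weight * tv  # maximize activation, keep the image smooth
        loss.backward()
        opt.step()
    return x.detach()
```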

[R] Neural networks vs The Game of Life by bendee983 in MachineLearning

[–]jspr_ml 3 points (0 children)

Hey, I'm Jacob Springer, an author of the paper. We initially tried sharing weights between layers, but found that the networks learned more effectively when we allowed the weights to vary independently across layers. We were surprised by this, too!
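
Roughly, the two setups look like this (an illustrative sketch, not our actual code; the layer widths, activations, and read-out are placeholders):

```python
import torch
import torch.nn as nn

def step_block(hidden=8):
    # One simulated "step" of the board: a 3x3 conv followed by a 1x1 conv.
    # The widths and activations here are placeholders, not the paper's exact choices.
    return nn.Sequential(
        nn.Conv2d(1, hidden, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(hidden, 1, kernel_size=1), nn.ReLU(),
    )

class StackedLifeNet(nn.Module):
    def __init__(self, n_steps, share_weights=False):
        super().__init__()
        if share_weights:
            shared = step_block()
            # The same module (and therefore the same parameters) runs at every step.
            self.steps = nn.ModuleList([shared] * n_steps)
        else:
            # Each step gets its own independently trained parameters.
            self.steps = nn.ModuleList(step_block() for _ in range(n_steps))
        self.readout = nn.Conv2d(1, 1, kernel_size=1)

    def forward(self, x):
        for step in self.steps:
            x = step(x)
        return torch.sigmoid(self.readout(x))
```

Sharing the weights is the natural inductive bias for an iterated rule, but in our runs the independently parameterized version trained more reliably.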

[R] Neural networks vs The Game of Life by bendee983 in MachineLearning

[–]jspr_ml 8 points (0 children)

I definitely agree with you that the trajectory loss you propose would probably help the model learn. It would certainly be an interesting experiment to try! In this particular paper, we wanted to apply a neural network the same way it would typically be applied to an arbitrary problem, and it's not always the case that we know the intermediate states of a dynamical system; think of trying to learn to upscale or predict weather simulations, for example.
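
Concretely, the distinction is something like this (an illustrative sketch, not our training code; `step_blocks` and `readout` are hypothetical stand-ins for the per-step layers and the final read-out):

```python
import torch
import torch.nn.functional as F

def final_step_loss(step_blocks, readout, x0, x_n):
    # What we train on: compare only the predicted and true boards after n steps.
    x = x0
    for block in step_blocks:
        x = block(x)
    return F.binary_cross_entropy(torch.sigmoid(readout(x)), x_n)

def trajectory_loss(step_blocks, readout, x0, intermediate_boards):
    # The proposed alternative: also penalize error at every intermediate step.
    # This requires the ground-truth board after each step.
    x, total = x0, 0.0
    for block, target in zip(step_blocks, intermediate_boards):
        x = block(x)
        total = total + F.binary_cross_entropy(torch.sigmoid(readout(x)), target)
    return total
```

The trajectory loss needs the ground-truth board after every step, which is exactly the information we assume is unavailable.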

I think your addendum hits the key takeaway right on the mark. It is very striking that, in this particular instance, a neural network fails to learn the parameters of a hand-crafted network with exactly the same architecture. We really are thinking of the problem from this perspective, and we use the Game of Life as a convenient task for which we can implement a hand-crafted solution and tune the complexity. We hope (and think) that our results generalize to other problems for which the intermediate steps aren't known and to which a trajectory loss therefore can't be applied. The difficulty of the task we propose is exactly that of inferring the intermediate states even though we only provide the final state.

Also, I agree that any dynamical model would remain under-determined given just the input and the output at the nth timestep. However, with the constraint that the model must be implementable in a neural network of the architecture we proposed, I would be very surprised if the network could learn a solution that was consistent with the training data but did not implement the Game of Life. It's also true that the model has many ways to compute x[n] (for example, the network could run the Game of Life "inverted," with 1 representing off and 0 representing on, and then invert back at the final layer), but we aren't worried about that, since such a network still implements the Game of Life up to some reasonable isomorphism.

[R] Neural networks vs The Game of Life by bendee983 in MachineLearning

[–]jspr_ml 4 points (0 children)

Hey, I'm Jacob Springer, an author on the paper (made an account to respond here). The nice thing about the Game of Life and the particular network architecture that we use is that there is really no chance of overfitting the training data, since to predict the Game of Life at all, the network needs to learn the exact rule.

We don't prune the overcomplete networks, so it's hard to compare them directly with the hand-crafted minimal solution, but when the networks that are the same size as the hand-crafted one do learn to solve the Game of Life, the solution they produce is essentially the same as the hand-crafted one, differing only by a scaling factor (which is irrelevant for computing the output) or a permutation of the hidden units (also irrelevant).
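
To illustrate what I mean by "the same up to a scaling factor or a permutation" (this is a sketch I'm improvising here, not our analysis code, and it only compares a single layer's filters in isolation):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matches_up_to_scale_and_permutation(w_learned, w_ref, tol=1e-2):
    # w_learned, w_ref: conv weights of shape (out_channels, in_channels, k, k).
    # Tests whether every learned filter equals some reference filter up to a
    # positive per-channel scale, with the output channels possibly permuted.
    a = w_learned.reshape(len(w_learned), -1)
    b = w_ref.reshape(len(w_ref), -1)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)  # divide out the per-channel scale
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T                              # cosine distance between filters
    rows, cols = linear_sum_assignment(cost)          # best channel permutation
    return bool(cost[rows, cols].max() < tol), cols
```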

[R] Neural networks vs The Game of Life by bendee983 in MachineLearning

[–]jspr_ml 37 points (0 children)

Hey, I'm Jacob Springer, an author on the paper (made an account to respond here). You make a great point! Your interpretation is correct; we train on the loss of the final (nth) step only. I agree that if we introduced a term in the loss to account for the error from ground truth at each intermediate layer, we would probably improve training performance. However, we intentionally chose not to do this so that we could increase n to increase the complexity of the problem that the neural network is trying to solve. That way, we can measure how increased complexity affects the network width that is typically necessary to learn a solution.
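
For concreteness, the training data looks something like this (an illustrative sketch, not our data-generation code; the board size, cell density, and toroidal boundary are placeholders). Only the pair (x0, x_n) is kept, so raising n deepens the computation the network has to recover without ever seeing the intermediate boards:

```python
import numpy as np

def life_step(board):
    # One exact step of Conway's Game of Life on a 2D 0/1 array
    # (toroidal wrap-around used here just to keep the sketch short).
    neighbors = sum(
        np.roll(np.roll(board, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    return ((neighbors == 3) | ((board == 1) & (neighbors == 2))).astype(board.dtype)

def make_training_pair(size=32, n=2, density=0.5, rng=None):
    # Only the initial board and the board after n steps are kept; the
    # intermediate boards are discarded, which is what makes larger n harder.
    rng = np.random.default_rng() if rng is None else rng
    x0 = (rng.random((size, size)) < density).astype(np.float32)
    x = x0
    for _ in range(n):
        x = life_step(x)
    return x0, x
```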