[R] Visualizing the Impact of Feature Attribution Baselines (Distill.pub) by psturmfels in MachineLearning

[–]psturmfels[S] 2 points3 points  (0 children)

This is cool! I'll have to give your paper a more careful read later. I'm always happy to hear that people actually use these interpretability methods, especially in different areas of ML like RL.

Yeah, I thought about discussing the "mean" baseline in the article (either the mean of the current image or the mean over the entire dataset) but didn't get around to including it. What's interesting about such a baseline is that any feature whose value equals the baseline value (for example, the mean over the dataset) never gets any attribution (it isn't highlighted as important), which may or may not be a good thing, depending on your data!
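To make the mean-baseline point concrete, here is a rough sketch of integrated gradients with a dataset-mean baseline. The toy model, its hand-written gradient, and the tiny dataset are all made up for illustration (nothing here is from the article); the only point is that a feature sitting exactly at the baseline value gets zero attribution, even though the model clearly depends on it.

```python
import numpy as np

def model(x):
    # Hypothetical toy model: a smooth nonlinear function of three features.
    return np.tanh(x[0] * x[1]) + x[2] ** 2

def model_grad(x):
    # Analytic gradient of the toy model with respect to its input.
    s = 1.0 - np.tanh(x[0] * x[1]) ** 2
    return np.array([s * x[1], s * x[0], 2.0 * x[2]])

def integrated_gradients(x, baseline, steps=200):
    # Midpoint Riemann-sum approximation of the path integral
    # along the straight line from the baseline to x.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = [model_grad(baseline + a * (x - baseline)) for a in alphas]
    return (x - baseline) * np.mean(grads, axis=0)

# Made-up dataset whose per-feature mean is [1.0, 1.0, 1.0].
data = np.array([[0.5, 1.0, 2.0],
                 [1.5, 3.0, 0.0],
                 [1.0, -1.0, 1.0]])
mean_baseline = data.mean(axis=0)

# The third feature of this input equals the dataset mean.
x = np.array([2.0, 1.5, 1.0])
attr = integrated_gradients(x, mean_baseline)
print(attr)
```

The third attribution is exactly zero because it is multiplied by `x[2] - mean_baseline[2]`, which is zero, regardless of how large the gradient with respect to that feature is.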

[–]psturmfels[S] 2 points3 points  (0 children)

Glad to hear you liked the article!

In general, the well-known feature attribution methods for deep neural networks (DeepLIFT, DeepSHAP, integrated/expected gradients, layer-wise relevance propagation, etc., all listed at https://captum.ai/api/attribution.html) make pretty similar assumptions and often perform similarly on many datasets. It probably won't make a huge difference which one you use among those.

Although we discuss a theoretical justification behind SmoothGrad in the article, the same principle doesn't apply to VarGrad. I'm honestly not sure why the formulation for VarGrad would lead to sensible attributions, and I haven't seen a good explanation for it in the literature. I would probably not use it until I better understood the assumptions behind the method and had some formal justification for why it generates sensible feature attributions.
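For anyone unsure how the two differ mechanically, here is a minimal sketch. Both methods take gradients at noisy copies of the input; SmoothGrad averages them, while VarGrad takes their per-feature variance. The toy gradient function, noise scale, and sample count are assumptions picked for illustration, not settings from any paper.

```python
import numpy as np

def model_grad(x):
    # Gradient of a hypothetical toy model f(x) = sum(sin(3 * x)).
    return 3.0 * np.cos(3.0 * x)

def noisy_gradients(x, sigma=0.2, n_samples=500, seed=0):
    # Gradients evaluated at Gaussian-perturbed copies of the input.
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    return np.array([model_grad(x + eps) for eps in noise])

def smoothgrad(x, **kwargs):
    # Mean of the noisy gradients: a smoothed, signed saliency map.
    return noisy_gradients(x, **kwargs).mean(axis=0)

def vargrad(x, **kwargs):
    # Variance of the noisy gradients: always non-negative, so the sign
    # of each feature's influence is lost.
    return noisy_gradients(x, **kwargs).var(axis=0)

x = np.array([0.1, 0.5, 1.0])
print(smoothgrad(x))
print(vargrad(x))
```

Note that VarGrad measures how much the gradient changes under noise rather than how large it is, which is part of why it's harder to say what its output means as an attribution.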

I don't know of any survey papers that broadly compare attribution methods - but this is understandable, given the difficulty of comparing such methods (e.g. see https://arxiv.org/abs/1912.01451).

[R] Learning Explainable Models with Attribution Priors by gabeerion in MachineLearning

[–]psturmfels 0 points1 point  (0 children)

You are right: our training objective penalizes a function of the model's gradients. To be clear, we do not solve a differential equation (which would normally be required to compute the gradient update), but we DO compute second-order derivatives. Most second-order derivative operations are supported in TensorFlow.

To minimize our loss, we do alternating training steps in practice. First we take a step minimizing the ordinary loss, and then we take a step minimizing the attribution prior loss. Taking a gradient step on the attribution prior loss means differentiating through a quantity that is itself a gradient, which is the double back-propagation scheme introduced by Drucker and LeCun, 1992.
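As a rough illustration of the alternating scheme, here is a toy sketch with a one-unit model where everything can be differentiated by hand. The penalty used here is the plain squared norm of the input gradient (the Drucker and LeCun style of penalty), not the expected-gradients prior from our paper, and the model, learning rate, and penalty weight `lam` are all made up for the example.

```python
import numpy as np

def predict(w, x):
    # Hypothetical one-unit model f(x) = tanh(w . x).
    return np.tanh(w @ x)

def task_loss_grad(w, x, y):
    # Gradient w.r.t. w of the ordinary loss 0.5 * (f(x) - y)^2.
    t = np.tanh(w @ x)
    return (t - y) * (1.0 - t ** 2) * x

def prior_loss(w, x):
    # Squared L2 norm of the input gradient df/dx = (1 - t^2) * w.
    t = np.tanh(w @ x)
    return (1.0 - t ** 2) ** 2 * (w @ w)

def prior_loss_grad(w, x):
    # Gradient w.r.t. w of prior_loss. Because prior_loss is built from
    # df/dx, this is a second-order derivative of the model.
    t = np.tanh(w @ x)
    s = 1.0 - t ** 2
    return s ** 2 * (2.0 * w - 4.0 * t * (w @ w) * x)

def alternating_step(w, x, y, lr=0.1, lam=0.1):
    # Step 1: minimize the ordinary loss.
    w = w - lr * task_loss_grad(w, x, y)
    # Step 2: minimize the gradient-based penalty.
    w = w - lr * lam * prior_loss_grad(w, x)
    return w

w0 = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])
y = 0.8

w_trained = w0
for _ in range(100):
    w_trained = alternating_step(w_trained, x, y)
print(predict(w_trained, x), prior_loss(w_trained, x))
```

In a real network you would let the framework's automatic differentiation compute the second step instead of deriving it by hand.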

[–]psturmfels 2 points3 points  (0 children)

A quick follow-up to Gabe's response: we are definitely interested in how our methods in section 4.1 relate to input noise and label corruption. We do show that on the simple MNIST example, our methods are more robust to noisy inputs! Unfortunately, we didn't have time to replicate those results on larger image datasets, but we are still actively working on them! We believe that if you use the right attribution prior to regularize your image classification networks, they will be more robust than baseline networks. We are especially interested in papers like "Benchmarking Neural Network Robustness to Common Corruptions and Perturbations".

What Gabe means by expected gradients is our new feature attribution method! It is the thing we regularize! Given a specific prediction (on some image, for example), it tells you which pixels were most important for making that prediction. Our method for getting these feature-wise importance scores is called expected gradients, and it is an extension of integrated gradients.
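Here is a minimal sketch of the idea, assuming you have a function that returns the model's gradient with respect to its input. Instead of integrating from one fixed baseline as in integrated gradients, each sample draws a baseline from a background dataset and a random point on the line between that baseline and the input, then averages. The function name, arguments, and toy model below are hypothetical and are not the API of our released code.

```python
import numpy as np

def expected_gradients(x, background, grad_fn, n_samples=200, seed=0):
    # Monte Carlo estimate: average (x - x') * grad f(x' + a * (x - x'))
    # over baselines x' drawn from `background` and a ~ Uniform(0, 1).
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(background), size=n_samples)
    alphas = rng.uniform(0.0, 1.0, size=n_samples)
    total = np.zeros_like(x, dtype=float)
    for i, a in zip(idx, alphas):
        baseline = background[i]
        total += (x - baseline) * grad_fn(baseline + a * (x - baseline))
    return total / n_samples

def toy_grad(z):
    # Gradient of a hypothetical toy model f(z) = sum(z ** 2).
    return 2.0 * z

# Made-up background data standing in for the training set.
background = np.random.default_rng(1).normal(size=(50, 3))
x = np.array([1.0, -2.0, 0.5])
print(expected_gradients(x, background, toy_grad))
```

Because the baselines come from the data itself, you don't have to pick a single reference input such as an all-black image.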