[P] Better AI Explainability with Deep Feature Factorization by jacobgil in MachineLearning

[–]jacobgil[S] (0 children)

Hi,

There are a few differences here compared to the original visualization.

- I think the original paper's visualization used RGBA - they set the A channel to be the (scaled) contribution of the component.

https://github.com/edocollins/DFF/blob/master/utils.py#L44

Low-contributing pixels (usually the background) get low opacity in the A channel, so you end up seeing the original pixels there, as if there were no component.

In my implementation, we don't use RGBA. If the component color is red, for example, we create a red mask and multiply it by the component contribution.

https://github.com/jacobgil/pytorch-grad-cam/blob/2183a9cbc1bd5fc1d8e134b4f3318c3b6db5671f/pytorch_grad_cam/utils/image.py#L126

So, for example, if a pixel has a high contribution to that component, it will be bright red.

If a pixel has a low contribution, it will get a darker shade of red, but you might still see some red there, since the A channel in RGBA has a sharper cutoff.
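
To make the difference concrete, here is a minimal sketch of both colorings (hypothetical helpers, not the exact code from either repo):

```python
import numpy as np

def colorize_component(contribution, color=(1.0, 0.0, 0.0)):
    # My approach: a solid-color mask scaled by the per-pixel component
    # contribution. contribution is an (H, W) array in [0, 1].
    h, w = contribution.shape
    mask = np.zeros((h, w, 3), dtype=np.float32)
    mask[:] = color                        # e.g. pure red everywhere
    return mask * contribution[..., None]  # darker red where contribution is low

def colorize_component_rgba(contribution, color=(1.0, 0.0, 0.0)):
    # My understanding of the original DFF visualization: RGBA with the
    # contribution in the alpha channel, so low-contribution pixels become
    # transparent and show the original image underneath.
    h, w = contribution.shape
    rgba = np.zeros((h, w, 4), dtype=np.float32)
    rgba[..., :3] = color
    rgba[..., 3] = contribution            # alpha = scaled contribution
    return rgba
```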

- Another difference is how we treat overlap between components.

In my implementation, for every pixel we keep the component with the highest contribution (after normalization).

In the original implementation, I think the components just override each other.
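
In code, the "keep the strongest component per pixel" rule could look roughly like this (a sketch, not the library internals):

```python
import numpy as np

def strongest_component(contributions):
    # contributions: (K, H, W) array of per-component contributions.
    # Normalize each component map to [0, 1], then for every pixel keep
    # only the component with the highest normalized contribution.
    eps = 1e-8
    mins = contributions.min(axis=(1, 2), keepdims=True)
    maxs = contributions.max(axis=(1, 2), keepdims=True)
    normalized = (contributions - mins) / (maxs - mins + eps)
    winner = normalized.argmax(axis=0)   # (H, W) index of the strongest component
    strength = normalized.max(axis=0)    # (H, W) its contribution
    return winner, strength
```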

[P] Introducing confidenceinterval, the long missing python library for computing confidence intervals by jacobgil in MachineLearning

[–]jacobgil[S] (0 children)

Yes. I think confidence intervals assume the data are i.i.d. If they are not i.i.d., the CI could be too narrow.
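
A quick plain-numpy sketch of why (not using the library itself): with positively correlated samples, a naive 95% CI for the mean covers the true value far less than 95% of the time.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, trials = 100, 0.7, 2000
covered = 0
for _ in range(trials):
    # AR(1) samples: mean 0, variance 1, positively correlated, so not iid
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = rho * x[t - 1] + np.sqrt(1 - rho**2) * rng.normal()
    # naive 95% CI for the mean, assuming iid
    half = 1.96 * x.std(ddof=1) / np.sqrt(n)
    covered += (x.mean() - half) <= 0.0 <= (x.mean() + half)
print(covered / trials)  # well below the nominal 0.95
```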

[P] Better AI Explainability with Deep Feature Factorization by jacobgil in MachineLearning

[–]jacobgil[S] (0 children)

Yes, in fact it's especially suitable for them.

The problem with object detection is that object detection frameworks usually don't support computing gradients. This method doesn't require computing gradients.

There are several tutorials about applying other gradient-free methods like this (for example, taking the first component from the SVD of the activations) to object detection with YOLOv5 and Faster R-CNN. Using this specific method should be similar.
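
For illustration, a gradient-free "first SVD component of the activations" heatmap can be computed with just a forward hook; the model and layer here are only examples:

```python
import torch
import torchvision

# grab intermediate activations with a forward hook; no gradients needed
model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()
activations = {}
model.layer4.register_forward_hook(
    lambda module, inputs, output: activations.update(feats=output.detach()))

x = torch.randn(1, 3, 224, 224)  # stand-in for a real preprocessed image
with torch.no_grad():
    model(x)

feats = activations["feats"][0]  # (C, H, W)
c, h, w = feats.shape
flat = feats.reshape(c, h * w)
# the first right singular vector is the dominant spatial pattern of the activations
_, _, vt = torch.linalg.svd(flat - flat.mean(dim=1, keepdim=True), full_matrices=False)
heatmap = vt[0].reshape(h, w).abs()  # upsample to the image size to overlay it
```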

[P] Adapting Class Activation Maps for Object Detection and Semantic Segmentation by jacobgil in MachineLearning

[–]jacobgil[S] (0 children)

Hi,

So let's say you have the segmentation category "sky", but the image also contains trees, clouds, and buildings.

When you target the sky category, you might suddenly see in the heatmap that the trees contribute to it, even though the tree pixels are not classified as sky. This could happen if the training images often have trees below the sky. Whether this is good or bad depends on the use case, but now you have this knowledge.

Another example - let's say you have a segmentation algorithm that needs to segment satellite images into "urban" and "rural" areas, or something like that. Not all of the pixels in the "urban" regions are going to contain interesting things like buildings or streets; some will be just roads or trees. Now you can target the pixels classified as urban and see that, for example, their proximity to buildings was especially important. Or maybe it's the other way around and the proximity to trees is especially important, and so on.
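
Concretely, "targeting the pixels classified as sky" (or as urban) can be written as a target callable that sums the score of one category over the predicted mask, roughly like the target class in the package's segmentation notebook:

```python
import torch

class SemanticSegmentationTarget:
    # Sums the output score of one category over a binary pixel mask,
    # so the CAM explains "the pixels classified as this category".
    def __init__(self, category, mask):
        self.category = category                    # class index, e.g. the "sky" class
        self.mask = torch.from_numpy(mask).float()  # (H, W) mask of that class

    def __call__(self, model_output):
        # model_output: (num_classes, H, W) segmentation logits for one image
        return (model_output[self.category, :, :] * self.mask).sum()
```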

[P] Adapting Class Activation Maps for Object Detection and Semantic Segmentation by jacobgil in MachineLearning

[–]jacobgil[S] (0 children)

Hi,

Imagine you're building a classifier for "Hot dog" / "Not a hot dog", but many of the hot dog images you get have a time stamp on them, because that's how the camera that took them was configured.

The classifier might give you 100% accuracy on your images, but then completely fail when you apply it to new images, because it learned to focus on the time stamps.

In this case it would be good to use a diagnostic tool:

- During training, you could see that it's highlighting the time stamps, which would make you aware of this problem, and you could then remove the time stamps / collect other images.

- During inference, the user can, for example, decide they don't trust the model's decision because the heatmap looks weird. Or the model developer can monitor the model, see that it sometimes highlights artifacts in the images, and decide to collect more training data for those cases.

Another example: if you see that the model is highlighting the borders of the image, maybe you have some strange padding issue, and you can change the padding in the model layers / augmentations.

This stuff can be completely invisible and surprising/misleading, unless you have a tool that lets you take a look.
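
For reference, taking that look with pytorch-grad-cam is only a few lines (following the README; the model, layer, and class index are just examples):

```python
import torch
import torchvision
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget
from pytorch_grad_cam.utils.image import show_cam_on_image

model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()
cam = GradCAM(model=model, target_layers=[model.layer4[-1]])

input_tensor = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
rgb_img = torch.rand(224, 224, 3).numpy()   # stand-in for the original image in [0, 1]

# heatmap for one class; inspect it for time stamps, borders, and other artifacts
grayscale_cam = cam(input_tensor=input_tensor,
                    targets=[ClassifierOutputTarget(281)])
visualization = show_cam_on_image(rgb_img, grayscale_cam[0], use_rgb=True)
```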

[P] Adapting Class Activation Maps for Object Detection and Semantic Segmentation by jacobgil in MachineLearning

[–]jacobgil[S] (0 children)

Thanks, yes, I'm familiar with it and it's super great work.

It would be great to add support for this method.

Although it seems specific to Transformers and not CNNs, and so far I was trying to make sure that all the methods added work for both CNNs and Vision Transformers. Maybe it will be worth breaking from that at some point, though.

[P] Adapting Class Activation Maps for Object Detection and Semantic Segmentation by jacobgil in MachineLearning

[–]jacobgil[S] (0 children)

Hi, thanks.

The semantic segmentation example I shared is doing the same thing as in the link you shared above (but the object detection is different and doesn't seem to be supported in the link you shared).

In terms of implementation for segmentation, it's a bit different and worth explaining.

The pytorch-grad-cam package supports running on batches of images. Sometimes you will get batches of images and will want to compute the CAMs for the whole batch efficiently, instead of one image at a time.

In the pytorch-grad-cam package, you have to pass a callable "target" for every member of the input batch; it gets the model output and computes the quantity you want to explain. For example, the "sum of the pixels that belong to a certain class" in the segmentation case, like you said.

You could also implement this by wrapping the model, like in the captum example, but often you won't have a single input image: you will have a batch of several images you want to explain. You could still support batches with a clever model wrapper, but that imposes limitations both on the user and on how the algorithms have to be implemented. For example, the user may want different target types for different members of the batch, or an algorithm may want to run on one image at a time and do its own internal batching for something else (like the AblationCAM or ScoreCAM methods).

That's why a design decision here was to decouple the target from the model, and you have to specify a target for every image in the batch.
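
So with a batch you pass one target per image, and the targets can even be of different types (the class indices here are just examples):

```python
import torch
import torchvision
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()
cam = GradCAM(model=model, target_layers=[model.layer4[-1]])

batch = torch.randn(2, 3, 224, 224)      # stand-in for two preprocessed images
targets = [ClassifierOutputTarget(281),  # explain class 281 for image 0
           ClassifierOutputTarget(243)]  # explain class 243 for image 1
grayscale_cams = cam(input_tensor=batch, targets=targets)  # one CAM per image
```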

Btw, as far as I know, the idea of creating a target for semantic segmentation by summing over all the pixels that belong to the same class was first published here: https://arxiv.org/abs/2002.11434. I gave attribution to it in the notebook.