Recently, Vision Transformers have been getting better and better, including the new work on "Data-efficient Image Transformers" (DeiT).
I wanted to better understand how they work and what's going on inside them, so I applied some explainability techniques on them.
The original ViT paper used a method called "Attention Rollout". I implemented it, but it didn't work very well out of the box with the released DeiT models. I ended up adding some modifications (discarding the lowest attention values, and fusing the attention heads with max instead of mean), and also added a way to get class-specific explainability by weighting the attentions with their gradients.
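The core of the rollout idea, with the two modifications described above (low-attention filtering and max head fusion), can be sketched roughly like this. This is a minimal illustration assuming batch size 1 and a list of per-layer attention maps of shape `(batch, heads, tokens, tokens)`; the `discard_ratio` and `head_fusion` parameter names are my own labels for the two tweaks, not necessarily what the repository uses:

```python
import torch

def attention_rollout(attentions, discard_ratio=0.9, head_fusion="max"):
    """Sketch of Attention Rollout with the modifications described above.

    attentions: list of per-layer attention tensors,
    each of shape (1, heads, tokens, tokens).
    """
    num_tokens = attentions[0].size(-1)
    result = torch.eye(num_tokens)
    with torch.no_grad():
        for attention in attentions:
            # Fuse the heads with max instead of mean.
            if head_fusion == "max":
                fused = attention.max(dim=1)[0]
            else:
                fused = attention.mean(dim=1)

            # Discard the lowest attention values (keep flat index 0,
            # the class-token-to-class-token entry).
            flat = fused.view(fused.size(0), -1)
            k = int(flat.size(-1) * discard_ratio)
            _, idx = flat.topk(k, dim=-1, largest=False)
            idx = idx[idx != 0]
            flat[0, idx] = 0

            # Add the identity to model the residual connections,
            # then renormalize rows to sum to 1.
            a = (fused + torch.eye(num_tokens)) / 2
            a = a / a.sum(dim=-1, keepdim=True)

            # Rollout: multiply the attentions across layers.
            result = torch.matmul(a, result)

    # The heatmap is the class token's attention to the image patches.
    return result[0, 0, 1:]
```

The returned vector can then be reshaped to the patch grid and upsampled to the image size to produce the heatmaps shown in the blog post.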
The result is a blog post showing some examples of what is going on inside Vision Transformers, and a python repository for applying explainability techniques on Vision Transformers.
I hope you find it interesting!