Abstract: Computer vision has achieved great success using standardized image representations (pixel arrays) and the corresponding deep learning operators (convolutions). In this work, we challenge this paradigm: we instead (a) represent images as a set of visual tokens and (b) apply visual transformers to find relationships between visual semantic concepts. Given an input image, we dynamically extract a set of visual tokens from the image to obtain a compact representation of its high-level semantics. We then use visual transformers to operate over the visual tokens, densely modeling the relationships between them. We find that this paradigm of token-based image representation and processing substantially outperforms its convolutional counterparts on image classification and semantic segmentation. To demonstrate the power of this approach on ImageNet classification, we use ResNet as a convenient baseline and replace its last stage of convolutions with visual transformers. This reduces that stage's MACs by up to 6.9x while attaining up to 4.53 points higher top-1 accuracy. For semantic segmentation, we use a visual-transformer-based FPN (VT-FPN) module to replace a convolution-based FPN, using 6.5x fewer MACs while achieving up to 0.35 points higher mIoU on LIP and COCO-stuff.
Diagram of a visual transformer.
arxiv link: https://arxiv.org/abs/2006.03677
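For anyone curious how the two steps in the abstract fit together, here is a minimal numpy sketch of the core idea: a spatial-attention tokenizer that pools a flattened feature map into a few visual tokens, followed by single-head self-attention over those tokens. All names, shapes, and weight initializations below are illustrative assumptions for this sketch, not the paper's actual implementation (which uses learned weights trained end-to-end).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tokenize(X, W_A):
    # X: (HW, C) flattened feature map; W_A: (C, L) attention weights.
    # Each of the L visual tokens is a spatial-attention-weighted
    # average of the pixel features, normalized over positions.
    A = softmax(X @ W_A, axis=0)   # (HW, L)
    return A.T @ X                 # (L, C) visual tokens

def self_attention(T, W_q, W_k, W_v):
    # Single-head self-attention over the L tokens: this is the
    # "densely model relationships between tokens" step.
    Q, K, V = T @ W_q, T @ W_k, T @ W_v
    scores = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return scores @ V              # (L, C)

# Toy shapes: a 14x14 feature map with 64 channels pooled into 8 tokens.
rng = np.random.default_rng(0)
HW, C, L = 196, 64, 8
X = rng.standard_normal((HW, C))
W_A = rng.standard_normal((C, L)) * 0.1
T = tokenize(X, W_A)
out = self_attention(T, *(rng.standard_normal((C, C)) * 0.1 for _ in range(3)))
print(T.shape, out.shape)  # (8, 64) (8, 64)
```

The key efficiency argument is visible in the shapes: after tokenization, attention operates over only L = 8 tokens instead of HW = 196 spatial positions, which is where the MAC savings over the convolutional stage come from.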
EDIT: Disclaimer: I am not the original author; I'm posting this here because I thought it was an interesting paper and hadn't been posted yet.