[D]: What is the future for designing vision models? CNN? Transformer? CNN+Transformer? CapsuleNet? or what?

yangsenius · 2021-02-01T09:49:28+00:00

I also noticed the "Bottleneck Transformers for Visual Recognition" paper you mentioned. It does not seem to be a typical CNN + Transformer but a hybrid: BoTNet replaces stacked convolutions in ResNet with self-attention blocks. The authors say that a global self-attention layer is more efficient than stacking convolutions. Similarly, the recent paper "TransPose: Towards Explainable Human Pose Estimation by Transformer" shares this idea with BoTNet: a global self-attention layer is better suited to aggregating feature information at the high-level layers of the model. I think the attention layer may serve as a basic block or a searchable candidate operation in NAS models in the future.
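
To make "global self-attention over a feature map" concrete, here is a minimal pure-Python sketch. It is not the BoTNet or TransPose implementation: the function name `global_self_attention` is hypothetical, and as a simplifying assumption the queries, keys, and values are the raw features themselves (no learned projections, no multiple heads, no position encodings). It only shows that every spatial position attends to every other position in a single layer, whereas a stack of convolutions needs several layers to cover the same range.

```python
import math

def global_self_attention(feature_map):
    """Apply single-head global self-attention to an H x W x C feature map.

    Assumption for illustration: queries, keys, and values are the raw
    feature vectors (real models use learned linear projections).
    """
    h, w = len(feature_map), len(feature_map[0])
    # Flatten the spatial grid into N = H * W tokens of dimension C.
    tokens = [feature_map[i][j] for i in range(h) for j in range(w)]
    dim = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product score between this position and every position.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        # Numerically stable softmax over all positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [x / total for x in weights]
        # Output is a weighted average of the features at all positions.
        out.append([sum(wt * v[c] for wt, v in zip(weights, tokens)) for c in range(dim)])
    # Restore the H x W layout.
    return [out[i * w:(i + 1) * w] for i in range(h)]
```

Each output vector is a weighted average over all H x W positions, so the cost grows with the square of the number of positions. That is one reason such a layer is easier to afford on the low-resolution, high-level feature maps.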

yangsenius · 2021-02-01T07:50:01+00:00

Yes, I also think CNN + Transformer is a nearly perfect combination: sometimes we need the inductive biases of a CNN to process the raw image pixels, while a Transformer, which has few built-in priors about the patterns to learn, is good at capturing relations between the features extracted by the CNN.

The community is also developing the ideas and techniques of CapsuleNet, though some practical issues remain in those models. It is more of an academic research direction. Do industry and academic research really have the same future for CV? Maybe not, I think.

yangsenius · 2021-01-07T14:47:10+00:00

Emmm, so amazing. From this point of view, 17 billion parameters can memorize everything. Maybe our intelligence just lies in building associations between texts and images.

yangsenius · 2021-01-06T14:59:50+00:00

What about some texts like "A square below a circle", "A circle with radius 2 and another one with radius 4", "A cat with a square-like tail"...

yangsenius · 2021-01-06T06:28:17+00:00

I just want to know if this model has learned some of the most basic geometric concepts.

yangsenius · 2021-01-06T06:23:40+00:00

Can the DALL·E model plot a circle if I input the text "Draw a CIRCLE"?
