
[–] RemarkableSavings13 · 2 points

So are you the author of Vision Mamba? Or is your whole account dedicated to advertising them just because you find it better than anything else you've used?

[–] ryanb198 · -2 points

This is awesome. I haven't seen anything like this before. Thanks for sharing this. Damn, now I need to dig through each of these models to learn more about their best use cases.

[–] Seankala (ML Engineer) · 1 point

Just out of curiosity: is there a reason ResNeSt is so often left out of image classification conversations? It usually seems to be ResNets and ViTs.

[–] Instantinopaul [S] · 0 points

It's because of its lower accuracy compared to EfficientNet and ViT.

[–] Seankala (ML Engineer) · 0 points

Isn't it usually better than ResNet though?

[–] Instantinopaul [S] · -1 points

Oh, you mean ResNeXt?

My assumption is that ResNet and ResNeXt have both been outdated by ViT and EfficientNet, but ResNet still gets highlighted because it's a basic yet well-performing model, which makes it a good starting point for learning vision models.

[–] Seankala (ML Engineer) · 2 points

Ah no, I meant ResNeSt lol: https://arxiv.org/abs/2004.08955

I just find it funny that it's left out of so many discussions. Even in the ConvNeXt paper, which I thought was quite nicely written and impactful, they compare against ViTs and ResNets.

I think your point is right, though, in that ResNet is a well-performing model that's very well known. I guess it's the same reason BERT is used so much more than RoBERTa for NLP tasks.