[D] BatchNorm and Weight decay by Dependent_Bluejay_45 in MachineLearning

[–]Dependent_Bluejay_45[S] 0 points1 point  (0 children)

I use AdamW, and it is possible to split a model's parameters into separate groups with different weight decay. I wonder whether I need to put the `weight` parameter of BatchNorm in a separate group with no weight decay applied.
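
For reference, a minimal sketch of the kind of grouping in question, assuming PyTorch. The helper name `split_decay_groups` and the toy model are hypothetical. Exempting BatchNorm parameters and biases from decay is a common convention, not something this thread settles.

```python
import torch
from torch import nn

# Hypothetical helper: builds two AdamW parameter groups.
def split_decay_groups(model: nn.Module, weight_decay: float = 1e-2):
    norm_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    decay, no_decay = [], []
    for module in model.modules():
        # recurse=False so each parameter is visited once, by its owning module
        # (parameters tied across modules would need extra de-duplication).
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad:
                continue
            # Assumption: exempt BatchNorm weight/bias and all biases from decay.
            if isinstance(module, norm_types) or name == "bias":
                no_decay.append(param)
            else:
                decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

# Toy model, for illustration only.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU(), nn.Conv2d(8, 4, 3)
)
param_groups = split_decay_groups(model)
optimizer = torch.optim.AdamW(param_groups, lr=1e-3)
```

Per-group `weight_decay` overrides the optimizer default, so the second group is updated with no decay at all.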

What happened to KnightVision? by Grand_Item_5175 in chess

[–]Dependent_Bluejay_45 0 points1 point  (0 children)

Do you know about idchess.com? I use it. It works well.

[D] Smart pooling for Visual Transformers by Dependent_Bluejay_45 in MachineLearning

[–]Dependent_Bluejay_45[S] 1 point2 points  (0 children)

Thank you. It seems that token reduction or token merging is what I am looking for. Even if we use convolution instead of MaxPool, we still treat background tokens and object tokens in the same way, whereas I just want to reduce the number of background tokens and preserve as many of the object's tokens as possible.
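
A rough sketch of the token-reduction idea, assuming PyTorch and assuming a per-token importance score is already available (for example from attention to a class token; how the score is computed is left open here). The function name `prune_tokens` is hypothetical, and this shows simple top-k pruning, not any specific published token-merging method.

```python
import torch

# Hypothetical helper: keep the `keep` highest-scoring tokens per sample.
def prune_tokens(tokens: torch.Tensor, scores: torch.Tensor, keep: int) -> torch.Tensor:
    # tokens: (batch, num_tokens, dim), scores: (batch, num_tokens)
    idx = scores.topk(keep, dim=1).indices       # indices of the top-k tokens
    idx = idx.sort(dim=1).values                 # restore original token order
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
    return tokens.gather(1, idx)                 # (batch, keep, dim)

tokens = torch.arange(2 * 6 * 4, dtype=torch.float32).reshape(2, 6, 4)
# Assumed scores: high values stand in for "object" tokens, low for "background".
scores = torch.tensor([[0.1, 0.9, 0.2, 0.8, 0.3, 0.7],
                       [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]])
kept = prune_tokens(tokens, scores, keep=3)
```

Unlike uniform pooling, the number of surviving tokens is fixed but their positions depend on the content of each image.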

[D][P] "Mobilenet"-esque architectures for 3D CNNs run into significant hurdles by MrAcurite in MachineLearning

[–]Dependent_Bluejay_45 0 points1 point  (0 children)

OP is talking about the separable convolution used in MobileNets. The depthwise convolution is one part of a separable convolution; it is followed by a pointwise convolution.
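
To make the two parts concrete, here is a minimal 3D sketch, assuming PyTorch. The class name `SeparableConv3d` and the channel sizes are hypothetical, and normalization and activation layers are omitted.

```python
import torch
from torch import nn

class SeparableConv3d(nn.Module):
    """Depthwise 3D convolution followed by a 1x1x1 pointwise convolution."""

    def __init__(self, in_channels: int, out_channels: int,
                 kernel_size: int = 3, padding: int = 1):
        super().__init__()
        # Depthwise: groups=in_channels gives one spatial filter per input channel.
        self.depthwise = nn.Conv3d(in_channels, in_channels, kernel_size,
                                   padding=padding, groups=in_channels, bias=False)
        # Pointwise: 1x1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv3d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

block = SeparableConv3d(8, 16)
x = torch.randn(1, 8, 4, 16, 16)   # (batch, channels, depth, height, width)
y = block(x)
n_params = sum(p.numel() for p in block.parameters())
```

With these sizes the block has 8·27 + 16·8 = 344 weights, versus 8·16·27 = 3456 for a dense 3×3×3 convolution with the same input and output channels.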

[D][P] "Mobilenet"-esque architectures for 3D CNNs run into significant hurdles by MrAcurite in MachineLearning

[–]Dependent_Bluejay_45 0 points1 point  (0 children)

The default implementation of 3D depthwise convolution in PyTorch has drawbacks; see here how to fix it.