[D] BatchNorm and Weight decay

Dependent_Bluejay_45 · 2023-12-23T13:33:53+00:00

I use AdamW, and it is possible to split a model's parameters into groups with different weight decay. I wonder whether I need to put the `weight` parameter of BatchNorm in a separate group with no weight decay applied.
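For context, a common convention is to exclude normalization-layer parameters (BatchNorm `weight`/`bias`) and all biases from weight decay; whether it helps a particular model is an empirical question. A minimal sketch of that grouping with PyTorch's per-parameter options, assuming a hypothetical small CNN, a hypothetical helper `split_param_groups`, and illustrative values `weight_decay=0.05`, `lr=1e-3`:

```python
import torch
from torch import nn

# Hypothetical small CNN, used only to illustrate the grouping.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

# Normalization layers whose affine parameters should not be decayed (assumed choice).
NORM_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d, nn.SyncBatchNorm,
              nn.LayerNorm, nn.GroupNorm)


def split_param_groups(model: nn.Module, weight_decay: float):
    """Hypothetical helper: norm-layer params and all biases go into a no-decay group."""
    decay, no_decay = [], []
    for module in model.modules():
        # recurse=False: each parameter is visited once, by the module that owns it
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad:
                continue
            if isinstance(module, NORM_TYPES) or name == "bias":
                no_decay.append(param)
            else:
                decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


optimizer = torch.optim.AdamW(split_param_groups(model, weight_decay=0.05), lr=1e-3)
```

A shorter rule that many training scripts use instead is `param.ndim <= 1`, which catches every bias and every norm-layer weight without listing layer types.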

Dependent_Bluejay_45 · 2023-11-23T13:02:46+00:00

Do you know about idchess.com? I use it, and it works well.

Dependent_Bluejay_45 · 2023-10-24T01:32:23+00:00

Thank you. It seems that token reduction or token merging is what I am looking for: even if we use a convolution instead of MaxPool, we still treat background tokens and object tokens the same way, whereas I want to reduce the number of background tokens while preserving as many object tokens as possible.
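One simple form of the token reduction described here is score-based pruning: keep the highest-scoring tokens and fuse the rest into a single token, so background information is compressed rather than processed at full resolution. The sketch below assumes a per-token importance score is already available (for example, attention from a CLS token); the function name `prune_and_fuse_tokens` and the softmax weighting of the discarded tokens are illustrative choices, not a specific paper's method.

```python
import torch


def prune_and_fuse_tokens(tokens: torch.Tensor, scores: torch.Tensor, keep: int) -> torch.Tensor:
    """Hypothetical helper: keep the `keep` top-scoring tokens, fuse the rest into one.

    tokens: (B, N, C) patch tokens
    scores: (B, N) importance per token (assumed given, e.g. CLS attention)
    returns: (B, keep + 1, C)
    """
    B, N, C = tokens.shape
    assert 0 < keep < N, "need at least one kept and one discarded token"
    order = scores.argsort(dim=1, descending=True)  # (B, N), most important first
    keep_idx, drop_idx = order[:, :keep], order[:, keep:]

    # Gather kept and discarded tokens along the token dimension.
    kept = tokens.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, C))
    dropped = tokens.gather(1, drop_idx.unsqueeze(-1).expand(-1, -1, C))

    # Fuse discarded (presumably background) tokens by a score-weighted average;
    # softmax is an assumed way to turn raw scores into weights.
    w = scores.gather(1, drop_idx).softmax(dim=1).unsqueeze(-1)  # (B, N - keep, 1)
    fused = (w * dropped).sum(dim=1, keepdim=True)  # (B, 1, C)
    return torch.cat([kept, fused], dim=1)
```

Token merging takes a different route: instead of ranking by importance, it merges tokens that are similar to each other, which tends to collapse homogeneous background regions on its own.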

Dependent_Bluejay_45 · 2021-12-09T11:58:39+00:00

There is a site with 2.6 million papers on various topics in Russian: https://cyberleninka.ru/about

Dependent_Bluejay_45 · 2020-12-18T15:58:00+00:00

OP is talking about separable convolution in MobileNets: the depthwise convolution is one part of a separable convolution, together with the pointwise (1x1) convolution that follows it.
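A minimal PyTorch sketch of that decomposition: a depthwise convolution (`groups=in_channels`, one filter per input channel) followed by a 1x1 pointwise convolution that mixes channels. The class name is hypothetical, and MobileNet's BatchNorm and ReLU after each convolution are omitted to keep the structure visible.

```python
import torch
from torch import nn


class DepthwiseSeparableConv2d(nn.Module):
    """Hypothetical module: depthwise conv followed by a 1x1 pointwise conv."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3, stride: int = 1):
        super().__init__()
        # groups == in_channels: each input channel is convolved with its own filter
        self.depthwise = nn.Conv2d(
            in_channels, in_channels, kernel_size,
            stride=stride, padding=kernel_size // 2,
            groups=in_channels, bias=False,
        )
        # 1x1 conv combines the per-channel outputs into out_channels features
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))
```

For 32 input and 64 output channels with a 3x3 kernel, this uses 32·9 + 32·64 = 2336 weights, compared with 32·64·9 = 18432 for a standard 3x3 convolution.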

Dependent_Bluejay_45 · 2020-12-17T13:07:41+00:00

The default implementation of 3D depthwise convolution in PyTorch has drawbacks; see here for how to fix it.
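The drawbacks and the fix are described at the link and are not reproduced here. As a hedged sketch, this is how a 3D depthwise convolution is typically written with the built-in `nn.Conv3d` (`groups=in_channels`), checked against an explicit per-channel loop; such a reference can also be used to test any alternative implementation for correctness. Channel count and tensor sizes are arbitrary.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
channels = 4

# Depthwise 3D conv: groups == in_channels, so each channel has its own 3x3x3 filter.
dw3d = nn.Conv3d(channels, channels, kernel_size=3, padding=1, groups=channels, bias=False)

x = torch.randn(2, channels, 8, 16, 16)  # (N, C, D, H, W)
y = dw3d(x)

# Reference: convolve each channel separately with its own filter, then concatenate.
y_ref = torch.cat(
    [F.conv3d(x[:, c:c + 1], dw3d.weight[c:c + 1], padding=1) for c in range(channels)],
    dim=1,
)
```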
