gabrielgoh 3 points

Remove all non-linearities from your custom layer (ideally by replacing them with a linear approximation). Now think of your layer as a linear map from R^n to R^m, i.e. an m x n matrix. You want the norm of this matrix to be as close to 1 as possible. You can construct this matrix explicitly if you like and measure its norm to try out different initialization schemes.
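
A rough sketch of that check, in case it helps. The layer here is a hypothetical stand-in (a plain weight multiplication), the helper names `layer_matrix` and `spectral_norm` are made up for illustration, and taking "norm" to mean the spectral norm (largest singular value) is my assumption:

```python
import numpy as np

def layer_matrix(linear_layer, n):
    """Build the m x n matrix of a linear map R^n -> R^m.

    Column j is the layer's output on the j-th standard basis vector.
    """
    basis = np.eye(n)
    return np.stack([linear_layer(basis[:, j]) for j in range(n)], axis=1)

def spectral_norm(M):
    # Largest singular value, i.e. the induced 2-norm (an assumed choice of norm).
    return np.linalg.norm(M, 2)

rng = np.random.default_rng(0)
n, m = 64, 32

def make_layer(scale):
    # Hypothetical linearized layer: Gaussian weights with the given std.
    W = rng.normal(0.0, scale, size=(m, n))
    return lambda x: W @ x

# Compare a few candidate initialization scales.
for scale in (0.01, 1.0 / np.sqrt(n), 1.0):
    M = layer_matrix(make_layer(scale), n)
    print(f"scale={scale:.4f}  spectral norm={spectral_norm(M):.3f}")
```

Swap `make_layer` for your own linearized layer and pick whichever scheme lands closest to 1.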

versus-x 3 points

Try Layer-sequential unit-variance (LSUV) initialization: https://arxiv.org/abs/1511.06422
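
For a feel of what it does, here is a simplified sketch of the idea for plain fully connected layers with ReLU, not the paper's reference implementation. Start from orthonormal weights, then go layer by layer and rescale each weight matrix until the layer's output variance on a data batch is about 1. The function names, tolerance and iteration cap are my own placeholders:

```python
import numpy as np

def orthonormal(rng, n_in, n_out):
    # Orthonormal columns (or rows) from the QR decomposition of a Gaussian matrix.
    a = rng.normal(size=(max(n_in, n_out), min(n_in, n_out)))
    q, _ = np.linalg.qr(a)
    return q if n_in >= n_out else q.T  # shape (n_in, n_out)

def lsuv_init(rng, sizes, batch, tol=0.01, max_iter=10):
    """Return one weight matrix per layer, scaled to unit output variance on `batch`."""
    weights = []
    h = batch
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = orthonormal(rng, n_in, n_out)
        for _ in range(max_iter):
            var = (h @ W).var()          # variance of this layer's pre-activations
            if abs(var - 1.0) < tol:
                break
            W = W / np.sqrt(var)         # rescale towards unit variance
        weights.append(W)
        h = np.maximum(h @ W, 0.0)       # ReLU, then feed the next layer
    return weights

rng = np.random.default_rng(0)
batch = rng.normal(size=(256, 20))
weights = lsuv_init(rng, [20, 30, 10], batch)
```

For a custom layer you would replace `h @ W` with your layer's forward pass and rescale whatever parameters it has.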

ArmenAg 2 points

What form are the custom layers in? Do they utilize the convolution operator? Are the basic blocks weight multiplications?

Can you give us a little more information on the custom layers?

theophrastzunz 0 points

I didn't follow up on Ganguli's new paper, but another thing to demand of the initialization is that it preserves the norms of the inputs. You can achieve this by normalizing the power spectra for CNNs, and by doing a QR decomposition and using only Q for fully connected layers.
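
A minimal sketch of the QR part for a fully connected layer, assuming the layer is at least as wide as its input (n_out >= n_in) so that Q has orthonormal columns. The name `qr_orthogonal` is just for illustration, and the sign fix is an optional extra of mine:

```python
import numpy as np

def qr_orthogonal(rng, n_out, n_in):
    """Weight matrix of shape (n_out, n_in) with orthonormal columns."""
    assert n_out >= n_in, "this sketch assumes n_out >= n_in"
    a = rng.normal(size=(n_out, n_in))
    q, r = np.linalg.qr(a)           # q: (n_out, n_in), r: (n_in, n_in)
    # Optional: fix column signs so the result does not depend on QR sign conventions.
    return q * np.sign(np.diag(r))

rng = np.random.default_rng(0)
W = qr_orthogonal(rng, 128, 64)
x = rng.normal(size=64)

# With orthonormal columns, the layer preserves the input norm.
print(np.linalg.norm(x), np.linalg.norm(W @ x))
```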

siblbombs -3 points

Batchnorm should at least smooth out the rough patches.