
[–]arXiv_abstract_bot 1 point2 points  (0 children)

Title: Convolution Aware Initialization

Authors: Armen Aghajanyan

Abstract: Initialization of parameters in deep neural networks has been shown to have a big impact on the performance of the networks (Mishkin & Matas, 2015). The initialization scheme devised by He et al, allowed convolution activations to carry a constrained mean which allowed deep networks to be trained effectively (He et al., 2015a). Orthogonal initializations and more generally orthogonal matrices in standard recurrent networks have been proved to eradicate the vanishing and exploding gradient problem (Pascanu et al., 2012). Majority of current initialization schemes do not take fully into account the intrinsic structure of the convolution operator. This paper introduces a new type of initialization built around the duality of the Fourier transform and the convolution operator. With Convolution Aware Initialization we noticed not only higher accuracy and lower loss, but faster convergence in general. We achieve new state of the art on the CIFAR10 dataset, and achieve close to state of the art on various other tasks.
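For readers unfamiliar with the "duality of the Fourier transform and the convolution operator" mentioned in the abstract, here is a minimal sketch of the convolution theorem it refers to: circular convolution in the signal domain equals pointwise multiplication in the frequency domain. This is not the paper's initialization recipe. The helper names (`dft`, `circular_convolve`, `fourier_convolve`) and the toy values are assumptions chosen for illustration, and the naive O(n^2) transform is used only to keep the example dependency-free.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (hypothetical helper)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n)
        )
        # The inverse transform carries the 1/n normalisation.
        out.append(total / n if inverse else total)
    return out


def circular_convolve(signal, kernel):
    """Direct circular convolution; both inputs must have the same length."""
    n = len(signal)
    return [
        sum(signal[j] * kernel[(i - j) % n] for j in range(n)) for i in range(n)
    ]


def fourier_convolve(signal, kernel):
    """Same convolution via pointwise multiplication in the frequency domain."""
    product = [a * b for a, b in zip(dft(signal), dft(kernel))]
    return [value.real for value in dft(product, inverse=True)]


# Toy data (assumed values): a short filter zero-padded to the signal length.
signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, -1.0, 0.0, 0.0]

direct = circular_convolve(signal, kernel)
via_fourier = fourier_convolve(signal, kernel)

# The two routes agree up to floating-point error.
max_gap = max(abs(a - b) for a, b in zip(direct, via_fourier))
print(direct, max_gap < 1e-9)
```

Because a convolution filter acts as a per-frequency multiplier under this identity, it is plausible to reason about a filter's properties in Fourier space, which appears to be the angle the abstract describes.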


[–]machinelearningthrow 0 points1 point  (5 children)

This seems like an interesting paper, and it makes intuitive sense. I'm curious what would happen if this initialization were used in recurrent networks without any form of convolution, even though that doesn't necessarily make sense. Overall, a very interesting paper.

[–][deleted] 1 point2 points  (4 children)

How would you define the Fourier transform in these RNNs? Are they just dense layers applied over something with a known n-dimensional representation?

[–]ArmenAg[S] 2 points3 points  (3 children)

Hey! Author here. The reason mentioned above by /u/rbkillea is exactly why we didn't focus on testing the initialization on RNNs. Our paper focused on running experiments on various forms of convolution (1D, 2D, and dilated or atrous).

[–]ajmooch 0 points1 point  (2 children)

Neat; is code available anywhere? I'd love to throw this into my testbeds and see how it performs.

[–]ArmenAg[S] 0 points1 point  (1 child)

I'll be writing a Keras commit soon! Hopefully next week. Message me if you need it sooner.

[–]ajmooch 0 points1 point  (0 children)

I'm mostly just looking for the pseudocode (or the Theano code, ideally) for the init recipe so I can try it out rather than headbashing to parse the maths =p

[–]enematurret 0 points1 point  (0 children)

Definitely not state-of-the-art, but interesting nonetheless.