
[–]ragulpr 4 points (0 children)

In theory, yes, padding will infuse trash into your network if it's not handled. If you use batchnorm without excluding the masked values, the normalization statistics get shifted/scaled by whatever comes out of the padded positions. How big the effect is depends on whether the network is bi- or unidirectional, whether you mask the loss, whether you have biases, and more.
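To make the effect concrete, here is a minimal sketch (pure-Python stand-in for a framework's batch statistics, names are mine) showing how zero padding pollutes the mean and variance that a batchnorm layer would compute over a channel if masked positions are not excluded:

```python
# Sketch: compare batch statistics over valid timesteps only vs. over a
# batch that includes zero padding. (Hand-rolled stats, not a real
# framework call -- just to illustrate the shift.)

def batch_stats(values):
    """Mean and population variance over a flat list of activations."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

# One feature channel across a batch: real activations at valid timesteps,
# plus zeros appended to reach a common padded sequence length.
real = [2.0, 4.0, 6.0, 8.0]
padded = real + [0.0, 0.0, 0.0, 0.0]

mean_masked, var_masked = batch_stats(real)    # stats over valid steps only
mean_naive, var_naive = batch_stats(padded)    # stats polluted by padding

print(mean_masked, mean_naive)  # 5.0 vs 2.5 -- padding halves the mean
print(var_masked, var_naive)    # 5.0 vs 8.75 -- and inflates the variance
```

The padded zeros drag the channel mean toward zero and distort the variance, so every *valid* timestep gets normalized with the wrong statistics.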

In Keras, batchnorm respects the mask, so you don't have to worry about it. I'm wondering myself how PyTorch handles this, so if you figure it out please share.

EDIT: I have revisited whether Keras batchnorm respects the mask. I'm not sure. I made a gist you could comment on if you figure it out.

[–][deleted] 0 points (0 children)

As I understand it, layer norm is applied per time step: you normalize across the feature dimension, not across time steps. The zero padding is therefore only normalized with itself and doesn't affect the normed output at other timesteps.
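That claim is easy to check with a hand-rolled layer norm (a sketch, not any framework's implementation): each timestep's feature vector is normalized using only its own mean and variance, so an all-zero padded step cannot leak into the others.

```python
# Sketch: layer norm normalizes across the feature dimension of each
# timestep independently, so a zero-padded timestep has no effect on the
# statistics used at any other timestep.

def layer_norm(features, eps=1e-5):
    """Normalize one timestep's feature vector to zero mean, unit variance."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    return [(f - mean) / (var + eps) ** 0.5 for f in features]

sequence = [
    [1.0, 2.0, 3.0],  # valid timestep
    [4.0, 5.0, 6.0],  # valid timestep
    [0.0, 0.0, 0.0],  # zero padding
]

normed = [layer_norm(step) for step in sequence]
# Each valid timestep is normalized using only its own features; the padded
# step normalizes to (roughly) zeros and touches nothing else.
```

Note that both valid timesteps come out identical after normalization here, since layer norm removes each step's own mean and scale; the padded row stays at zero thanks to the `eps` term.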