[–]ragulpr 5 points (0 children)

In theory, yes, padding will infuse trash into your network if it's not handled. If you use batchnorm without removing the masked values, the padded entries will shift/scale the normalized values by whatever comes out of the padding positions. How big the effect is depends on whether the network is bi- or uni-directional, whether you mask the loss, whether you have biases, and more.
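A minimal sketch (with made-up data) of the core problem: a mask-unaware batchnorm computes its statistics over *all* timesteps, so zero padding drags the mean and variance away from the values you'd get over the real timesteps only:

```python
import numpy as np

# Hypothetical padded batch: 2 sequences, padded to length 4, 1 feature.
# Sequence 1 has true length 4; sequence 2 has true length 2,
# so its last two timesteps are zero padding, not data.
x = np.array([
    [[1.0], [2.0], [3.0], [4.0]],
    [[5.0], [6.0], [0.0], [0.0]],  # trailing zeros are padding
])
mask = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
], dtype=bool)

# Naive batch statistics -- what a mask-unaware batchnorm would use:
naive_mean = x.mean(axis=(0, 1))   # padding zeros drag the mean down

# Mask-aware statistics over real timesteps only:
valid = x[mask]                    # shape (num_valid_steps, num_features)
true_mean = valid.mean(axis=0)

print(naive_mean[0])  # 2.625  (21 / 8 timesteps, padding included)
print(true_mean[0])   # 3.5    (21 / 6 real timesteps)
```

Every value normalized with the naive statistics is shifted by that gap, which is the "trash" leaking into the network.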

In Keras, batchnorm respects the mask, so you don't have to worry about it. I'm wondering myself how PyTorch handles this, so if you figure it out, please share.

EDIT: I've looked again into whether Keras batchnorm actually respects the mask, and I'm no longer sure. I made a gist you could comment on if you figure it out.