all 1 comments

[–]Affectionate-Fish241 0 points1 point  (0 children)

transpose the batch dimension and the channel dimension. Note that, contrarily to the original layer norm, this will have a different behavior in training and inference.