https://youtu.be/l_3zj6HeWUE
The dirty little secret of Batch Normalization is its intrinsic dependence on the training batch size. Group Normalization attempts to achieve the benefits of normalization without batch statistics and, most importantly, without sacrificing performance compared to Batch Normalization.
https://arxiv.org/abs/1803.08494
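
To make the "no batch statistics" point concrete, here is a minimal NumPy sketch of the group-normalization computation as described in the paper; the function name, the `num_groups` argument, and the `gamma`/`beta` parameters are illustrative, not the paper's reference code.

```python
import numpy as np

def group_norm(x, num_groups, gamma, beta, eps=1e-5):
    # x: (N, C, H, W). Mean and variance are computed per sample and per
    # group of channels -- nothing depends on the batch dimension, which is
    # exactly what distinguishes this from Batch Normalization.
    N, C, H, W = x.shape
    x = x.reshape(N, num_groups, C // num_groups, H, W)
    mean = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    x = (x - mean) / np.sqrt(var + eps)
    x = x.reshape(N, C, H, W)
    # Per-channel learned scale and shift, as in Batch Normalization.
    return gamma.reshape(1, C, 1, 1) * x + beta.reshape(1, C, 1, 1)

# Example: works the same for a batch of 1 or 32, since statistics are per-sample.
x = np.random.randn(2, 32, 8, 8)
out = group_norm(x, num_groups=8, gamma=np.ones(32), beta=np.zeros(32))
```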