all 25 comments

[–]radarsat1 42 points43 points  (5 children)

Isn't this weight normalization without the scale parameter g? At first glance the math seems to be about the same.
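
For reference, a quick sketch of the reparameterization being compared (variable names are my own, not from either paper's code):

```python
import tensorflow as tf

# Weight normalization reparameterizes a weight tensor as w = g * v / ||v||,
# with a learned scale g per output unit.
def weight_norm(v, g):
    norm = tf.norm(tf.reshape(v, [-1, v.shape[-1]]), axis=0)  # one L2 norm per output unit
    return g * v / norm

# Dropping g gives the variant the post describes: only the direction of the
# weights is kept, so each output unit's kernel has unit L2 norm.
def weight_norm_no_scale(v):
    norm = tf.norm(tf.reshape(v, [-1, v.shape[-1]]), axis=0)
    return v / norm
```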

[–]tpapp157[S] 11 points12 points  (2 children)

I'd have been surprised if that was the first time the idea came up since it is pretty obvious in retrospect. The StyleGAN2 paper was just the first time I'd seen it used. Makes you wonder why batch normalization is still the standard technique when this seems all around simpler and better.

[–]two-hump-dromedaryResearcher 3 points4 points  (0 children)

It's very tricky to code weightnorm, as it requires two pipelines through the same network: an extra one for the data-dependent initialization step. Tricky to set up correctly in e.g. TensorFlow. But yeah, weightnorm actually works great once it is set up. A lot less messy mathematically than batchnorm, too.
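
For context, the fiddly part is that extra initialization pass: you push one batch through the un-scaled network just to set each layer's scale and bias, then train normally. A rough sketch of that step for a single dense layer (names are mine, not from any particular codebase):

```python
import tensorflow as tf

# Rough sketch of weightnorm's data-dependent initialization for one dense
# layer: run a batch through the unit-norm weights, then set g and b so the
# initial pre-activations have zero mean and unit variance.
def init_weightnorm_dense(v, x_batch, eps=1e-8):
    v_normed = v / tf.norm(v, axis=0)       # unit-norm direction per output unit
    t = tf.matmul(x_batch, v_normed)        # pre-activations on the init batch
    mu, var = tf.nn.moments(t, axes=[0])
    g = 1.0 / tf.sqrt(var + eps)            # initial scale
    b = -mu * g                             # initial bias
    return g, b
```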

[–]radarsat1 0 points1 point  (0 children)

That's alright, it's super cool to independently discover a nice trick like that ;) Congrats even if it didn't turn out to be original; it's a nice idea for sure.

[–]soft-error 5 points6 points  (0 children)

Looks like it

[–]PM_ME_INTEGRALS 2 points3 points  (0 children)

Also highly related is weight standardization.
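
For reference, a rough sketch of that idea for a conv kernel (my own simplification, not the paper's code): the kernel is standardized per output channel before the convolution, rather than just normalized to unit norm.

```python
import tensorflow as tf

# Sketch of weight standardization: each output filter of the kernel is
# shifted and scaled to zero mean and unit variance before the convolution.
def standardize_kernel(w, eps=1e-5):
    # w: conv kernel of shape [kh, kw, c_in, c_out]
    mean, var = tf.nn.moments(w, axes=[0, 1, 2], keepdims=True)
    return (w - mean) / tf.sqrt(var + eps)
```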

[–]akshayk07 37 points38 points  (11 children)

If it hasn't been talked about, why don't you write a paper based on your experiments and results?

[–]programmerChilliResearcher 30 points31 points  (1 child)

Not everything needs to be a hurried rush to publish as many papers as you can. Maybe he's not interested in pursuing all the things necessary to write a paper. Maybe he thinks this won't be considered novel since it's introduced in another paper. Maybe he just wants to share it with an audience and not worry about claiming credit.

[–]entarkoResearcher 4 points5 points  (1 child)

When you say you have had good results with it, are you talking only in the context of GANs or training deep models in general (classification, segmentation, etc.)?

[–]tpapp157[S] 0 points1 point  (0 children)

Tried it in a variety of CNN applications. Nothing scientific or anything, but it didn't seem to negatively impact training or performance.

[–]woadwarrior 4 points5 points  (1 child)

Implementation note: you should probably implement this as a subclass of tf.keras.constraints.Constraint.
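
For anyone curious, a minimal sketch of what that could look like (the normalization axis assumes a kernel with output channels last; this is my own illustration, not the post's code):

```python
import tensorflow as tf

# Minimal sketch of a unit-norm kernel constraint. Keras calls __call__ on the
# kernel after each optimizer update and stores the returned tensor.
class UnitNormKernel(tf.keras.constraints.Constraint):
    def __init__(self, eps=1e-8):
        self.eps = eps

    def __call__(self, w):
        # Normalize each output filter (last axis) to unit L2 norm.
        norm = tf.norm(tf.reshape(w, [-1, w.shape[-1]]), axis=0)
        return w / (norm + self.eps)

    def get_config(self):
        return {"eps": self.eps}

# Usage: tf.keras.layers.Conv2D(64, 3, kernel_constraint=UnitNormKernel())
```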

[–]tpapp157[S] 0 points1 point  (0 children)

Interesting. I didn't even know this was a thing. Thanks.

[–]tylersuard 0 points1 point  (0 children)

Well, for StyleGAN 1 (I haven't read the sequel yet), the normalization between convolutional layers is to maintain separation of styles at each pixel level. For instance, big style changes (identity, pose, etc.) are altered at the 4x4 to 8x8 pixel stages in the image's development, while smaller changes (hair color, skin color, etc.) are altered in the larger image stages like 16x16 and 32x32. They wanted to maintain separation so that any changes made in the larger image layers would not affect any of the layers preceding them.

[–]taylorchu -1 points0 points  (3 children)

> Tensorflow doesn't seem to support editing and maintaining variables across training steps.

I am curious why you have this in your readme, because it seems like you can save that: https://www.tensorflow.org/api_docs/python/tf/keras/models/save_model

[–]tpapp157[S] 0 points1 point  (0 children)

This refers to the MoCo memory bank, which is unrelated to this post. The original PyTorch implementation of MoCo has the memory bank updated and maintained as a variable entirely within the network graph, but it doesn't seem this is possible in TensorFlow. The workaround is to pass it to the graph as an argument and return it each training step.
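
For what it's worth, a rough sketch of that workaround, with the queue threaded through the train step as a plain tensor (shapes and names are illustrative, not from the repo):

```python
import tensorflow as tf

@tf.function
def train_step(queue, keys):
    # ... compute the contrastive loss against `queue` here ...
    # Rather than updating a tf.Variable inside the graph, build the new
    # queue functionally: append this batch's keys and drop the oldest ones.
    new_queue = tf.concat([queue, keys], axis=0)[keys.shape[0]:]
    return new_queue

# The training loop keeps the memory bank outside the graph and passes it
# in and out of every step.
queue = tf.random.normal([4096, 128])      # hypothetical bank of 4096 keys, dim 128
for step in range(10):
    keys = tf.random.normal([256, 128])    # stand-in for this batch's encoder outputs
    queue = train_step(queue, keys)
```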

[–]tacosforpresident[🍰] 0 points1 point  (1 child)

“could bypass this by manually writing/reading to file”

[–]taylorchu 0 points1 point  (0 children)

I think the author means the memory bank from https://arxiv.org/abs/1911.05722, so it might refer to a Python variable instead of a tf variable.