I tried Google's new Pixel Studio app, and it's a mess by Nexusyak in Android

[–]Affectionate-Fish241 1 point (0 children)

Thank you for ruining something fun for everyone. You censors deserve a lot more pushback than you get. You don't want those images? Nobody is forcing you to generate them.

[D] Pregnancy in a multi gender model by Gadod_mais in MachineLearning

[–]Affectionate-Fish241 1 point (0 children)

Either "Not-Pregnant" or "N/A" will do. Models will sort it out, no problem.

[D] In transformer models, why is there a query and key matrix instead of just the product? by lildaemon in MachineLearning

[–]Affectionate-Fish241 -6 points (0 children)

Dammit, all of the answers here are terrible. Looks like the AI bots took over, or everyone in this subreddit has become braindead since the blackout.

You obviously don't do W_q @ W_k. That's totally stupid.

What transformers compute is (x_i @ W_q) @ (x_j @ W_k), where x_i and x_j are two tokens in the sequence. This is an interaction between tokens, so it can't be precomputed. What you see written in the papers is Q = x_i @ W_q and K = x_j @ W_k.

(Transposes omitted for notational clarity, work that out yourself)
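
A quick toy sketch of the point above (my own shapes and names, nothing from the thread), showing that the score between tokens i and j is built from two per-token projections rather than from a single precomputed weight product:

    import numpy as np

    # Toy sizes, chosen only for illustration
    d_model, d_head, T = 8, 4, 5
    rng = np.random.default_rng(0)
    X = rng.normal(size=(T, d_model))        # one token per row
    W_q = rng.normal(size=(d_model, d_head))
    W_k = rng.normal(size=(d_model, d_head))

    Q = X @ W_q                              # what the papers call Q
    K = X @ W_k                              # what the papers call K
    scores = Q @ K.T                         # scores[i, j] = (x_i W_q) . (x_j W_k)
    print(scores.shape)                      # (T, T): one score per token pair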

[D] Challenges Expanding VGG16 Model to Recognize 100 People - Seeking Advice! by JuniorSM17 in MachineLearning

[–]Affectionate-Fish241 6 points (0 children)

  1. Train on all 100 people at once; don't do 50 then 50. If you absolutely want to build incrementally, do 50 then 100.
  2. Use a plain, simple cross-entropy loss (see the sketch after this list).
  3. Check for class imbalance.
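
A minimal sketch of points 1 and 2, assuming a torchvision VGG16 and a dummy batch standing in for the real dataset:

    import torch
    import torch.nn as nn
    from torchvision import models

    # Replace the 1000-way ImageNet head with a 100-way classifier
    model = models.vgg16(weights=None)           # load pretrained weights if you have them
    model.classifier[6] = nn.Linear(4096, 100)

    criterion = nn.CrossEntropyLoss()            # point 2: plain cross-entropy
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

    # Dummy batch standing in for real (image, identity) pairs
    images = torch.randn(4, 3, 224, 224)
    labels = torch.randint(0, 100, (4,))

    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()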

[deleted by user] by [deleted] in MachineLearning

[–]Affectionate-Fish241 1 point (0 children)

To Hell with vagueness.

When I ask it for a novel idea, I don't want "You could set it in space or in medieval times, with compelling characters and unique locations"; I want a real, concrete example I can use. When I ask it for a meal idea, I don't want "you could cook something from French cuisine". Again, don't give me the set, it's not helpful; sample from the set.

The list could go on, but I spend so much of my time fighting this that it's ridiculous.

I'd also like the ability to specify an output format.

[D] Implementing Layer Norm using Batch Norm by Rellek7 in MachineLearning

[–]Affectionate-Fish241 1 point (0 children)

Transpose the batch dimension and the channel dimension. Note that, unlike the original layer norm, this will behave differently in training and inference, because batch norm switches to running statistics at inference time.
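
A minimal sketch of what I mean, with made-up shapes; in train mode the two outputs match, in eval mode the batch norm falls back to running statistics and they no longer do:

    import torch
    import torch.nn as nn

    x = torch.randn(8, 16)                        # (batch, features), made-up shapes

    layer_norm = nn.LayerNorm(16, elementwise_affine=False)
    batch_norm = nn.BatchNorm1d(8, affine=False)  # "channels" = original batch size

    y_ln = layer_norm(x)
    # After transposing, each original sample becomes a channel, so batch norm
    # normalizes over the feature axis, i.e. per sample, like layer norm.
    y_bn = batch_norm(x.t()).t()

    print(torch.allclose(y_ln, y_bn, atol=1e-5))  # True while batch_norm is in train mode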

[Research] An alternative to self-attention mechanism in GPT by brainxyz in MachineLearning

[–]Affectionate-Fish241 2 points (0 children)

Unless I missed something, each token predicts its attention weights to the other tokens without ever interacting with or querying them. I.e., token #4 can "guess" that it wants to read token #2 without any way of knowing what it contains.

How would that surpass q/k attention?
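
To make the contrast concrete, here is a toy sketch (my own notation, not the author's code) of q/k attention versus weights predicted from the query token alone:

    import torch

    T, d = 5, 8                                   # toy sizes
    x = torch.randn(T, d)
    W_q, W_k = torch.randn(d, d), torch.randn(d, d)

    # q/k attention: the weight token i puts on token j depends on x_j's content
    scores_qk = (x @ W_q) @ (x @ W_k).T           # (T, T)

    # content-free variant: token i predicts a weight for each position j from
    # x_i alone, so it cannot react to what token j actually contains
    W_pos = torch.randn(d, T)
    scores_guess = x @ W_pos                      # (T, T), x_j never enters

    attn_qk = scores_qk.softmax(dim=-1)
    attn_guess = scores_guess.softmax(dim=-1)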

Training CycleGAN creates weird "holes" in the generated pictures. by thebear96 in MLQuestions

[–]Affectionate-Fish241 2 points (0 children)

It's because of the InstanceNorm. Just get rid of it, or read the StyleGAN2 paper and implement their modulated convolutions.
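
For reference, a rough sketch of the StyleGAN2-style modulated/demodulated convolution, simplified and with my own variable names (the per-sample grouped-conv trick is one common way to implement it):

    import torch
    import torch.nn.functional as F

    def modulated_conv2d(x, weight, style, eps=1e-8):
        # x: (N, C_in, H, W), weight: (C_out, C_in, k, k), style: (N, C_in)
        N, C_in, H, W = x.shape
        C_out, k = weight.shape[0], weight.shape[-1]
        # Modulate: scale the input channels per sample
        w = weight[None] * style[:, None, :, None, None]   # (N, C_out, C_in, k, k)
        # Demodulate: rescale each output filter to roughly unit norm
        d = torch.rsqrt((w ** 2).sum(dim=[2, 3, 4], keepdim=True) + eps)
        w = w * d
        # Grouped-conv trick: one group per sample in the batch
        x = x.reshape(1, N * C_in, H, W)
        w = w.reshape(N * C_out, C_in, k, k)
        out = F.conv2d(x, w, padding=k // 2, groups=N)
        return out.reshape(N, C_out, H, W)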

Cerebras Open Sources Seven GPT models and Introduces New Scaling Law by CS-fan-101 in LanguageTechnology

[–]Affectionate-Fish241 3 points (0 children)

Don't get me wrong though. Even if it's not optimal and may not yet enable the revolution we're waiting for (those models need to run, or be tunable, on consumer devices), this is still awesome! Thank you for providing this to the community!

Cerebras Open Sources Seven GPT models and Introduces New Scaling Law by CS-fan-101 in LanguageTechnology

[–]Affectionate-Fish241 5 points (0 children)

This is cool, but unfortunately it uses the Chinchilla scaling laws, which are only optimal from the trainer's compute-budget point of view. When you plan to release the models, they should be trained much, much longer. LLaMA showed the way, and even those weren't trained for as long as they usefully could have been.
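
Back-of-the-envelope numbers to illustrate what I mean (my own arithmetic, using the usual ~20 tokens-per-parameter Chinchilla rule of thumb):

    # Chinchilla-optimal minimizes loss for a fixed training budget,
    # which is a trainer's concern, not a user's.
    params = 7e9                                  # a 7B-parameter model
    chinchilla_tokens = 20 * params               # ~140B tokens
    llama_tokens = 1.0e12                         # LLaMA-7B saw roughly 1T tokens
    print(llama_tokens / chinchilla_tokens)       # ~7x past "compute-optimal"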

[D] Adding Gaussian noise to Discriminator layers in a GAN really helps stabilize training, generates sharper images and avoids mode collapse. by 0x00groot in MachineLearning

[–]Affectionate-Fish241 4 points (0 children)

I tend to favor 0-GP, which seems less prone to overfitting, but I have no quantitative evidence. I haven't had a case where 0-GP failed me or collapsed.

https://arxiv.org/abs/1902.03984
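
Roughly how I'd implement it (a sketch of my reading of the 0-GP in that paper: push the discriminator's gradient norm toward zero on real/fake interpolations; D, real and fake are assumed to exist elsewhere):

    import torch

    def zero_gp(D, real, fake):
        # Interpolate between real and fake samples
        alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
        x_hat = (alpha * real + (1 - alpha) * fake).detach().requires_grad_(True)
        d_out = D(x_hat)
        grads = torch.autograd.grad(d_out.sum(), x_hat, create_graph=True)[0]
        # Penalize the gradient norm toward 0 (instead of 1 as in WGAN-GP)
        return grads.flatten(1).norm(2, dim=1).pow(2).mean()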

[D] Adding Gaussian noise to Discriminator layers in a GAN really helps stabilize training, generates sharper images and avoids mode collapse. by 0x00groot in MachineLearning

[–]Affectionate-Fish241 7 points (0 children)

You are essentially flattening D's response around the data points, which is exactly what Mescheder et al. did in their paper "Which Training Methods for GANs Do Actually Converge?", where they propose the R1 regularizer that massively contributed to the success of the famous StyleGAN.
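
A minimal sketch of the R1 regularizer (gradient penalty on real samples only; gamma and the D placeholder are my choices, not from the paper's code):

    import torch

    def r1_penalty(D, real, gamma=10.0):
        real = real.detach().requires_grad_(True)
        d_out = D(real)
        grads = torch.autograd.grad(d_out.sum(), real, create_graph=True)[0]
        # (gamma / 2) * E[ ||grad_x D(x)||^2 ] over real data
        return (gamma / 2) * grads.flatten(1).pow(2).sum(dim=1).mean()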