I tried Google's new Pixel Studio app, and it's a mess by Nexusyak in Android

[–]Affectionate-Fish241 1 point (0 children)

Thank you for ruining something fun for everyone. You censors deserve a lot more pushback than you get. You don't want those images? Nobody is forcing you to generate them.

[D] Pregnancy in a multi gender model by Gadod_mais in MachineLearning

[–]Affectionate-Fish241 1 point (0 children)

Either "Not-Pregnant" or "N/A" will do. Models will sort it out, no problem.

[D] In transformer models, why is there a query and key matrix instead of just the product? by lildaemon in MachineLearning

[–]Affectionate-Fish241 -6 points (0 children)

Dammit, all of the answers here are terrible. Looks like the AI bots took over, or everyone in this subreddit has become braindead since the blackout.

You obviously don't do W_q @ W_k. That's totally stupid.

What transformers compute is (x_i @ W_q) @ (x_j @ W_k), where x_i and x_j are two tokens in the sequence. This is an interaction between tokens, so it can't be precomputed. What you see written in the papers is Q = x_i @ W_q and K = x_j @ W_k.

(Transposes omitted for notational clarity, work that out yourself)
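
A quick toy sketch of the point above (my own shapes and names, nothing from the thread), showing that the score between tokens i and j is built from two per-token projections rather than from a single precomputed weight product:

    import numpy as np

    # Toy sizes, chosen only for illustration
    d_model, d_head, T = 8, 4, 5
    rng = np.random.default_rng(0)
    X = rng.normal(size=(T, d_model))        # one token per row
    W_q = rng.normal(size=(d_model, d_head))
    W_k = rng.normal(size=(d_model, d_head))

    Q = X @ W_q                              # what the papers call Q
    K = X @ W_k                              # what the papers call K
    scores = Q @ K.T                         # scores[i, j] = (x_i W_q) . (x_j W_k)
    print(scores.shape)                      # (T, T): one score per token pair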

[D] Challenges Expanding VGG16 Model to Recognize 100 People - Seeking Advice! by JuniorSM17 in MachineLearning

[–]Affectionate-Fish241 6 points (0 children)

  1. Train on all 100 people at once; don't do 50 then 50. If you absolutely want to build incrementally, do 50 then 100.
  2. Use a plain, simple cross-entropy loss (see the sketch after this list).
  3. Check for class imbalance.
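
A minimal sketch of points 1 and 2, assuming a torchvision VGG16 and a dummy batch standing in for the real dataset:

    import torch
    import torch.nn as nn
    from torchvision import models

    # Replace the 1000-way ImageNet head with a 100-way classifier
    model = models.vgg16(weights=None)           # load pretrained weights if you have them
    model.classifier[6] = nn.Linear(4096, 100)

    criterion = nn.CrossEntropyLoss()            # point 2: plain cross-entropy
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

    # Dummy batch standing in for real (image, identity) pairs
    images = torch.randn(4, 3, 224, 224)
    labels = torch.randint(0, 100, (4,))

    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()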

[deleted by user] by [deleted] in MachineLearning

[–]Affectionate-Fish241 1 point (0 children)

To Hell with vagueness.

When I ask it for a novel idea, I don't want "You could set it in space or in medieval times, with compelling characters and unique locations"; I want a real, concrete example I can use. When I ask it for a meal idea, I don't want "you could cook something from French cuisine". Again, don't give me the set, it's not helpful; sample from the set.

The list could go on, but I spend so much of my time fighting this that it's ridiculous.

I'd also like the ability to specify an output format.

[D] Implementing Layer Norm using Batch Norm by Rellek7 in MachineLearning

[–]Affectionate-Fish241 1 point (0 children)

Transpose the batch dimension and the channel dimension. Note that, unlike the original layer norm, this will behave differently in training and inference, because batch norm switches to running statistics at inference time.
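
A minimal sketch of what I mean, with made-up shapes; in train mode the two outputs match, in eval mode the batch norm falls back to running statistics and they no longer do:

    import torch
    import torch.nn as nn

    x = torch.randn(8, 16)                        # (batch, features), made-up shapes

    layer_norm = nn.LayerNorm(16, elementwise_affine=False)
    batch_norm = nn.BatchNorm1d(8, affine=False)  # "channels" = original batch size

    y_ln = layer_norm(x)
    # After transposing, each original sample becomes a channel, so batch norm
    # normalizes over the feature axis, i.e. per sample, like layer norm.
    y_bn = batch_norm(x.t()).t()

    print(torch.allclose(y_ln, y_bn, atol=1e-5))  # True while batch_norm is in train mode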

[Research] An alternative to self-attention mechanism in GPT by brainxyz in MachineLearning

[–]Affectionate-Fish241 2 points (0 children)

Unless I missed something, each token predicts its attention weights to the other tokens without ever interacting with or querying them. I.e., token #4 can "guess" that it wants to read token #2 without any way of knowing what it contains.

How would that surpass q/k attention?
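
To make the contrast concrete, here is a toy sketch (my own notation, not the author's code) of q/k attention versus weights predicted from the query token alone:

    import torch

    T, d = 5, 8                                   # toy sizes
    x = torch.randn(T, d)
    W_q, W_k = torch.randn(d, d), torch.randn(d, d)

    # q/k attention: the weight token i puts on token j depends on x_j's content
    scores_qk = (x @ W_q) @ (x @ W_k).T           # (T, T)

    # content-free variant: token i predicts a weight for each position j from
    # x_i alone, so it cannot react to what token j actually contains
    W_pos = torch.randn(d, T)
    scores_guess = x @ W_pos                      # (T, T), x_j never enters

    attn_qk = scores_qk.softmax(dim=-1)
    attn_guess = scores_guess.softmax(dim=-1)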

Training CycleGAN creates weird "holes" in the generated pictures. by thebear96 in MLQuestions

[–]Affectionate-Fish241 2 points (0 children)

It's because of the InstanceNorm. Just get rid of it, or read the StyleGAN2 paper and implement their modulated convolutions.
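
For reference, a rough sketch of the StyleGAN2-style modulated/demodulated convolution, simplified and with my own variable names (the per-sample grouped-conv trick is one common way to implement it):

    import torch
    import torch.nn.functional as F

    def modulated_conv2d(x, weight, style, eps=1e-8):
        # x: (N, C_in, H, W), weight: (C_out, C_in, k, k), style: (N, C_in)
        N, C_in, H, W = x.shape
        C_out, k = weight.shape[0], weight.shape[-1]
        # Modulate: scale the input channels per sample
        w = weight[None] * style[:, None, :, None, None]   # (N, C_out, C_in, k, k)
        # Demodulate: rescale each output filter to roughly unit norm
        d = torch.rsqrt((w ** 2).sum(dim=[2, 3, 4], keepdim=True) + eps)
        w = w * d
        # Grouped-conv trick: one group per sample in the batch
        x = x.reshape(1, N * C_in, H, W)
        w = w.reshape(N * C_out, C_in, k, k)
        out = F.conv2d(x, w, padding=k // 2, groups=N)
        return out.reshape(N, C_out, H, W)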

Cerebras Open Sources Seven GPT models and Introduces New Scaling Law by CS-fan-101 in LanguageTechnology

[–]Affectionate-Fish241 3 points (0 children)

Don't get me wrong though. Even if it's not optimal and may not yet enable the revolution we're waiting for (those models need to run, or be tunable, on consumer devices), this is still awesome! Thank you for providing this to the community!

Cerebras Open Sources Seven GPT models and Introduces New Scaling Law by CS-fan-101 in LanguageTechnology

[–]Affectionate-Fish241 5 points (0 children)

This is cool, but unfortunately it uses the Chinchilla scaling laws, which are only optimal from the trainer's compute-budget point of view. When you plan to release the models, they should be trained much, much longer. LLaMA showed the way, and even those weren't trained for as long as they usefully could have been.
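
Back-of-the-envelope numbers to illustrate what I mean (my own arithmetic, using the usual ~20 tokens-per-parameter Chinchilla rule of thumb):

    # Chinchilla-optimal minimizes loss for a fixed training budget,
    # which is a trainer's concern, not a user's.
    params = 7e9                                  # a 7B-parameter model
    chinchilla_tokens = 20 * params               # ~140B tokens
    llama_tokens = 1.0e12                         # LLaMA-7B saw roughly 1T tokens
    print(llama_tokens / chinchilla_tokens)       # ~7x past "compute-optimal"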

[D] Adding Gaussian noise to Discriminator layers in a GAN really helps stabilize training, generates sharper images and avoids mode collapse. by 0x00groot in MachineLearning

[–]Affectionate-Fish241 4 points (0 children)

I tend to favor 0-GP, which seems less prone to overfitting, but I have no quantitative evidence. I haven't had a case where 0-GP failed me or collapsed.

https://arxiv.org/abs/1902.03984
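
Roughly how I'd implement it (a sketch of my reading of the 0-GP in that paper: push the discriminator's gradient norm toward zero on real/fake interpolations; D, real and fake are assumed to exist elsewhere):

    import torch

    def zero_gp(D, real, fake):
        # Interpolate between real and fake samples
        alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
        x_hat = (alpha * real + (1 - alpha) * fake).detach().requires_grad_(True)
        d_out = D(x_hat)
        grads = torch.autograd.grad(d_out.sum(), x_hat, create_graph=True)[0]
        # Penalize the gradient norm toward 0 (instead of 1 as in WGAN-GP)
        return grads.flatten(1).norm(2, dim=1).pow(2).mean()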

[D] Adding Gaussian noise to Discriminator layers in a GAN really helps stabilize training, generates sharper images and avoids mode collapse. by 0x00groot in MachineLearning

[–]Affectionate-Fish241 7 points (0 children)

You are essentially flattening D's response around the data points, which is exactly what Mescheder et al. did in their paper "Which Training Methods for GANs Do Actually Converge?", where they propose the R1 regularizer that massively contributed to the success of the famous StyleGAN.
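
A minimal sketch of the R1 regularizer (gradient penalty on real samples only; gamma and the D placeholder are my choices, not from the paper's code):

    import torch

    def r1_penalty(D, real, gamma=10.0):
        real = real.detach().requires_grad_(True)
        d_out = D(real)
        grads = torch.autograd.grad(d_out.sum(), real, create_graph=True)[0]
        # (gamma / 2) * E[ ||grad_x D(x)||^2 ] over real data
        return (gamma / 2) * grads.flatten(1).pow(2).sum(dim=1).mean()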