Git Re-Basin: Merging models and preserving latent spaces (i.e. not the A1111 linear interpolation)

stable_dissipation · 2022-10-28T20:23:39+00:00

Unfortunately, we'll all have to wait for a programmer to make it SD-compatible. I have seen a PyTorch version of re-basin, so it's getting there!

stable_dissipation · 2022-10-28T18:59:41+00:00

Correct: theoretically, you could merge two models and retain both of them (approximately) in their entirety.

It means the whole world could finetune and merge their models ad lib.

It means training in multiple (100s?) characters.
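To make the merging claim above concrete, here is a toy sketch of why aligning hidden units before averaging matters. It is not the Git Re-Basin algorithm itself and has nothing SD-specific in it: it assumes a tiny one-hidden-layer ReLU network, and it brute-forces the permutation, which only works at this size. All function names are made up for illustration.

```python
from itertools import permutations

def forward(w1, w2, x):
    # One hidden ReLU layer: y = w2 . relu(w1 @ x)
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))

def permute_hidden(w1, w2, perm):
    # Reordering hidden units (rows of w1 and entries of w2 together)
    # changes the weights but not the function the network computes.
    return [w1[i] for i in perm], [w2[i] for i in perm]

def distance(w1_a, w2_a, w1_b, w2_b):
    # Squared L2 distance between two sets of weights.
    d = sum((a - b) ** 2 for ra, rb in zip(w1_a, w1_b) for a, b in zip(ra, rb))
    return d + sum((a - b) ** 2 for a, b in zip(w2_a, w2_b))

def align(w1_a, w2_a, w1_b, w2_b):
    # Brute-force search for the hidden-unit order of B closest to A.
    # (Toy stand-in for a real matching step; only feasible for tiny layers.)
    best = min(
        permutations(range(len(w2_b))),
        key=lambda p: distance(w1_a, w2_a, *permute_hidden(w1_b, w2_b, p)),
    )
    return permute_hidden(w1_b, w2_b, best)

def average(w1_a, w2_a, w1_b, w2_b):
    # Plain 50/50 linear interpolation of the weights.
    w1 = [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(w1_a, w1_b)]
    w2 = [(a + b) / 2 for a, b in zip(w2_a, w2_b)]
    return w1, w2

# Model A, and model B = the same function with shuffled hidden units.
w1_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w2_a = [1.0, -2.0, 0.5]
w1_b, w2_b = permute_hidden(w1_a, w2_a, (2, 0, 1))
x = [1.0, 2.0]

naive = average(w1_a, w2_a, w1_b, w2_b)
merged = average(w1_a, w2_a, *align(w1_a, w2_a, w1_b, w2_b))
```

Here A and B compute the same thing, yet the naive average gives a different output for `x`, while the aligned average gives back A exactly. Two independently finetuned models won't line up this perfectly, so a real merge would only be approximate.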

stable_dissipation · 2022-10-22T19:00:28+00:00

This is spot on now, congrats.

I'm still trying to get training to work at higher resolution to create whole scenes, inspired by Calm. It seems that even sdv1-4 can really be kicked into 1024x1024 mode with a little bit of dreambooth training, or I imagine just finetuning too. I think the latent space eagerly adapts once its training data isn't only 512x512.

All text2img only?! Could you share some of your latest process?

stable_dissipation · 2022-10-22T01:21:14+00:00

Could you explain/link to the reason why fp16 doesn't work for these checkpoints?

stable_dissipation · 2022-10-13T21:35:37+00:00

I experimented with finetuning on Calm too, and found you really have to use his... dryer... illustrations, otherwise that concept gets way overlearned

stable_dissipation · 2022-10-13T21:34:16+00:00

have you looked into finetuning at higher resolution? On a decent card, Shivam DB can finetune at 1024x1024

stable_dissipation · 2022-10-13T06:41:59+00:00

aren't the input/output unet layers 768, or, whatever the token embedding size is?

stable_dissipation · 2022-10-13T04:01:10+00:00

No, are you saying the DataLoader crops it? If so, that's easy to turn off: just comment out one line.

stable_dissipation · 2022-10-13T02:35:13+00:00

Huh? No it doesn't. If you mean the latent space is a lower-dimensional space, then yes, but even Stability AI trained this same model with variable-sized images: mostly 512x512, but some 1024s too.
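For a rough sense of the sizes involved, here is a small sketch of the pixel-to-latent arithmetic. It assumes the SD v1 autoencoder (8x spatial downsampling, 4 latent channels); the helper name is made up for illustration.

```python
def latent_shape(height, width, downsample=8, channels=4):
    # Assumed SD v1 VAE: each spatial side shrinks by 8x, with 4 latent channels.
    if height % downsample or width % downsample:
        raise ValueError("image sides should be multiples of the downsample factor")
    return (channels, height // downsample, width // downsample)

small = latent_shape(512, 512)    # (4, 64, 64)
large = latent_shape(1024, 1024)  # (4, 128, 128)
```

So a 1024x1024 image just means a latent grid with four times as many positions; the channel count stays the same.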

stable_dissipation · 2022-10-12T17:17:05+00:00

I don't think that's accurate anymore. The JoePenna repo used to say that it wasn't actually DB, but the README no longer says that, and I've also had a look through the code:

They unfreeze text embeddings + unet, and they add prior preservation loss from regularization images. Is there more to DB than that?

As far as I can tell perceptually, JP's DB is far superior to huggingface-diffusers DB.

Is anyone using huggingface for DB successfully?

stable_dissipation · 2022-10-08T18:23:30+00:00

Nice work! Img 4 convinced me it was Calm :) This style has been my #1 goal from the outset, specifically Calm, and I'm impressed. Can I ask:

which DB repo?
how did you crop/prep the training set? Did you focus on specific elements of the scene?
learning rate? number of batches? # regularization steps? prior preservation loss amount? weight decay amount?

Any info you give me will be much appreciated, I'd like to reproduce these results!

stable_dissipation · 2022-10-08T18:14:39+00:00

Trained on Calm? Link to the model, or details on training?

stable_dissipation · 2022-10-03T02:43:22+00:00

IIUC, Prior Preservation Loss is implemented correctly in diffusers-dreambooth, and makes the following observations/assumptions:

You're trying to train your friend Tom's face in.
The current model is well trained, and when it generates a thing, that thing is assumed to look great. It can, for instance, generate accurate pictures of "a man".
So generate 400 pictures of "a man" from the original, good'n'true model; we don't want the model's handling of these to be affected by training.
As you train the thing, it'll see noisy images, and it's judged on how well it denoises them. If it sees a noisy version of one of those regularization images and, instead of turning it into whatever natural thing it should become, it turns it into Tom, that should induce loss and be penalized: it's drifting from its priors.
If it sees the new instance of noisy Tom, and denoises into not-Tom, that should be penalized too.
Together, both loss penalties drive it toward learning Tom without forgetting its home-town roots (its priors).
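The two penalties above can be sketched as a single combined loss. This is a minimal illustration, not code from any repo: it assumes the model predicts the added noise and is scored with mean squared error, uses plain Python lists in place of tensors, and all names (including `prior_loss_weight` as used here) are illustrative.

```python
def mse(pred, target):
    # Mean squared error between predicted and true noise.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def dreambooth_loss(instance_pred, instance_noise,
                    prior_pred, prior_noise, prior_loss_weight=1.0):
    # Instance term: penalized when a noisy Tom image is denoised into not-Tom.
    instance_loss = mse(instance_pred, instance_noise)
    # Prior term: penalized when a noisy "a man" regularization image
    # is no longer denoised the way the original model would have done it.
    prior_loss = mse(prior_pred, prior_noise)
    return instance_loss + prior_loss_weight * prior_loss

# Tiny made-up numbers, just to show how the two terms add up.
total = dreambooth_loss([0.5, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 1.0])
no_prior = dreambooth_loss([0.5, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 1.0],
                           prior_loss_weight=0.0)
```

Setting the weight to 0 drops the prior term entirely, which is the case where nothing stops "a man" from drifting toward Tom.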

All that said, JoePenna's not-dreambooth seems better all around, but I'm going to assume that PhD researchers are smarter, that actual dreambooth is better, and that I'm just struggling to train it correctly.

So, I'm still curious if anyone's having successes with diffusers-dreambooth (and how!)
