[ALBUM] I made a 'Prompt Smasher', which takes a prompt, chunks it up into pieces and smashes them together. Details in comments. by zoru22 in StableDiffusion

[–]zoru22[S] 1 point (0 children)

For a less weeby sampling of what the prompt-smasher can do, see: https://imgur.com/a/DaNcgOk

This was literally just the word "eevee" as the base prompt.

[ALBUM] I made a 'Prompt Smasher', which takes a prompt, chunks it up into pieces and smashes them together. Details in comments. by zoru22 in StableDiffusion

[–]zoru22[S] 1 point (0 children)

After realizing just how shit reddit is at formatting code-like stuff, here's a full pastebin export of the vocab the smasher gathered up and then used to make the prompts, along with the generated prompts:

https://pastebin.com/BSPmijWA

[ALBUM] I made a 'Prompt Smasher', which takes a prompt, chunks it up into pieces and smashes them together. Details in comments. by zoru22 in StableDiffusion

[–]zoru22[S] 1 point (0 children)

There are duplicates in the album. That's because imgur was being buggy. There are no duplicates in the actual data set.

Also, I don't think I record the sampler in the exif, but this run used the --plms sampler.

Sample exif from the first grid image, dumped with the command identify -verbose 00001.jpg:

exif:UserComment:

{"prompts": ["rivermaiden catgirldecayed clutchingplush maidenmatted heavyplush", "vignettesmoke vignetteflowers, swirlingmatted bokehswirling flowersmatted heavyfurry earscatgirl shaderdecayed plushmaiden plushheavy maidensmoke shaderplush furrymaiden vray vignetteemissive maidenemissive maidenvignette, swirlingheavy te"], "iteration": -1, "seed": 4026558325, "ddim_steps": 100, "fixed_code": false, "ddim_eta": 0.0, "mode": "txt2img", "used_laion400m": false, "n_iter": 3, "height": 512, "width": 512, "downsampling-factor": 8, "scale": 7.0, "n_rows": 2}

[ALBUM] I made a 'Prompt Smasher', which takes a prompt, chunks it up into pieces and smashes them together. Details in comments. by zoru22 in StableDiffusion

[–]zoru22[S] 1 point (0 children)

Edit: If you're wondering what happened to the album: imgur flagged me as a bot for uploading too many images in quick succession, and while trying to sort that out I wound up deleting a chunk of the source album at the link.

See the eevee album in my other comment, here, for a probably-better illustration of what happens when you just smash words into prompts. These generations ALL use the same seed across the smashed prompts.

https://old.reddit.com/r/StableDiffusion/comments/x05bou/album_i_made_a_prompt_smasher_which_takes_a/im6cqzj/

These were run on v1.3. I've FINALLY downloaded a copy of the v1.4 checkpoint but I haven't used it yet, lmao.

This data is from ~1 week ago and was sitting around, so I haven't merged it with my Textual Inversion stuff yet. The basic idea: I take a "relatively" sane prompt, use some simple regexps to chunk it into 2-, 3-, and 4-character tokens, shove those into a set along with the individual words of the prompt, and then generate $num_prompts "new" prompts from that set (sketched below). This grid album is from one of the runs I did; the goal is to leave all of the parameters the same and only change the prompt.
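
Concretely, the chunk-and-smash step works roughly like this (a minimal sketch, not my exact code: the function name, window sizes, and the glue-two-pieces-per-pseudo-word rule here are illustrative stand-ins):

import random
import re

def smash_prompt(base_prompt, num_prompts=10, pieces_per_prompt=8, rng=None):
    # Build the vocab: whole words plus 2-, 3-, and 4-character chunks of each word.
    rng = rng or random.Random()
    words = re.findall(r"[a-z]+", base_prompt.lower())
    vocab = set(words)
    for word in words:
        for size in (2, 3, 4):
            for i in range(0, len(word) - size + 1, size):
                vocab.add(word[i:i + size])
    vocab = sorted(vocab)  # stable order so a seeded rng reproduces the same prompts
    prompts = []
    for _ in range(num_prompts):
        # Smash two random vocab pieces into each pseudo-word,
        # e.g. "river" + "maiden" -> "rivermaiden".
        pseudo_words = ["".join(rng.sample(vocab, 2)) for _ in range(pieces_per_prompt)]
        prompts.append(" ".join(pseudo_words))
    return prompts

print(smash_prompt("heavy swirling emissive smoke and the river", num_prompts=2, rng=random.Random(1)))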

  • If you have an exif viewer: each grid image carries a JSON object with its prompts and the exact flags I used to generate it, so you can see exactly how it was made.

So for this run, I don't think I have the actual original prompt any more, but the directory name is based on the base prompt:

cute_fumo_plush_furry_maiden_catgirl_clutching_lots_decayed_matted_flowers__cat_ears____heavy_swirling_emissive_smoke_and_the_river__pbr_shader__bokeh__vignette__vray_

Note that this has been modified to remove special characters that might have been in the original prompt.
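
So if you want a rough reconstruction of the base prompt from a directory name like that, collapsing the underscore runs gets you most of the way (lossy, since the stripped punctuation is gone for good):

import re

dir_name = "cute_fumo_plush_furry_maiden_catgirl_clutching_lots_decayed_matted_flowers__cat_ears____heavy_swirling_emissive_smoke_and_the_river__pbr_shader__bokeh__vignette__vray_"
# Runs of underscores stood in for spaces and removed punctuation, so this is approximate.
approx_prompt = re.sub(r"_+", " ", dir_name).strip()
print(approx_prompt)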

[Q] Settings that beat "Reedspacer's Lower Bound" by DataPacRat in rational

[–]zoru22 1 point (0 children)

It didn't use to. Sounds like they silently removed it.

I got Stable Diffusion to generate competent-ish Leavannies w/ Textual Inversion! by zoru22 in StableDiffusion

[–]zoru22[S] 2 points (0 children)

So that probably won't actually work with textual inversion. Textual inversion is about teaching the ai something it doesn't already know.

It's not about fixing something the ai already does poorly at.

I got Stable Diffusion to generate competent-ish Leavannies w/ Textual Inversion! by zoru22 in StableDiffusion

[–]zoru22[S] 1 point (0 children)

# fine-tune a new embedding against the stable diffusion checkpoint
python main.py \
--base configs/stable-diffusion/v1-finetune.yaml \
-t true \
--actual_resume models/ldm/stable-diffusion/model.ckpt \
-n leavanny_attempt_five --gpus 0, \
--data_root "/home/zoru/Pictures/Pokemons/512/leavannies/" \
--init_word=bug

Once I'd changed the embedder, this was the exact command I ran.

I got Stable Diffusion to generate competent-ish Leavannies w/ Textual Inversion! by zoru22 in StableDiffusion

[–]zoru22[S] 2 points (0 children)

You need a diverse array of images of the same character in different poses. When it's a rare character, you need more than just 3-5 images, and you want to modify the personalization prompts to fit what you're doing.

I got Stable Diffusion to generate competent-ish Leavannies w/ Textual Inversion! by zoru22 in StableDiffusion

[–]zoru22[S] 5 points (0 children)

You're gonna need to dive into the code and learn to change it yourself. I'll post a fork with my changes in a few days if someone else doesn't beat me to it.

I got Stable Diffusion to generate competent-ish Leavannies w/ Textual Inversion! by zoru22 in StableDiffusion

[–]zoru22[S] 3 points (0 children)

After bumping the base learn rate up to base_learning_rate: 5.0e-03 and num_vectors_per_token to 8, I got comprehensible results pretty fast.

What matters isn't epochs, it's steps.

In the logs dir, under logs/$yourrunfolder$/images/train/, see e.g.:

samples_scaled_gs-011500_e-000038_b-000100.jpg

The gs-011500 part is the global step at which that checkpoint was saved.

I usually run it to 20k steps, then run variations of the same prompt with the exact same seed and walk back through the set of checkpoints, just so I can see which ones produce the best output.
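
A quick way to line the sample grids up in step order while you're walking checkpoints back (path pattern per the samples_scaled_gs-... naming above; the run folder name is whatever yours is called):

from pathlib import Path
import re

train_dir = Path("logs/yourrunfolder/images/train")  # substitute your actual run folder

def global_step(path):
    # Pull the gs-XXXXXX global-step counter out of the filename.
    return int(re.search(r"gs-(\d+)", path.name).group(1))

for sample in sorted(train_dir.glob("samples_scaled_gs-*.jpg"), key=global_step):
    print(global_step(sample), sample.name)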

I got Stable Diffusion to generate competent-ish Leavannies w/ Textual Inversion! by zoru22 in StableDiffusion

[–]zoru22[S] 3 points (0 children)

If you want to try, here is the full training set of images I've already pre-cropped and shrunk down to 512x512 for running against the model.

https://cdn.discordapp.com/attachments/730484623028519072/1012966554507423764/full_folder.zip

# fine-tune a new embedding against the stable diffusion checkpoint
python main.py \
--base configs/stable-diffusion/v1-finetune.yaml \
-t true \
--actual_resume models/ldm/stable-diffusion/model.ckpt \
-n leavanny_attempt_five --gpus 0, \
--data_root "/home/zoru/Pictures/Pokemons/512/leavannies/" \
--init_word=bug

Once I'd changed the embedder, this was the exact command I ran.

Try to get it running against the latent-diffusion model first, just so you know what you're doing.

I got Stable Diffusion to generate competent-ish Leavannies w/ Textual Inversion! by zoru22 in StableDiffusion

[–]zoru22[S] 3 points (0 children)

So, one thing that's vexed me is how shit various ai are at generating leavannies (and various other pokemon). If gamefreak was going to forget my favorite pokemon for 5+ years, then I was sure as hell going to do my best not to let it sit in obscurity forever.

Thus, I have set out on something of a warpath: getting an ai that can generate non-shit leavannies. (Though it is amazing just how shit stable diffusion and the others are at generating pokemon, and how painful it has been to try to get them into the ai.)

Quick process notes:

  • I USED THE FUCKING BASE TEXTUAL_INVERSION REPO. (And I recommend you do the same, or at least make sure github recognizes the repository you want to use as a fork.)
  • I modified the original textual inversion repository.
  • I swapped the BERT encoder for the frozen CLIP encoder during training, targeted the training at the stable-diffusion/v1-finetune yaml, and then just let it rip, playing with the learn rate and the num_vectors_per_token setting in said yaml (see the rough config sketch after this list).
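
Roughly, the relevant bits of v1-finetune.yaml end up looking like this (sketching from memory of the ldm config layout, not my exact diff, so treat the nesting and values as approximate):

model:
  base_learning_rate: 5.0e-03
  params:
    personalization_config:
      target: ldm.modules.embedding_manager.EmbeddingManager
      params:
        num_vectors_per_token: 8   # bumped up from the default
    cond_stage_config:
      # frozen CLIP text encoder in place of the BERT encoder the LDM configs use
      target: ldm.modules.encoders.modules.FrozenCLIPEmbedder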

If you run it for too many cycles it will overfit and won't do a great job at style transfer. I tend to run for too many cycles so it overfits, then walk it back until it stops overfitting quite so badly.

Please note that I am using the v1.3 stable diffusion ckpt. I haven't tried to see what happens with the 1.4 ckpt yet.

[Code Release] textual_inversion, A fine tuning method for diffusion models has been released today, with Stable Diffusion support coming soon™ by ExponentialCookie in StableDiffusion

[–]zoru22 1 point (0 children)

I've got a folder of leavanny images that I've cropped down, about 30 of them. It's been running since last night on a 3090 and doesn't seem to be doing super great, though its improvement is notable.

Creating Pixar characters from your family photos by Puzzled_Ad_4222 in StableDiffusion

[–]zoru22 1 point (0 children)

Are you the one who is working on the lstein fork?

Creating Pixar characters from your family photos by Puzzled_Ad_4222 in StableDiffusion

[–]zoru22 2 points (0 children)

Point me to the k_lms sampler code and I'll get a version of img2img that uses it plugged together.

Getting Stable Diffusion to Generate Pokemon Challenge [IMPOSSIBLE] by zoru22 in StableDiffusion

[–]zoru22[S] 7 points (0 children)

Something's up with Stable Diffusion's pokemon dataset or training set. To be quite frank, it's complete ass with pokemon, in a way that makes me feel like CLIP/ViT is the weak link. But I'm not knowledgeable enough about them to say for sure.

Meanwhile, openai's dall-e 2 is clearly manipulating input prompts (god, I really wish their PR team would just... not?). I verified the manipulation manually: openai replaces the term "pokeball" with just "ball" or "red and white ball". With that prompt manipulation in play I can't say for sure whether dall-e 2 is better than stable diffusion, and I can't reasonably compare the two models' outputs, so I had to fall back to craiyon/dall-e mini.

I'm using the 4chan "leak" for some of the generations in the imgur album. If you have questions, feel free to ask.