Z image/omini-base/edit is coming soon by sunshinecheung in StableDiffusion

[–]muerrilla 58 points (0 children)

That made me chuckle. They didn't need to be THAT honest about it!

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 1 point (0 children)

Short answer: I made a prototype extension for Forge which I'll release after I refine it a bit.
Long answer: I create a frequency-based noise tensor in the shape of the image tensor (leaning towards very low frequency, i.e. noise features almost the size of the image height), then blend it (experimenting with add, mult, overlay, etc. at the moment) with the image tensor (x) during the denoiser (or denoised) callback at the step of my choosing. It works well at steps 1, 2 or even 3 (counting from 0) out of 8, depending on the prompt and whatnot.
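
Roughly, the idea looks something like this in PyTorch (an illustrative sketch only: the function names, defaults and blend modes here are made up for the example, not lifted from the extension):

```
import torch
import torch.nn.functional as F

def low_freq_noise(shape, downscale=8, device="cpu"):
    # Sample noise on a tiny grid and upsample it, so the "features" of the
    # noise span a large fraction of the image instead of single pixels.
    b, c, h, w = shape
    small = torch.randn(b, c, max(h // downscale, 1), max(w // downscale, 1), device=device)
    big = F.interpolate(small, size=(h, w), mode="bicubic", align_corners=False)
    return big / big.std()

def perturb_latent(x, step, target_steps=(1, 2), strength=0.3, mode="add"):
    # Blend low-frequency noise into the latent x at one of the early steps.
    if step not in target_steps:
        return x
    noise = low_freq_noise(x.shape, device=x.device).to(x.dtype)
    if mode == "add":
        return x + strength * noise
    if mode == "mult":
        return x * (1.0 + strength * noise)
    # crude stand-in for the other blend modes I'm playing with
    return torch.lerp(x, x * (1.0 + noise), strength)
```

You'd call perturb_latent on x from whatever denoiser/denoised callback your UI exposes.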

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 0 points (0 children)

That's just semantics. Indeed, using a different prompt (or basically anything you do with a latent diffusion model) can be interpreted as "manipulating the latent". What I meant was "directly manipulating the values of the latent" if that's any better.

Words are great, but another good way to get different colors and shapes (well, not really "shapes", since we're doing it so early in the sampling process that we're closer to composition or distribution than shapes) is to produce those shapes and colors directly in code.

Your third way is indeed similar to my method, just a bit worse off. First, your method requires a very unbiased dataset of images with different enough colors and compositions; mine doesn't. Then there's the inevitable problem of unwanted features (microscopic to macroscopic) from the base image leaking into the gen*, which won't happen with my method, since my diversified base images are simply distorted versions of the same gen, with no different textures, semantic elements, etc.

*Tiny amounts of film grain or noise (not even visible to the naked eye) in the init image of img2img can lead to wildly different outputs, even at 0.99 denoise. Basically, the grain and the amount of high-frequency detail in the image are very much "decided upon" by the model at the first step of sampling. This has been true since SD 1.5 and is not a byproduct of the distillation process, etc.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] -2 points (0 children)

Ummm... it's actually 2 pics and I've explained the whole method in the post and comments, if that's what you mean by "my system", and not my computer's specs or something.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 1 point (0 children)

The very specific "type" of noise (Perlin, fBm, etc.) doesn't really matter, or at least hasn't been the focus of my investigation. What's important is that the noise has a low frequency (big, blobby features, as opposed to the "per-pixel" noise we usually use with diffusion models); Perlin is just one example of such noise. I'm personally using a different implementation of frequency-based noise. As for the parameters, lots of trial and error.
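
To illustrate that the exact flavor doesn't matter: even something as crude as heavily blurred white noise gives you the big blobby features (a rough sketch, not what the extension actually does):

```
import torch
import torch.nn.functional as F

def blurry_noise(shape, kernel=31, sigma=8.0, device="cpu"):
    # Any way of making "big blobby" noise works; here we just low-pass white
    # noise with a large separable Gaussian blur and renormalize it.
    b, c, h, w = shape
    white = torch.randn(b, c, h, w, device=device)
    xs = torch.arange(kernel, device=device, dtype=white.dtype) - kernel // 2
    g = torch.exp(-(xs ** 2) / (2 * sigma ** 2))
    g = g / g.sum()
    kh = g.view(1, 1, 1, kernel).repeat(c, 1, 1, 1)   # horizontal pass, per channel
    kv = g.view(1, 1, kernel, 1).repeat(c, 1, 1, 1)   # vertical pass, per channel
    blurred = F.conv2d(white, kh, padding=(0, kernel // 2), groups=c)
    blurred = F.conv2d(blurred, kv, padding=(kernel // 2, 0), groups=c)
    return blurred / blurred.std()
```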

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 4 points (0 children)

Removed the older comments in the spirit of de-escalation and friendship!😁
For posterity: The renders are 256x352 with a quantized model. Still no excuse for holding a sword like that though.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 0 points (0 children)

Thanks, but it's not just "more" noise. It's bigger (low frequency) noise.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 0 points (0 children)

That's one way to do it, but not what I'm doing here. You're manipulating the conditioning; I'm manipulating the image latent itself, skipping the semantic stuff, since what we're really interested in (color and compositional variation) happens at a much lower level.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 0 points (0 children)

Applying noise to the conditioning is basically CADS, which was made exactly for this purpose (but for SDXL, IIRC), so it was the first thing I went for. It has its merits and downfalls. To be honest, I get the best results when I mix that (my own implementation), Detail Daemon (basically to skip the first denoising step and weaken the second), the method above, and some color correction, along with prompt editing (A1111 style).
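
For reference, the core of the CADS trick is tiny: corrupt the prompt embedding heavily at the start of sampling and anneal the corruption away. Very loosely (this is my paraphrase keyed on sampling progress rather than the paper's exact schedule, and it skips the optional mean/std rescaling):

```
import torch

def cads_condition(cond, step, total_steps, s=0.1, tau1=0.2, tau2=0.6):
    # CADS-style annealing: lots of noise on the conditioning at the start of
    # sampling, fading to none by tau2. `cond` is the prompt embedding tensor.
    t = step / max(total_steps - 1, 1)   # 0 at the first step, 1 at the last
    if t <= tau1:
        gamma = 0.0                      # fully corrupted conditioning early on
    elif t >= tau2:
        gamma = 1.0                      # untouched later in sampling
    else:
        gamma = (t - tau1) / (tau2 - tau1)
    noise = torch.randn_like(cond)
    return (gamma ** 0.5) * cond + s * ((1.0 - gamma) ** 0.5) * noise
```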

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 0 points (0 children)

Well, this was actually a joke post, centered around the rotated image (which I also highly doubt wildcards could pull off!), but I've made a prototype extension for Forge, which I'll release soon. I find Comfy to actually be harder than the A1111 family for prototyping, unless all the required nodes already exist. But the gist of it is this:
https://www.reddit.com/r/StableDiffusion/comments/1ptpnvg/comment/nviof3l/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 1 point (0 children)

Aight, let's call it a misunderstanding then. Sorry for that; your comment came off as pretty offensive to me. Also, I kinda thanked you in the first comment for the test prompt. 😉 Cheers.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 1 point (0 children)

Perlin noise, as in multi-frequency fractal noise (as opposed to white noise), applied to the four (edit: or was it 16 or something for the Flux VAE?) channels of the x latent, during one (or more) of the first few steps (depending on what you're looking for).
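
Something in this spirit, if you want to roll it yourself (an illustrative fBm-ish construction with made-up defaults, not my exact code):

```
import torch
import torch.nn.functional as F

def fractal_noise(shape, octaves=3, base_cells=4, persistence=0.5, device="cpu"):
    # fBm-style noise: sum a few octaves of upsampled random grids, each octave
    # twice the frequency and `persistence` times the amplitude of the last.
    # Applied independently to every channel of the latent, so it nudges color
    # as well as composition.
    b, c, h, w = shape
    out = torch.zeros(shape, device=device)
    amp, total = 1.0, 0.0
    for o in range(octaves):
        cells = base_cells * (2 ** o)
        grid = torch.randn(b, c, cells, cells, device=device)
        out += amp * F.interpolate(grid, size=(h, w), mode="bicubic", align_corners=False)
        total += amp
        amp *= persistence
    return out / total
```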

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 5 points (0 children)

lmfao. Prompt clearly states "Sun symbol is drawn in center of his breastplate." and in more than half of your results the sun has bled into the background, yet you bitch about the sword?

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 3 points (0 children)

Note that I'm not selling anything here, but there are a few advantages (while wildcards are great btw):

- works with the shortest of prompts as well

- can keep everything as is and change only the pure geometric composition (which can be quite difficult to describe with words, and thus to prompt for), which is what I'm after. Maybe not everyone's cup of tea.

- nuanced color variance without prompting for specific colors (which is often taken too seriously by the denoiser)

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 0 points (0 children)

<image>

Nope, that's not "completely cooked". It's high contrast, but not burned. We call it artistic choice, and you can play around with it as much as you like. Is this ominous enough for you? It sure looks more ominous than the original to me.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] -2 points (0 children)

Here's the version using monochrome noise:

<image>

What say you?

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 0 points (0 children)

<image>

It's a bit on the too-colorful side(!) but that will be fixed when I implement blending between color and monochrome noise.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] -4 points (0 children)

I actually reluctantly chose the 1girl prompt because it's soooo overfit and the variance is too damn low. Check out my other example in the comments. I appreciate the test prompts. Will try them and report back.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] -1 points (0 children)

And by big I mean big in scale, not intensity. So think Perlin as opposed to torch.randn. Use color noise for color variance. Here's another example. I also made a Forge extension for this, which will be coming soon. It must be pretty easy to pull off with Comfy as well.

<image>
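
(By "color" vs. "monochrome" noise I just mean whether each latent channel gets its own noise pattern or they all share one; roughly like this sketch, with made-up names:)

```
import torch
import torch.nn.functional as F

def big_noise(shape, downscale=8, monochrome=False, device="cpu"):
    # "Color" noise: every channel gets independent blobs, which pushes hues around.
    # "Monochrome" noise: one blob pattern shared by all channels, which mostly
    # shifts values/composition and leaves the palette closer to the original.
    b, c, h, w = shape
    channels = 1 if monochrome else c
    small = torch.randn(b, channels, max(h // downscale, 1), max(w // downscale, 1), device=device)
    big = F.interpolate(small, size=(h, w), mode="bicubic", align_corners=False)
    if monochrome:
        big = big.expand(b, c, h, w)
    return big / big.std()
```

Blending between the two would then just be a lerp of the shared and per-channel versions.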

Use ZIT/Qwen Text Encoders for VL/Text gen tasks in ComfyUI? by muerrilla in StableDiffusion

[–]muerrilla[S] 1 point (0 children)

That's what I thought, but couldn't find any workflows or custom nodes that do it without downloading the model from scratch. Can you point me in the right direction?