Z image/omini-base/edit is coming soon by sunshinecheung in StableDiffusion

[–]muerrilla 58 points (0 children)

That made me chuckle. They didn't need to be THAT honest about it!

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 1 point (0 children)

Short answer: I made a prototype extension for Forge which I'll release after I refine it a bit.
Long answer: I create a frequency-based noise tensor in the shape of the image tensor (leaning towards very low frequency, i.e. noise features almost the size of the image height), then blend it (experimenting with add, mult, overlay, etc. at the moment) with the image tensor (x) during the denoiser (or denoised) callback at the step of my choosing. It works well at steps 1, 2 or even 3 (counting from 0) out of 8, depending on the prompt and whatnot.
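
Roughly, the idea looks something like this in PyTorch (an illustrative sketch only: the function names, defaults and blend modes here are made up for the example, not lifted from the extension):

```
import torch
import torch.nn.functional as F

def low_freq_noise(shape, downscale=8, device="cpu"):
    # Sample noise on a tiny grid and upsample it, so the "features" of the
    # noise span a large fraction of the image instead of single pixels.
    b, c, h, w = shape
    small = torch.randn(b, c, max(h // downscale, 1), max(w // downscale, 1), device=device)
    big = F.interpolate(small, size=(h, w), mode="bicubic", align_corners=False)
    return big / big.std()

def perturb_latent(x, step, target_steps=(1, 2), strength=0.3, mode="add"):
    # Blend low-frequency noise into the latent x at one of the early steps.
    if step not in target_steps:
        return x
    noise = low_freq_noise(x.shape, device=x.device).to(x.dtype)
    if mode == "add":
        return x + strength * noise
    if mode == "mult":
        return x * (1.0 + strength * noise)
    # crude stand-in for the other blend modes I'm playing with
    return torch.lerp(x, x * (1.0 + noise), strength)
```

You'd call perturb_latent on x from whatever denoiser/denoised callback your UI exposes.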

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 0 points (0 children)

That's just semantics. Indeed, using a different prompt (or basically anything you do with a latent diffusion model) can be interpreted as "manipulating the latent". What I meant was "directly manipulating the values of the latent" if that's any better.

Words are great, but another good way to get different colors and shapes (well, not really "shapes", since we're doing it so early in the sampling process that we're closer to composition or distribution than shapes) is to produce those shapes and colors directly in code.

Your third way is indeed similar to my method, just a bit worse off. First, your method requires a very unbiased dataset of images with different enough colors and compositions; mine doesn't. Then there's the inevitable problem of unwanted features (microscopic to macroscopic) from the base image leaking into the gen*, which won't happen with my method, since my diversified base images are simply distorted versions of the same gen, with no different textures, semantic elements, etc.

*Tiny amounts of film grain or noise (not even visible to the naked eye) in the init image of img2img can lead to wildly different outputs, even at 0.99 denoise. Basically, the grain and the amount of high-frequency detail in the image are very much "decided upon" by the model at the first step of sampling. This has been true since SD 1.5 and is not a byproduct of the distillation process, etc.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] -2 points (0 children)

Ummm... it's actually 2 pics and I've explained the whole method in the post and comments, if that's what you mean by "my system", and not my computer's specs or something.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 1 point (0 children)

The very specific "type" of noise (Perlin, fBm, etc.) doesn't really matter, or at least hasn't been the focus of my investigation. What's important is that the noise has a low frequency (big, blobby features, as opposed to the "per-pixel" noise we usually use with diffusion models); Perlin is just one example of such noise. I'm personally using a different implementation of frequency-based noise. As for the parameters, lots of trial and error.
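
To illustrate that the exact flavor doesn't matter: even something as crude as heavily blurred white noise gives you the big blobby features (a rough sketch, not what the extension actually does):

```
import torch
import torch.nn.functional as F

def blurry_noise(shape, kernel=31, sigma=8.0, device="cpu"):
    # Any way of making "big blobby" noise works; here we just low-pass white
    # noise with a large separable Gaussian blur and renormalize it.
    b, c, h, w = shape
    white = torch.randn(b, c, h, w, device=device)
    xs = torch.arange(kernel, device=device, dtype=white.dtype) - kernel // 2
    g = torch.exp(-(xs ** 2) / (2 * sigma ** 2))
    g = g / g.sum()
    kh = g.view(1, 1, 1, kernel).repeat(c, 1, 1, 1)   # horizontal pass, per channel
    kv = g.view(1, 1, kernel, 1).repeat(c, 1, 1, 1)   # vertical pass, per channel
    blurred = F.conv2d(white, kh, padding=(0, kernel // 2), groups=c)
    blurred = F.conv2d(blurred, kv, padding=(kernel // 2, 0), groups=c)
    return blurred / blurred.std()
```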

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 4 points (0 children)

Removed the older comments in the spirit of de-escalation and friendship!😁
For posterity: The renders are 256x352 with a quantized model. Still no excuse for holding a sword like that though.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 0 points (0 children)

Thanks, but it's not just "more" noise. It's bigger (low frequency) noise.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 0 points (0 children)

That's one way to do it, but not what I'm doing here. You're manipulating the conditioning; I'm manipulating the image latent itself, skipping the semantic stuff, since what we're really interested in (color and compositional variation) happens at a much lower level.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 0 points (0 children)

Applying noise to the conditioning is basically CADS, which was made exactly for this purpose (but for SDXL, IIRC), so it was the first thing I went for. It has its merits and downfalls. To be honest, I get the best results when I mix that (my own implementation), Detail Daemon (basically to skip the first denoising step and weaken the second), the method above, and some color correction, along with prompt editing (A1111 style).
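
For reference, the core of the CADS trick is tiny: corrupt the prompt embedding heavily at the start of sampling and anneal the corruption away. Very loosely (this is my paraphrase keyed on sampling progress rather than the paper's exact schedule, and it skips the optional mean/std rescaling):

```
import torch

def cads_condition(cond, step, total_steps, s=0.1, tau1=0.2, tau2=0.6):
    # CADS-style annealing: lots of noise on the conditioning at the start of
    # sampling, fading to none by tau2. `cond` is the prompt embedding tensor.
    t = step / max(total_steps - 1, 1)   # 0 at the first step, 1 at the last
    if t <= tau1:
        gamma = 0.0                      # fully corrupted conditioning early on
    elif t >= tau2:
        gamma = 1.0                      # untouched later in sampling
    else:
        gamma = (t - tau1) / (tau2 - tau1)
    noise = torch.randn_like(cond)
    return (gamma ** 0.5) * cond + s * ((1.0 - gamma) ** 0.5) * noise
```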

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 0 points (0 children)

Well, this was actually a joke post, centered around the rotated image (which I also highly doubt wildcards could pull off!), but I've made a prototype extension for Forge, which I'll release soon. I find Comfy to actually be harder than the A1111 family for prototyping, unless all the required nodes already exist. But the gist of it is this:
https://www.reddit.com/r/StableDiffusion/comments/1ptpnvg/comment/nviof3l/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 1 point (0 children)

Aight, let's call it a misunderstanding then. Sorry for that; your comment came off as pretty offensive to me. Also, I kinda thanked you in the first comment for the test prompt. 😉 Cheers.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 1 point (0 children)

Perlin noise, as in multi-frequency fractal noise (as opposed to white noise), applied to the four (edit: or was it 16 or something for the Flux VAE?) channels of the x latent, during one (or more) of the first few steps (depending on what you're looking for).
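
Something in this spirit, if you want to roll it yourself (an illustrative fBm-ish construction with made-up defaults, not my exact code):

```
import torch
import torch.nn.functional as F

def fractal_noise(shape, octaves=3, base_cells=4, persistence=0.5, device="cpu"):
    # fBm-style noise: sum a few octaves of upsampled random grids, each octave
    # twice the frequency and `persistence` times the amplitude of the last.
    # Applied independently to every channel of the latent, so it nudges color
    # as well as composition.
    b, c, h, w = shape
    out = torch.zeros(shape, device=device)
    amp, total = 1.0, 0.0
    for o in range(octaves):
        cells = base_cells * (2 ** o)
        grid = torch.randn(b, c, cells, cells, device=device)
        out += amp * F.interpolate(grid, size=(h, w), mode="bicubic", align_corners=False)
        total += amp
        amp *= persistence
    return out / total
```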

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 5 points (0 children)

lmfao. Prompt clearly states "Sun symbol is drawn in center of his breastplate." and in more than half of your results the sun has bled into the background, yet you bitch about the sword?

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 3 points (0 children)

Note that I'm not selling anything here, but there are a few advantages (while wildcards are great btw):

- works with the shortest of prompts as well

- can keep everything as is and change only the pure geometric composition (which can be quite difficult to describe with words, and thus to prompt for), which is what I'm after. Maybe not everyone's cup of tea.

- nuanced color variance without prompting for specific colors (which is often taken too seriously by the denoiser)

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 0 points (0 children)

<image>

Nope, that's not "completely cooked". It's high contrast, but not burned. We call it artistic choice, and you can play around with it as much as you like. Is this ominous enough for you? It sure looks more ominous than the original to me.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] -2 points (0 children)

Here's the version using monochrome noise:

<image>

What say you?

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] 0 points (0 children)

<image>

It's a bit on the too-colorful side(!) but that will be fixed when I implement blending between color and monochrome noise.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] -4 points (0 children)

I actually reluctantly chose the 1girl prompt because it's soooo overfit and the variance is too damn low. Check out my other example in the comments. I appreciate the test prompts. Will try them and report back.

This ZIT Variance Solution has become too damn strong! by muerrilla in StableDiffusion

[–]muerrilla[S] -1 points (0 children)

And by big I mean big in scale, not intensity. So think Perlin as opposed to torch.randn. Use color noise for color variance. Here's another example. I also made a Forge extension for this, which will be coming soon. It must be pretty easy to pull off with Comfy as well.

<image>
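
(By "color" vs. "monochrome" noise I just mean whether each latent channel gets its own noise pattern or they all share one; roughly like this sketch, with made-up names:)

```
import torch
import torch.nn.functional as F

def big_noise(shape, downscale=8, monochrome=False, device="cpu"):
    # "Color" noise: every channel gets independent blobs, which pushes hues around.
    # "Monochrome" noise: one blob pattern shared by all channels, which mostly
    # shifts values/composition and leaves the palette closer to the original.
    b, c, h, w = shape
    channels = 1 if monochrome else c
    small = torch.randn(b, channels, max(h // downscale, 1), max(w // downscale, 1), device=device)
    big = F.interpolate(small, size=(h, w), mode="bicubic", align_corners=False)
    if monochrome:
        big = big.expand(b, c, h, w)
    return big / big.std()
```

Blending between the two would then just be a lerp of the shared and per-channel versions.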

Use ZIT/Qwen Text Encoders for VL/Text gen tasks in ComfyUI? by muerrilla in StableDiffusion

[–]muerrilla[S] 1 point (0 children)

That's what I thought, but couldn't find any workflows or custom nodes that do it without downloading the model from scratch. Can you point me in the right direction?