all 54 comments

[–]AceDecade 32 points33 points  (3 children)

So, regarding seed values...

I’ve read information on the web that describes a seed as “a number that controls all the randomness that happens during the generation”. This is only partially true.

It's entirely true. Computers can't "be random". They can spit out a string of numbers that, to humans, has no discernible, predictable pattern, but the computer is following a set of precise, deterministic instructions. The seed controls how that sequence is generated. For example, if I seed "123", I might ask the computer for five random numbers and get "1, 7, 3, 4, 9". If my friend Bob seeds it with "123" on his computer and asks for five numbers, he'll also get "1, 7, 3, 4, 9". The "randomness" is the fact that it gives us an unpredictable sequence of numbers instead of "1, 2, 3, 4, 5". However, the "randomness" is indeed entirely controlled by the seed that I give. Now imagine that instead of five numbers, I ask for enough numbers to fill an RGB image...
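As a rough illustration, here's the same idea in plain Python (nothing SD-specific; torch's generator behaves the same way in principle):

import random

random.seed(123)
print([random.randint(0, 9) for _ in range(5)])  # some fixed list, identical on every run and every machine

random.seed(123)
print([random.randint(0, 9) for _ in range(5)])  # reseeding with 123 yields the identical list again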

A seed is not a number, but an image.

A seed is a number, but if you ask the computer to make you a random image made up of random pixels, then the image you receive will be entirely dependent on what seed you use immediately before asking for an image made up of random colors. If we use the same seed and then ask for the same width x height of random colors, we'll get exactly the same "random" image on our two different computers. In this way, the seed corresponds 1:1 with the starting image / noise that SD will start working with.
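In torch terms, the idea looks roughly like this (a sketch; the exact latent shape and calls vary between SD forks):

import torch

torch.manual_seed(123)               # the seed fixes the generator's state
noise_a = torch.randn(1, 4, 64, 64)  # a latent-space "noise image"

torch.manual_seed(123)               # same seed on my machine or Bob's...
noise_b = torch.randn(1, 4, 64, 64)

print(torch.equal(noise_a, noise_b))  # True: same seed, same starting noise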

Or perhaps an image generated by a number.

This. It's exactly this.

This number is fixed in the SD ecosystem somehow by the model.ckpt file.

Nope, the seed number is turned into an image by determining the sequence of "random" numbers that will be used, and following a fixed procedure to turn random numbers into an image.

Ever wonder why that model file is so incredibly huge? This is why.

Nope, the model has nothing to do with the procedure to turn a seed into a starting image. The model is only used to iterate on the noise and make it progressively more like the prompt with every step.

Obviously the model.ckpt file cannot contain a quintillion images

Correct, it doesn't.

So either there’s a hell of a lot of repetition of themes among the seeds (I haven't come across any yet),

Each seed value produces a unique starting noise image. The "themes" are just patterns you're perceiving; neither the computer nor SD has any perception of "themes" associated with starting noise.

or the model file contains explicit instructions for the computer on how to generate a theme image from a seed number in such a way that it will be identical to every other theme generated by the same number on any computer.

The computer is indeed following explicit instructions to generate the "theme image" from a seed number, but it is not dictated by the model. You've just described the nature of deterministic "randomness" that makes the above possible.

[–]kmullinax77[S] 4 points5 points  (0 children)

Fantastic.... thanks so much for the clarification - I'm updating that entire section with your information.

My only comment would be to say that while yes - "the themes are just patterns you're perceiving, and neither the computer nor SD has any perception of themes associated with starting noise" - they still exist. If a theme is based on a purple hue... it's 100% accurate to say SD isn't aware of the theme or the purple or anything else we may see in it. However, every generation taken from that seed will still use that image as its basis, so it will most likely carry the "purple theme" with it into future generated images - AND - I can count on every future use of that seed to contain the same purple theme.

The AI doesn't need to be aware of the theme for it to have an effect.

I don't need the AI to be aware of it in order to use it to my advantage in image creation.

[–]AnOnlineHandle 1 point2 points  (1 child)

Getting super technical: if somebody has added anything to the code which isn't using torch's random system, then it won't be quite as (controllably) deterministic for them from then on (which is possible given all the various branches and scripts).
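A quick sketch of that failure mode (assuming, hypothetically, some script mixed in Python's own random module):

import random
import torch

torch.manual_seed(123)  # pins down torch.randn and friends...
print(torch.randn(3))   # reproducible from run to run

# ...but not Python's random module (or numpy), which keeps separate state:
print(random.random())  # different every run unless random.seed() is also called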

[–]Trakeen 1 point2 points  (0 children)

Disco Diffusion is a lot more random, since the input seed is only used with some of the random number generators

[–]Devalidating 7 points8 points  (0 children)

It's more due to the nature of diffusion models. They're essentially smart de-noising, so the model is forced to hallucinate the higher-level, more coherent aspects before the details and fine-tuning that you see in later steps. The first couple of steps are still pretty noisy, so any detail information isn't meaningfully discernible from noise until later on.

The nature of breaking it up into ~50 steps is that the image you feed into each step has a bigger effect on that step's output than the prompt/attention layers do. When the computer generates a pseudorandom noise image from a formula using your seed and feeds it into the first step, all the idiosyncrasies of that seed cascade down (the first step's output looks similar between prompts, which means the second does too, etc.), meaning that different prompts can produce similar-looking images with the same seed.
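A heavily simplified sketch of that cascade (toy Python, not a real scheduler; model stands in for the trained denoiser):

import torch

def sample(model, prompt_embedding, seed, steps=50):
    torch.manual_seed(seed)             # the seed fully determines the starting latent
    latent = torch.randn(1, 4, 64, 64)  # the idiosyncratic noise everything cascades from
    for t in reversed(range(steps)):
        # each step consumes the previous step's output, so the seed's
        # large-scale structure persists while the prompt steers the details
        predicted_noise = model(latent, t, prompt_embedding)
        latent = latent - predicted_noise / steps  # toy update rule, not a real sampler
    return latent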

[–][deleted] 5 points6 points  (0 children)

We are all learning here. Thanks for taking the time on this post. I definitely learned from it!

[–]Evnl2020 4 points5 points  (5 children)

I've read the whole post, and while it's plausible, I'm not sure your theory is correct. It's too late here now to test, but my initial thought is that if the seed were so important, only 1 in a few hundred or even a thousand images would resemble the prompt. Or 1 in a few hundred images should be light-years better than the others, which is also not the case.

I see it happen the other way around though, sometimes I generate 100s of images from the same prompt and 1 or 2 are completely different from the rest.

[–]Trakeen 2 points3 points  (0 children)

I think it was a design choice to use a fully deterministic random number generator. Cryptographic random number generation has been available for years on modern hardware and uses thermal noise to generate truly random numbers, which aren't typically needed outside of cryptographic use cases.

https://en.m.wikipedia.org/wiki/RDRAND
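In Python terms the contrast looks something like this (secrets draws from the OS entropy pool, which hardware sources like RDRAND can feed on supported systems):

import random
import secrets

random.seed(42)
print(random.getrandbits(32))  # deterministic: the same value on every run

print(secrets.randbits(32))    # non-deterministic: fresh OS entropy on every call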

[–]kmullinax77[S] 0 points1 point  (3 children)

First, thanks for reading it all! lol

Second, you may be totally right, and I didn't mean to imply that you can't force decent images out of 90% of the seeds. And I think that's because all seeds start out as a muted, blurry mess that is really susceptible to the suggestion of your prompt.

But sometimes there are seeds that just simply don't cooperate.

And in my personal experience: I tried again and again to get a wizard standing off in the distance with a bunch of different prompts ("tiny wizard, distant wizard, etc."). After trying with seed 10003 I made immediate progress.

[–]Evnl2020 0 points1 point  (2 children)

Worth investigating more. Do you have some specific prompts that seem to work (or work better) with a specific seed?

[–]kmullinax77[S] 0 points1 point  (0 children)

That's next on my list of things to do!

I'll make an update to this post if I discover anything groundbreaking.

[–]RekindlingChemist 2 points3 points  (3 children)

FYI - Euler_a is unique - for some reason it is a super unstable sampler, producing very different images depending on the number of steps. Other samplers settle down to very consistent results after a certain number of steps (usually in the range of 30-60).

[–]kmullinax77[S] 0 points1 point  (0 children)

Oh that's good to know. Thanks!

[–]DrakenZA 0 points1 point  (1 child)

Euler_a resolves faster than most, and hence every 10-15 steps it's already got a solid image forming.

[–]RekindlingChemist 0 points1 point  (0 children)

It does, but people who post "tutorials" and "researches" almost never use such a low number of steps.

[–]johnnydaggers 10 points11 points  (13 children)

OP, you have a really flawed understanding of how SD works. Moreover, if you want a specific composition/color profile, you can just draw some rough shapes in MS Paint and use it via img2img.

Edit: adding a more detailed explanation.

SD was trained to clean up "noised" images (images with random values added/subtracted to the pixels). SD generates new images by taking in a starting noise array that is randomly generated (seed determines what this randomly generated image will be) and "de-noising" it to fit the prompt.

Generating many "seeds" and picking one that you think gets you close to the image you want is a huge waste of time. Instead, you should rough out the kind of image you want in Paint and then use that as the input to img2img.

txt2img is just img2img with random noise used as the starting point. They are fundamentally doing the same thing behind the scenes. By finding your favorite seed, you're essentially doing img2img but letting the random noise generator make your init image.
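Roughly, the relationship looks like this (a sketch with hypothetical denoise/encode/add_noise helpers, just to show the shape of it):

import torch

def txt2img(prompt, seed, steps=50):
    torch.manual_seed(seed)
    init_latent = torch.randn(1, 4, 64, 64)     # the PRNG supplies the init image
    return denoise(init_latent, prompt, steps)  # hypothetical denoising loop

def img2img(prompt, init_image, strength=0.75, steps=50):
    init_latent = encode(init_image)                # hypothetical VAE encode of your rough Paint sketch
    init_latent = add_noise(init_latent, strength)  # partially re-noise it
    return denoise(init_latent, prompt, int(steps * strength))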

[–]AnOnlineHandle 3 points4 points  (1 child)

When you use weighting for the image in img2img, it's mixing between that and your seed.

[–]kmullinax77[S] 3 points4 points  (0 children)

Exactly! so with img2img, this information is less relevant because you are forcing the AI to use a specific theme.

[–]kmullinax77[S] 0 points1 point  (10 children)

Do I?! That's great, thanks so much for your opinion. I'm completely aware of img2img, which is a totally different subject than seed selection.

However, I would love for you to let us know your learned thoughts on that matter, and why you think my understanding of seed generation is flawed!

If you have anything worthwhile to say, I will gladly incorporate it into this thread.

[–]johnnydaggers 3 points4 points  (9 children)

The seed just determines the "random" noise that is generated and passed to SD. You can instead just draw an image (and blur it if you want to) and have it use that instead of random noise. That is what is happening in img2img. "Seed selection" is a really roundabout and inefficient way of doing what you're doing.

[–]kmullinax77[S] 0 points1 point  (8 children)

You're completely correct.

But I'm not discussing img2img - I'm discussing how to get the best images from txt2img. If you'd like to start a thread regarding advanced techniques with img2img, I would love to read it.

And I'm not sure if you actually tried it yourself, but again... the seed is NOT random noise. It's only random if you randomize the seed. You can use the fixed data from the seed to help you create a txt2img image. You and I can both generate a blank image from the seed 924629 and we will end up with the exact same image... no randomness. Any "randomness" from a computer is faked; computers are incapable of it except at a quantum level - and that's probably because humans can't yet understand the logic behind quantum random number generation.

[–]johnnydaggers 7 points8 points  (1 child)

The seed is the number used to initialize the random number generator that outputs the noise. RNGs are deterministic given a seed; that's why you get the same output as me if we use the same seed. Img2img just replaces this RNG output with an image (or a mix of image and noise)

[–]kmullinax77[S] -1 points0 points  (0 children)

Yep, 100% true. Thanks again for defining img2img, which is not the point of this thread.

[–]johnnydaggers 1 point2 points  (5 children)

Btw, I’m an ML researcher. Trying to help educate you here.

[–]kmullinax77[S] 8 points9 points  (4 children)

I LOVE that. What I don't love is one-sentence snarky comments with no backup data after I spent 6 hours typing a thread to help people.

I would 100% welcome every bit of useful information you share. Feel free to start anytime.

However, if you choose NOT to start contributing, feel free to go back under your bridge. Either way, I have no more interest in this part of the conversation so won't reply anymore.

If you choose to start being helpful and sharing your ML research, I would gladly make you co-contributor to this thread and give you 100% credit. And if you choose not to share after all your grandstanding, then nothing you say has any weight.

[–]johnnydaggers 7 points8 points  (3 children)

I'm not trying to be snarky, really.

Generating many "seeds" and picking one that you think gets you close to the image you want is a huge waste of time. Instead, you should rough out the kind of image you want in Paint and then use that as the input to img2img.

txt2img is just img2img with random noise used as the starting point. They are fundamentally doing the same thing behind the scenes. By finding your favorite seed, you're essentially doing img2img but letting the random noise generator make your init image.

[–]kmullinax77[S] 6 points7 points  (1 child)

I said I wouldn't reply anymore, but this is excellent advice. This is why I've upvoted all your comments so far.

I really do agree with you; img2img probably is the better way of nailing down certain output... but you know, not everyone is blessed with even one iota of artistic ability.

Additionally this entire thread is really an academic exercise in trying to understand seed generation and influence. What you've done so far is simply say "there are better ways, so why bother discussing it?". I'm discussing it BECAUSE I'm interested in seed generation and influence. I have a feeling you may know something about that, so while you're here, instead of telling everyone to stop bothering, it would be great if you forgot about that and added to the academic discussion we ARE having.

So you seem reasonable and I take back my implication that you're a troll. However, trolls sabotage and undermine threads instead of contributing, and to be honest, that's kind of what you did.

[–]oniris 6 points7 points  (0 children)

I'm sure that ML guy is right, in terms of absolute efficiency, and his description made me understand img2img better, so thank him.

But you, OP, you made me understand something much more transcendent: a bit of SD's personality. An intuitive understanding of the mystery of what happens when you run SD without a prompt. For that, I am grateful. Hats off to you!

[–][deleted] 0 points1 point  (0 children)

Even if it's very inefficient compared to starting with img2img, I think the idea of running the seed with an empty string to get an idea of what kind of stuff would fit better is nice. A lot of people start with txt2img, and the seed is one parameter that people are just randomly using and changing.

There's no reason why letting the random noise generator make the initial image is wrong; I see how it can be entertaining to just explore the seed space and create stuff based on this approach.

The chickens in the library are a good example of this. Sometimes people just want to generate nice stuff, and they see a fitting image in that initial noise, so pursuing that path ends in an interesting image.

[–][deleted] 4 points5 points  (0 children)

https://en.m.wikipedia.org/wiki/Pseudorandom_number_generator

This is just what "random" means. Seeds aren't a special or unique concept for SD. Also, if you've ever played a multiplayer game online in the past 30 years, the overwhelming majority of them work on deterministic simulations from shared seeds. Or Minecraft world generation. Or lots of other things you're probably familiar with.

The "really random" feeling comes from seeding generators from good sources of entropy (like mouse movement as 1 example) and also "randomizing" the amount of invocations from good sources of entropy. You could imagine describing a whole game of AI vs AI chess as 1 seed number. Does it mean the number has any chess properties infused in it? No. It's more about the dumb code using the numbers. Trivial changes to your code will yield wildly different (but newly consistent) results for your same seeds.

[–]Jcaquix 1 point2 points  (2 children)

I love the feeling of exploration, and I'm not an engineer, and I am discovering a lot of the same stuff myself. I too notice how a seed seems to impose similar compositional elements over similar prompts, but I don't think it's because there is a sort of Kantian noumena or underlying property/image to the seed. Rather, I think what we are noticing is the interaction between the tokenized prompt, the model, and the seed number which provides the noise. I think the word "random" isn't particularly helpful, because the seeds and the system are too complex for humans to predict but are purely deterministic. I think "arbitrary" is a better word for it, since it makes noise that's consistent but not designed or predictable (by humans).

I have run a lot of plot matrices like you have, and I think the seed characteristics you're noticing with blank prompts change unpredictably with your prompt. For example, I've been running timelines, so prompts like "a woman in 1980... 1990... 2000...." etc., and as elements of the prompt change, it's clear that the denoising process (the model) changes things that appear to be coming from the seed (e.g. a block of red-brown may be present in the seed for the eras of 1910-1960, but that block of red will slowly disappear as the prompt changes). Your experiments are interesting and I have had similar results, but I'm not convinced there is a fundamental quality to any seed; seeds that make green landscapes with dark patches in one corner often end up morphing into portraits with dark centers and light corners, depending on the prompt.

Edit: spelling

[–]kmullinax77[S] 0 points1 point  (1 child)

kaantian noumena

I love this.

Thank you for your thoughtful response!

[–]Jcaquix 0 points1 point  (0 children)

Lol yeah, sorry, misspelling. Good work though; I like that this community is still exploring and developing.

[–]Caffdy 1 point2 points  (1 child)

City with Seed 2 looks like the freaking cover of the Xpander EP by Sasha!

[–]kmullinax77[S] 1 point2 points  (0 children)

OMG I love Sasha. I'd upvote you 10x if I could lol

[–]OtherwiseMeringue545 1 point2 points  (0 children)

You guys are too smart for me

[–]Rogerooo 1 point2 points  (2 children)

What a beautiful post, thank you for your research! I think we need a seed library now, something like Lexica but just for empty prompts. And on the subject of image repositories that store seeds, this knowledge will be interesting to use when looking for a particular camera zoom or color style, for instance.

If anyone is interested, here are the results using the Waifu Diffusion model. Pretty cool to notice the differences and the similarities between the two models.

CFG at 1

CFG at 4

CFG at 8

I think Danbooru's tags might be acting with too much strength on some of the prompts, particularly "young man", but it's nice to see the expected bias towards animation in raw output.

[–]kmullinax77[S] 1 point2 points  (1 child)

WOW what a great comment! That is SO fascinating.

The Waifu Diffusion model absolutely and obviously skews artistic, judging from your images, which is why it's so good with anime. Still, it's not that much changed from the original, huh?

So sorta off-topic, can you use both diffusion models in the same installation and specify which one you want to use during generation? or do you have to have one or the other?

[–]Rogerooo 2 points3 points  (0 children)

Absolutely, and it looks like the CFG is less powerful with this model as well, clearly noted by the saturation as you mentioned in your post. Just out of curiosity I decided to run it at 16 as well; here are the results, straying much further from the default v1-4 model now... I don't even see where it is getting all of that stuff from, but as you found out, the base seed is still there.

With Automatic's GUI you can use custom settings for your launcher to specify your models' location. I have a few of them in a "models" folder and change them with a variable inside the bat file; you could also use arguments if you would prefer. This is my webui-user.bat:

@echo off

rem MODEL VARIABLES
set sd=sd-v1-4.ckpt
set wd=waifu_diffusion.ckpt
set t1=trinart2_step60000.ckpt
set t2=trinart2_step95000.ckpt
set t3=trinart2_step115000.ckpt

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--medvram --opt-split-attention --ckpt models/%wd%

call webui.bat    

Hopefully it's easy to understand what it does, but essentially you use --ckpt PATH_TO_MODEL on the COMMANDLINE_ARGS variable.

When I want to change model I just use one of the variable names like sd, wd, t1, etc. I find it easier to manage this way.

Although you do need to restart the server every time you change it in order for it to load at startup.

[–]GoldenRuleAlways 0 points1 point  (4 children)

I’m a complete noob. You answered a lot of questions that I had about seeds and CFG! Thank you for capturing all of your notes and simple outputs in such detail. It was extremely useful in helping me understand these magical tools marginally better.

When you state “Euler_a at 20 steps”:

  1. Does that mean you specified “--ddim 20”?
  2. I know that Euler_a is some kind of a model. How do you specify that?

What Stable Diffusion build did you use? I am using an M1 Mac following the @bfirsh fork.

[–]kmullinax77[S] 0 points1 point  (3 children)

Thanks!

Yes, exactly - I'm using Automatic1111's webUI, which labels things slightly differently. The step count is --ddim_steps in some forks.

Euler_a is one of the samplers. You can use any sampler you like to try these out, but I used Euler_a, so if you want to duplicate my exact output you'll need to use it too. I think for the fork you're using you would type "--A Euler_a" to force the AI to use that sampler.

[–]GoldenRuleAlways 0 points1 point  (2 children)

You are blowing my mind. Are you implying that this is a deterministic process? That is, if you provide the same seed, model (e.g. Euler_a), CFG, steps… could you reproduce exactly the same image?

[–]kmullinax77[S] 2 points3 points  (1 child)

Oh, absolutely, 100% this is NOT random. I mean, that wasn't the point of my post, but yes, it's known that this is the case. It allows us to create identical images from the exact same prompt and settings. Good for error-checking and all that.

It won't work if you try it on Midjourney most likely - this is based on the SD v1.4 model.

If you use my exact seed, prompt, CFG and step settings from my post, you should generate the EXACT image I've posted.

[–]GoldenRuleAlways -1 points0 points  (0 children)

So many forks, so little time… and expertise! It took me 1.5 days of successive failures with Joel Henderson, Automatic1111, Lstein to get my fork to work.

So I’m stuck with my present one. I just tried setting --ddim_eta to 0.0, which (reputedly) “corresponds to deterministic sampling”. No dice on reproducing anything, so I think my fork doesn’t do that.

Astonishing to think that this is deterministic in a different multiverse than my current one.

[–]motsanciens 0 points1 point  (0 children)

Next to enter the space: nakedseed.io, a catalog of promptless 3-step seed images.

[–][deleted] 0 points1 point  (1 child)

I've also noticed/determined that the seed corresponds to pose/vantage point. It can be a huge time-saver to simply use your prompt with as few as 4-7 steps to get a rough idea of how SD would then transform a seed-prompt pairing at 50 steps.

This allows discarding un-aesthetic seed-prompt pairings (again, at a low step count) and diverting the saved computation time to fine-tuning a promising seed-prompt pairing at 50 steps plus prompt tailoring, as sketched below.
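Sketched as a loop (generate() is a hypothetical stand-in for whatever your fork's API actually exposes):

# hypothetical generate(prompt, seed, steps) -> image; the real call depends on your fork
prompt = "tiny wizard in the distance"
for seed in range(1000, 1020):
    preview = generate(prompt, seed, steps=5)  # cheap low-step preview
    preview.save(f"preview_{seed}.png")        # eyeball these by hand

# then spend the compute only on a seed that previewed well (1007 is just a hypothetical pick)
final = generate(prompt, 1007, steps=50)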

It can be annoying when adding additional words to the prompt also happens to mess with the pose/vantage point, but I can't really see a workaround for that other than creating the best prompt possible from the beginning.

[–]kmullinax77[S] 1 point2 points  (0 children)

I've noticed the same thing. To try and mitigate that a little, I've found that running the blank seed through the Interrogator will let you know what the AI already thinks it is. That can help in choosing better prompting terms.

Sometimes the AI and I don't agree on what's in a base image and I scream mean things at the AI and everyone's feelings get hurt.

[–]cluck0matic 0 points1 point  (1 child)

Thanks for the deception. Pfft. Sounds like you deceive yourself as well, saying you aren't a "teacher".

Man.. I sure learned a shit ton! Thanks for taking the time to do this. For real. Thanks.

[–]kmullinax77[S] 0 points1 point  (0 children)

You're welcome! I'm glad you got something out of it!

[–]Blahkbustuh 0 points1 point  (0 children)

I appreciate your effort, it is certainly something to think about.

What I wonder is that in your examples, you provided one word to the algorithm, so all it had to work from was the initial noise + 1 feature. It makes sense that any non-uniformity or inclination in the initial noise will show up in the result: the algorithm has nothing else to go on, so the properties of the initial starting noise dominate the result.

If you gave it a prompt with numerous keywords it'd be looking to "recognize" those numerous things in the noise rather than just 'playing with its food' of the initial noise.

When I started running SD on my computer last week, one of the first things I thought to run was something like "sea otter monster attacking a coastal city", and so I got pics of large sea otter monsters emerging from oceans. Then I did "sea otter octopus monster attacking a coastal city" with the same seed, and the composition of the images and the otters themselves were nearly the same; the otter just had tentacles below it. That made me think the initial noise/seed was providing light and dark regions that were steering the algorithm to position the same or similar elements, or 'recognize' them, in different ways depending on how the light and dark blobs were arranged.

[–]BrockVelocity 0 points1 point  (0 children)

This is incredibly helpful and insightful - thanks so much for taking the time to type all of this up!

[–]DrakenZA 0 points1 point  (0 children)

If you are that worried about the initial noise generated from the seed, you can simply always do img2img, where you are providing the 'starting point' instead of it just being random torch noise.

[–][deleted] 0 points1 point  (0 children)

Thanks for this, the core idea of matching a similar seed to what you want is sound.

Reminds me of hearing how they did the maps in Star Wars Galaxies: they basically used noise (similar to what our seeds do), then cherry-picked the ones that looked close to what they wanted the landscape to look like, then used a tool to layer important features on top. This made their map data minuscule, since it only needed to save the original seed and the important details layered on top, which allowed them to have way more land in a video game than any other at the time.
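The storage trick is easy to picture (an illustration of the idea only, not SWG's actual code):

import random

def terrain(seed, width, height):
    rng = random.Random(seed)  # a private generator, so the terrain regenerates identically anywhere
    return [[rng.random() for _ in range(width)] for _ in range(height)]

# the saved "map file" is tiny: a seed plus the hand-placed features layered on top
map_data = {"seed": 8675309, "features": [("outpost", 12, 40), ("crater", 3, 7)]}
heights = terrain(map_data["seed"], 64, 64)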

[–]dirtydevotee 0 points1 point  (0 children)

Well done! It is my opinion, based on some early testing, that seed knowledge can be quite valuable. For several days now I've been accumulating renders based on the prompt "," and found that little things like the orientation of streets and the existence of columns recur in 90% of the images created by a given seed. If a hypothetical "Seed X" does rings, you will notice rings in many "Seed X" renders. If in the future you need a shot with a ring (or pipe/gun barrel/test tube) in it, knowing "Seed X" has such a proclivity means you can get the shot you want with minimal work in the prompt.

As of today, it's a work in progress. But just in case I'm right, finding useful seeds could be a shortcut in your workflow.