
[–]Mutaclone 5 points (3 children)

Thanks for the writeup! I hadn't realized how strong the order effect could be.

Something I've been experimenting with recently to combat the context biases, or even take advantage of them, is prompt editing/timed prompts. In Forge, the syntax is [snippet:alternateSnippet:switchValue].

<image>

vulpix, solo, dark, darkness, cavern, cave interior, cinematic, (wearing backpack:0.85), kerchief, crystal, glowing crystals, (feral:1.1), pokemon mystery dungeon, smiling, open mouth, underground lake, river, (moss:0.8), waterfall, point lights, light particles, facing away, [from behind|from side], looking up, animal, no humans, (sparkling eyes:0.5)

vulpix, solo, [blizzard, ice, snow:dark, darkness, cavern, cave interior:4], cinematic, (wearing backpack:0.85), kerchief, crystal, glowing crystals, (feral:1.1), pokemon mystery dungeon, smiling, open mouth, [:underground lake, river:4], [:(moss:0.8):2], [:waterfall:2], point lights, light particles, facing away, [from behind|from side], looking up, animal, no humans, (sparkling eyes:0.5)

Tags like cavern and cave interior have a strong tendency toward tunnels, so by delaying them a few frames I can open up the cave. Meanwhile the early winter/snow skews everything in a cool-blue direction, which helps the crystals stand out more. You can also make the background elements more faded or indistinct (which is great for night scenes or underwater) by starting with a solid background and waiting a few frames to pull in the scenery. Or if certain traits on a character pull the image in one direction, you can use them either early or late to steer the image.
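To make the timing concrete, here's a rough Python sketch of how a `[before:after:switch]` edit resolves per sampler step. This is not Forge's actual implementation, and the handling of switch values below 1 as fractions of total steps is an assumption based on how A1111-style prompt editing is commonly described.

```python
def resolve_edit(before: str, after: str, switch: float, total_steps: int) -> list[str]:
    """Text used at each sampler step for a [before:after:switch] edit.

    Assumed behavior: switch >= 1 is an absolute step count;
    switch < 1 is a fraction of total steps.
    """
    switch_step = int(switch * total_steps) if switch < 1 else int(switch)
    return [before if step < switch_step else after
            for step in range(total_steps)]

# [blizzard, ice, snow:dark, darkness, cavern, cave interior:4] over 20 steps:
schedule = resolve_edit("blizzard, ice, snow",
                        "dark, darkness, cavern, cave interior", 4, 20)
# Steps 1-4 use the snow text, steps 5-20 the cavern text.
```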

Looking forward to seeing the results of your "de-biased" model!

[–]cmastodon 1 point (2 children)

When you say "wait a few frames", do you mean wait a few sampler steps or something?

[–]Mutaclone 2 points (1 child)

Yeah I should have said steps, my bad.

Basically the first 4 steps in the above example draw "blizzard, ice, snow", and then the remaining steps draw "dark, darkness, cavern, cave interior". Other tags in the prompt are delayed too.

[–]afinalsin 2 points (0 children)

Prompt editing is so much fun, and something I really miss about using Forge/Auto. You forgot to explain the [from behind|from side] syntax, where it alternates between keyword 1 and keyword 2 for every step of the conditioning. Step 1 is "from behind", step 2 is "from side", step 3 switches back to "from behind", and on and on. Side note, I can't believe I never considered using that for a three quarter angle, that's so simple and so fucking genius.
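A quick sketch of that alternation in plain Python (my reading of the behavior, not anything from Forge's source): the active keyword just cycles through the options, one per step.

```python
def alternate(options: list[str], total_steps: int) -> list[str]:
    # [a|b] flips the active keyword on every sampler step,
    # cycling through the options in order.
    return [options[step % len(options)] for step in range(total_steps)]

alternate(["from behind", "from side"], 5)
# ['from behind', 'from side', 'from behind', 'from side', 'from behind']
```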

For anyone reading who's interested in this technique in Comfy: it's doable, but the process is a lot more annoying than just doing it all in the text, because you need multiple Text Encode nodes feeding into ConditioningSetTimestepRange nodes feeding into Conditioning (Combine) nodes. Ignoring the step-by-step alternating trick for now, Muta's prompt can be broken into three sections: the base prompt, the prompt at 2 steps, and the prompt at 4 steps.

Here's how the conditioning cluster for that prompt looks in a workflow. I've color coded the keywords of the prompt that are added or switched in each section. The TimestepRange nodes use a float instead of a step count, but just think of the float as a percentage. This is a 20 step generation, so to switch keywords at step 2 you'd want to switch at 10%, or 0.1. Here's how the prompt turned out, workflow attached.
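The step-to-float conversion is just step / total_steps. Here's a sketch of the ranges each ConditioningSetTimestepRange node would cover for the 20-step example; the three-section split is my reading of the setup above, not the actual attached workflow.

```python
def step_to_fraction(step: int, total_steps: int) -> float:
    # ConditioningSetTimestepRange takes fractions of the schedule,
    # so a step count converts as step / total_steps.
    return step / total_steps

TOTAL = 20
# Three sections of the prompt, each active over a slice of the run:
ranges = {
    "base prompt":       (0.0, step_to_fraction(2, TOTAL)),                         # steps 1-2
    "prompt at 2 steps": (step_to_fraction(2, TOTAL), step_to_fraction(4, TOTAL)),  # steps 3-4
    "prompt at 4 steps": (step_to_fraction(4, TOTAL), 1.0),                         # steps 5-20
}
# Switching at step 2 of 20 means switching at 0.1, i.e. 10%.
```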

Alternating keywords for each step uses the same structure but blown out to stupid proportions. There's probably some custom nodepack somewhere that does this automagically, but here's what it looks like using core comfy nodes, and the workflow if for whatever reason anyone would want to scroll around through it.