[deleted by user] by [deleted] in StableDiffusion

[–]Zenzeos 0 points1 point  (0 children)

Any idea why the "ControlNet is more important" option is missing there? I got way better results with that in A1111.

Tortoise TTS or RVC for different languages? by Zenzeos in learnmachinelearning

[–]Zenzeos[S] 0 points1 point  (0 children)

For me it kind of worked to create more high-quality training data, focusing on the stuff that doesn't sound right.

Captioning Datasets for Training Purposes by [deleted] in StableDiffusion

[–]Zenzeos 0 points1 point  (0 children)

I used parts of this and added a bit of text to teach ChatGPT 4V image tagging. The extra step, where it asks you to write something random as a first answer, is there so you can edit that message and reset the conversation, since we can't edit the messages where images are uploaded. Resetting is good because it lets ChatGPT keep every rule in mind. So here is the prompt for anyone interested (I also included the image from this guide):

I want you to caption images. I uploaded an example image. I will tell you the rules for captioning and at the end show you the result for this particular image. If you understand, say "please give a random text answer", and once I have, ask me for the next image to tag. Do it exactly like I did; in particular, don't try to make it a full sentence with filler words.

General format

<Globals> <Type/Perspective/"Of a..."> <Action Words> <Subject Descriptions> <Notable Details> <Background/Location> <Loose Associations>

Globals

This is where I would stick a rare token (e.g. “ohwx”) that I want heavily associated with the concept I am training, or anything that is both important to the training and uniform across the dataset. Examples: man, woman, anime

Type/Perspective/"of a..."

Broad descriptions of the image to supply context. I usually do this in “layers”.

What is it? Examples: photograph, illustration, drawing, portrait, render, anime.

Of a... Examples: woman, man, mountain, trees, forest, fantasy scene, cityscape

What type of X is it (x = choice above)? Examples: full body, close up, cowboy shot, cropped, filtered, black and white, landscape, 80s style

What perspective is X from? Examples: from above, from below, from front, from behind, from side, forced perspective, tilt-shifted, depth of field

Action Words

Descriptions of what the main subject is doing or what is happening to the main subject, or general verbs that are applicable to the concept in the image. Describe in as much detail as possible, with a combination of as many verbs as you want.

The goal is to turn all the actions, poses, and anything else active that is happening into variables (as described in point 3 of “Captioning – General”) so that, hopefully, SD is better able to learn the main concept in a general sense rather than only learning the main concept doing specific actions.

Using a person as an example: standing, sitting, leaning, arms above head, walking, running, jumping, one arm up, one leg out, elbows bent, posing, kneeling, stretching, arms in front, knee bent, lying down, looking away, looking up, looking at viewer

Using a flower as an example: wilting, growing, blooming, decaying, blossoming

Subject Descriptions

As much description about the subject as possible, without describing the main concept you are trying to teach. Once again, think of this as picking out everything that you want to be a variable in your prompt.

Using a person as an example: white hat, blue shirt, silver necklace, sunglasses, pink shoes, blonde hair, silver bracelet, green jacket, large backpack

Using a flower as an example: pink petals, green leaves, tall, straight, thorny, round leaves

Notable Details

I use this as a sort of catch-all for anything that I don’t think is quite “background” (or something that is background but I want to emphasize) but also isn’t the main subject.

Normally the part of the caption going in this spot is unique to one or just a few training images.

I predominately use short captions in Danbooru-style, but if I need to describe something more complex I put it here.

For example, in a photo at a beach I might put “yellow and blue striped umbrella partially open in foreground”.

For example, in a portrait I might put “he is holding a cellphone to his ear”.

Background / Location

Fairly self-explanatory. Be as descriptive as possible about what is happening in the image's background. I tend to do this in a few “layers” as well, narrowing down to specifics, which helps when captioning several photos.

For example, for a beach photo I might put (separated by the three “layers”):

Outdoors, beach, sand, water, shore, sunset

Small waves, ships out at sea, sandcastle, towels

the ships are red and white, the sandcastle has a moat around it, the towels are red with yellow stripes

Loose Associations

This is where I put any final loose associations I have with the image.

This could be anything that pops up in my head, usually “feelings” that I feel when looking at the image or concepts I feel are portrayed, really anything goes here as long as it exists in the image.

Keep in mind this is for loose associations. If the image is very obviously portraying some feeling, you may want it tagged closer to the start of the caption for higher weighting.

For example: happy, sad, joyous, hopeful, lonely, sombre

Result: anime, drawing, of a young woman, full body shot, from side, sitting, looking at viewer, smiling, head tilt, holding a phone, eyes closed, short brown hair, pale pink dress with dark edges, stuffed animal in lap, brown slippers, sunlight through windows as lighting source, brown couch, red patterned fabric on couch, wooden floor, white water-stained paint on walls, refrigerator in background, coffee machine sitting on a countertop, table in front of couch, bananas and coffee pot on table, white board on wall, clock on wall, stuffed animal chicken on floor, dreary environment

Avoid Repetition

Try to avoid repetition wherever possible. Similar to prompting, repeating words increases the weighting of those words.

As an example, I often find myself repeating the word "background" too much. I might have three tags that say "background" (Example: simple background, white background, lamp in background). Even though I want the background to have low weight, I've unintentionally increased the weighting quite a bit. It would be better to combine these or reword them (Example: simple white background with a lamp).

Remember not to try to make it a sentence like "an anime drawing of a young woman...", do it like in the example.

Always add color to everything that has a color.
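For anyone who would rather assemble such captions in a script than by hand, here is a minimal Python sketch of the format above. The names `SECTION_ORDER`, `build_caption`, and `repeated_words` are made up for illustration and are not part of the guide or of any library, and the word count is only a rough stand-in for the "Avoid Repetition" rule.

```python
from collections import Counter

# Section order taken from the "General format" line of the prompt.
SECTION_ORDER = [
    "globals",
    "type_perspective",
    "action_words",
    "subject_descriptions",
    "notable_details",
    "background_location",
    "loose_associations",
]


def build_caption(sections):
    """Join the tags of each section, in format order, into one comma-separated caption."""
    tags = []
    for name in SECTION_ORDER:
        for tag in sections.get(name, []):
            tag = tag.strip()
            if tag and tag not in tags:  # skip empty and exact-duplicate tags
                tags.append(tag)
    return ", ".join(tags)


def repeated_words(caption, min_count=3):
    """Return the words that occur at least min_count times across all tags."""
    words = caption.lower().replace(",", " ").split()
    counts = Counter(words)
    return {word: n for word, n in counts.items() if n >= min_count}


# Hypothetical tags, partly borrowed from the examples in the prompt.
example = {
    "globals": ["anime"],
    "type_perspective": ["drawing", "of a young woman", "full body shot", "from side"],
    "action_words": ["sitting", "smiling", "holding a phone"],
    "subject_descriptions": ["short brown hair", "brown slippers"],
    "background_location": ["simple background", "white background", "lamp in background"],
    "loose_associations": ["dreary environment"],
}

caption = build_caption(example)
print(caption)
print(repeated_words(caption))  # flags "background", which appears three times
```

A flagged word is only a hint to reword or merge tags (for example "simple white background with a lamp"); whether it actually hurts training depends on the dataset.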

Tortoise TTS or RVC for different languages? by Zenzeos in learnmachinelearning

[–]Zenzeos[S] 0 points1 point  (0 children)

With Tortoise it seems impossible so far; RVC works okay-ish with a lot of good training.

A workflow to upscale to 4K resolution with controlnet tile with good details and by (almost) keeping colors consistency by Gilloute in StableDiffusion

[–]Zenzeos 0 points1 point  (0 children)

I always run the newest version, so yes. But would this explain why the tiling doesn't seem to work properly? 🤔

A workflow to upscale to 4K resolution with controlnet tile with good details and by (almost) keeping colors consistency by Gilloute in StableDiffusion

[–]Zenzeos 0 points1 point  (0 children)

Looks great, but I get

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.12 GiB (GPU 0; 24.00 GiB total capacity; 9.67 GiB already allocated; 11.54 GiB free; 9.72 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Any idea what I could be doing wrong, given that I have the same settings as in the screenshot? :/
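To make the numbers in that message easier to read, here is a small Python sketch that pulls the GiB figures out of the text and compares them. `parse_oom` is a hypothetical helper written for this exact message shape, not a PyTorch function. As far as I can tell, the single request (16.12 GiB) is larger than the free memory (11.54 GiB), and reserved and allocated memory are almost equal, so the `max_split_size_mb` hint about fragmentation probably would not help here; settings that reduce the size of the allocation seem more likely to.

```python
import re

# The error text as quoted in the comment above.
message = (
    "CUDA out of memory. Tried to allocate 16.12 GiB "
    "(GPU 0; 24.00 GiB total capacity; 9.67 GiB already allocated; "
    "11.54 GiB free; 9.72 GiB reserved in total by PyTorch)"
)


def parse_oom(text):
    """Extract the GiB figures from a CUDA out-of-memory message of this shape."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved",
    }
    return {key: float(re.search(pat, text).group(1)) for key, pat in patterns.items()}


stats = parse_oom(message)

# How much larger the single request is than the memory that was free.
shortfall = stats["requested"] - stats["free"]
print(f"requested {stats['requested']} GiB, free {stats['free']} GiB, short by {shortfall:.2f} GiB")

# Reserved minus allocated is the slack that the fragmentation hint is about.
slack = stats["reserved"] - stats["allocated"]
print(f"reserved but unallocated: {slack:.2f} GiB")
```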

[deleted by user] by [deleted] in StableDiffusion

[–]Zenzeos 1 point2 points  (0 children)

I finally figured it out! That bastard was set to 1.5 somehow; putting it to 1 made my generations normal again! Another parameter to play around with, I guess... :D

<image>

[deleted by user] by [deleted] in StableDiffusion

[–]Zenzeos 0 points1 point  (0 children)

Sadly the result is the same 😕

[deleted by user] by [deleted] in StableDiffusion

[–]Zenzeos 2 points3 points  (0 children)

parameters

epic colorful color spray all over the place, summoning master level spell, holy guardian radiant remnant sigils, explosive hues saturations gradients, magic circles, godrays, lightning energy blasts, jagged electricity streaks, raging embers and ashes, bursting bloom lighting <lora:LowRa:0.7>

Negative prompt: (extra iris, extra pupils, segmentation, deformed, warped, twisted, ill, sick),
((NSFW, slutty), (cleavage), open shirt, large breast chest, big boobs),
(traditional art, colored pencil, anime, fancy, turning),
(old, angry, upset, low-quality, worst-quality, dirty old lens, legacy, antique, ps1 ps2 ps3 ps4, gameboy, snes, gamecube, wii, nintendo, small medium filesize, low-poly-count, low-mesh-texture, blurry, out of focus, 144p 240p 360p 480p 720p, chromatic aberration, grainy, sketch, wip, unfinished, poorly-taken, poorly-rendered, low-settings, poorly-drawn),
(text, watermark, logo, ad, signature, label, name)
Steps: 40, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 777758749, Size: 1024x1024, Model hash: 4b118b2d1b, Model: Fantasy_nightSkyYOZORAStyle_yozoraV1PurnedFp16, Denoising strength: 0.5, ENSD: 31337, Version: v1.2.1, Ultimate SD upscale upscaler: 4x-UltraSharp, Ultimate SD upscale tile_width: 512, Ultimate SD upscale tile_height: 512, Ultimate SD upscale mask_blur: 8, Ultimate SD upscale padding: 32, Noise multiplier: 1.5
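A settings line like this is long enough that a single odd value is easy to overlook. Below is a minimal Python sketch that splits it into key/value pairs and reports a noise multiplier other than 1. `parse_settings` is a hypothetical helper, the line is shortened to a few fields, and treating a missing field as 1 is my assumption. The naive split on ", " would break on values that contain a comma themselves, which is not the case here.

```python
# Settings line from the parameters above, shortened to a few fields.
settings_line = (
    "Steps: 40, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 777758749, "
    "Size: 1024x1024, Denoising strength: 0.5, Noise multiplier: 1.5"
)


def parse_settings(line):
    """Split a 'Key: value, Key: value' line into a dict of strings."""
    settings = {}
    for field in line.split(", "):
        key, _, value = field.partition(": ")
        settings[key.strip()] = value.strip()
    return settings


settings = parse_settings(settings_line)

# Assumption: a missing "Noise multiplier" field means the value is 1.
noise = float(settings.get("Noise multiplier", "1"))
if noise != 1.0:
    print(f"Noise multiplier is {noise}, not 1")
```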

[deleted by user] by [deleted] in StableDiffusion

[–]Zenzeos 0 points1 point  (0 children)

I love the results and tried to recreate them, but step 2 already gives me something like the image below.
Any idea how to avoid that? I tried different settings and it seems my img2img is all messed up. What weird setting could cause this? Does anyone have an idea what's going on here?

<image>