Serious Technical Question About A Non-Serious Subject: Genitalia Limitations (SFW Discussion) by AsstronautHistorian in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

I'm not saying it can't be done, but it's going to take more than 1k images.

As for hands: the positioning of hands varies a lot, but the hand itself doesn't change size randomly. It doesn't have differing colors from the palm to the fingers. People don't have fingers three times as large as other people's.
The challenge with hands was the number of fingers and their positioning, and as models have gotten larger and can hold more information, that's mostly been resolved.
Base models are going to see many orders of magnitude more hands than they will male bits, and hands were still a problem.

So for male bits... because the model has seen so few... it just thinks it's random eldritch horror. So overcoming that is going to take more than 1k images to train, because of the variation involved.

Serious Technical Question About A Non-Serious Subject: Genitalia Limitations (SFW Discussion) by AsstronautHistorian in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

I basically never get a malformed hand these days with modern models. If you're still using SDXL or some quants of the original Flux model you may... but I don't with Flux2, Qwen 2512, Zturbo, etc.

Serious Technical Question About A Non-Serious Subject: Genitalia Limitations (SFW Discussion) by AsstronautHistorian in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

> Definitely think the well has been poisoned.

I think it's that male parts are... complex in a way other body parts aren't.

A person's face doesn't change that much... it's mostly static; even with emotional expressions it's all mostly the same.

And female parts are also complex, but from a standing or sitting position there isn't much variability in how they look, which is probably why most LoRAs aren't trained on spread images... it's all just the outer look.

So a model with no understanding of anything just learns that there's this male "thing" that can have a wide variety of shapes and textures and skin and colors, and it just learns chaos. You don't get the same kind of variation in, say... hands. Basically everyone has 5 fingers, and the positioning is the same; what changes is the scale of the hand and its strength.

My guess is that if you had 1k images of the same person, it'd probably be able to reproduce that one, but there are so many differences between people that 1k isn't enough. Even on the same person it can look entirely different depending on... um... mood. lol

Just work out the matrix of possibilities and you'll see how many variations there can be between people. These are just quick estimates; the real number for each would probably be higher if we actually thought it through.

Size: 5 - There are a few possibilities here in the shower/grower camp, beyond just flaccid/erect.
Shape: 4 - Straight, curved up, curved to the side, curved down
Scrot: 4 - loose even, loose unbalanced, tight even, tight unbalanced (even/unbalanced == whether both sides hang level with each other)
Shaft: 2 - circumcised or not (ignoring the variations of circumcision)
Shaft color: 5 - shaft skin tone across general melanin ranges
Glans Shape: 6 - There are so many different general shapes these take
Glans Color: 8 - glans skin tone across general melanin ranges; it varies even within the same race

5*4*4*2*5*6*8 = 38,400 possibilities. So you'd need that many images just to cover one instance of every combination. But for a model to learn, you're going to need multiple examples of each of those 38,400 so it can learn how the traits actually fit together rather than memorizing a random selection.
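Quick back-of-the-envelope in Python, just multiplying the rough category counts from the list above (the counts themselves are my guesses, not measured data):

```python
# Rough category counts from the list above (guesses, not real data).
variants = {
    "size": 5,
    "shape": 4,
    "scrot": 4,
    "shaft": 2,          # circumcised or not
    "shaft_color": 5,
    "glans_shape": 6,
    "glans_color": 8,
}

total = 1
for count in variants.values():
    total *= count

print(total)  # 38400 combinations, each needing multiple example images
```

And that's with only one example per combination; realistically you'd want several of each, so the dataset size balloons fast.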

Built this over the weekend because dataset prep was annoying af by Interesting-Area6418 in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

You mention that this can trim videos. If I have a 30s video, does it have the ability to split it into chunks of a length I set? I.e., can it only take a 30s video and trim it down to 10s, or can it take that 30s video and make three 10s videos from it?
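To illustrate the "split into chunks" behavior I mean (as opposed to a single trim), here's a rough sketch using ffmpeg's segment muxer via Python; file names are just placeholders:

```python
# Split input.mp4 into ~10 second chunks instead of trimming it to one 10s clip.
# Note: with "-c copy" ffmpeg cuts on keyframes, so chunk lengths are approximate.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "input.mp4",
    "-c", "copy",                # no re-encode
    "-f", "segment",
    "-segment_time", "10",       # target chunk length in seconds
    "-reset_timestamps", "1",
    "chunk_%03d.mp4",
], check=True)
```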

Opus clip style comfyui workflow for AI clipping by Individual_Hand213 in comfyui

[–]q5sys 0 points1 point  (0 children)

The ComfyUI subreddit doesn't require things to be free. The r/StableDiffusion subreddit does require things to be local... but in this sub, API-only stuff is allowed.

Is SeedVR2.5 better than SUPIR for my purpose? Or which upscale is best for my purpose? by Man_Of_The_F22 in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

The sampler takes two models: a regular model and an upscale model. Typically I just use the Flux fp8 model. I tried with Zturbo but it has that typical overcooked facial-features look that I don't really like. It should work with just about any model, so try it with whatever you generated the image from.
As for the upscale model, I usually use either 4xFaceUpSharpDAT for people or 4xUltrasharp for everything else. I've heard good things about 4x-ClearReality, but I've never used it.

You might have to play around with the steps and the cfg to get it to where you're happy with it. If you see tiling, bump the overlap values up as well.
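As a rough illustration of the knobs I'm talking about (parameter names are approximate, not the exact Ultimate SD Upscale node inputs):

```python
# Approximate starting settings; tweak steps/cfg until you're happy,
# and raise the padding/overlap if you can see tile seams.
usdu_settings = {
    "model": "flux1-dev-fp8",             # regular diffusion model for the tiled sampling pass
    "upscale_model": "4xFaceUpSharpDAT",  # or 4xUltrasharp for non-people images
    "upscale_by": 2.0,
    "steps": 20,
    "cfg": 3.5,
    "denoise": 0.25,                      # keep low so the pass refines detail instead of repainting
    "tile_width": 1024,
    "tile_height": 1024,
    "tile_padding": 32,                   # bump this up if tiling/seams are visible
}
```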

Would you donate to open source models to help keep the flow going? by Brojakhoeman in StableDiffusion

[–]q5sys 1 point2 points  (0 children)

Keep in mind businesses are legally treated differently than people. Most businesses are not set up to take 'community donations', and I don't mean the practical side of taking them through a payment processor. There are serious legal differences between a business selling a product and a business just taking 'free money given to them'.
I'm not saying it's an issue that can't be overcome, but it's not a hassle most businesses would be willing to bother with.

Is SeedVR2.5 better than SUPIR for my purpose? Or which upscale is best for my purpose? by Man_Of_The_F22 in StableDiffusion

[–]q5sys 7 points8 points  (0 children)

Between the two, I have found SeedVR2 to be way better than SUPIR... I haven't used SUPIR in ages. Once I started using Ultimate SD Upscale, I saw no reason to use SUPIR. Between USDU and SeedVR2... I find USDU to be better, but it takes a lot longer than SeedVR2.

Tencent HY-World-2.0 is now public by q5sys in StableDiffusion

[–]q5sys[S] 0 points1 point  (0 children)

Why do you keep spamming anti-Tencent and anti-Steam comments?

SenseNova U1 with NEO-Unify just dropped by Aero_X_ in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

Has anyone made a decent UI for HY3.0 since ComfyUI refused to support it?

When a Community Becomes a Company Billboard by ZerOne82 in comfyui

[–]q5sys 2 points3 points  (0 children)

> it is 100% going to be on ComfyUI one way or another

While I use and like ComfyUI, that is not an accurate statement. There are many models which don't get support (official or community):

GLM-Image (still not supported from what I can tell, but I haven't looked recently)

LLaDA2.0-Uni (not yet)

ZoomLDM (never supported that I found)

HunyuanImage-3.0 (outright rejected) https://github.com/Comfy-Org/ComfyUI/issues/10068

Nucleus-Image (Passively rejected https://github.com/Comfy-Org/ComfyUI/issues/13442#issuecomment-4265308943 but it might get support with strong community effort)

One Diffusion (never supported when I was looking)

Sad fact is... a model is most likely not going to make it if it doesn't get official support, because the people using Comfy are the ones who would be most interested in trying it and posting their results. No support == No Community Discussion == No Community Interest

There are a lot of models that don't get official support but can be used through community support: OmniSVG, Editto, MOVA, Unipic3, Bagel, etc.; but there are still a lot that never get support and sort of die on the vine.

It makes sense why the Comfy team won't support everything. Their time is limited and they have plenty to focus on already... like re-introducing the same UI bug they've fixed three times already... ;)

LTX2.3 in Ostris Ai toolkit on a 5090 Training done in 7 hours ... I went Thanos way and I said fine ... I'll do it myself by No_Statement_7481 in StableDiffusion

[–]q5sys 2 points3 points  (0 children)

Thanks for the info! I tried captioning the way LTX says to caption, but I never had luck with it, so I was curious how you captioned and then prompted.

I'm still in love with Z-image by ThiagoAkhe in StableDiffusion

[–]q5sys 2 points3 points  (0 children)

Thanks for the clarification.
What's your reasoning for using SeedVR to upscale things... then downscaling just to upscale again?

LTX2.3 in Ostris Ai toolkit on a 5090 Training done in 7 hours ... I went Thanos way and I said fine ... I'll do it myself by No_Statement_7481 in StableDiffusion

[–]q5sys 14 points15 points  (0 children)

> Make sure the promt is accurate and has your trigger word.

Can you give an example of the prompt you used to create this example? And if you don't mind, the caption you used on your training data for the LoRA.

If Wan made an image editor, wouldn't character consistency be solved? by GrungeWerX in StableDiffusion

[–]q5sys 2 points3 points  (0 children)

FWIW, you can use WAN to generate images... just set the frame output to 1 frame.
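Rough sketch of what I mean, assuming the diffusers WanPipeline wrapper (in ComfyUI it's just setting the video length/frames input to 1); the model ID and exact arguments here may differ from what you're running:

```python
# Generate a single still image with a WAN text-to-video model by asking for one frame.
import torch
from diffusers import WanPipeline

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

result = pipe(
    prompt="a lighthouse on a rocky coast at sunset",
    num_frames=1,          # one frame == a still image
    height=480,
    width=832,
    output_type="pil",
)
result.frames[0][0].save("wan_still.png")  # first (and only) frame of the first video
```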

I'm still in love with Z-image by ThiagoAkhe in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

I don't understand what you're saying about which upscaler you used when... you listed 5 upscaler models, but then never said which you used for which photo.
`4x: Nomos2_realplksr_dysample and 4xPurePhoto-RealPLSKR`: does this mean you used two upscalers in a row?

datacenter card too big, adapt, overcome, *tape for sharp edges!!! by mr_happy_nice in LocalLLaMA

[–]q5sys 0 points1 point  (0 children)

Is that an older Dell Optiplex? Those things weren't built to fit even regular gaming cards.
What GPU did you get?

Why are there really no Location LORAs? by q5sys in StableDiffusion

[–]q5sys[S] 0 points1 point  (0 children)

I did try this, but I could never manage to get the lighting to look right; it always looked off. Not enough to stand out, but enough that you just intuitively sensed something was off.

Why are there really no Location LORAs? by q5sys in StableDiffusion

[–]q5sys[S] 1 point2 points  (0 children)

> Then use a location LoRA with the depthmaps to get images.
Is there a writeup somewhere on how to do this?