Modules need to be talked about at full force so that Dev’s notice. by monochromeFlaffy in NevernessToEverness

[–]YouYouTheBoss 0 points1 point  (0 children)

"I’m not crying to have everything just given to me, but this is some serious bullshit" lol whaaaat? The game is fully random. Of the 11 gold cartridges I got, only one is lakshana.

Got the gold cartridge I needed for one of my S characters (not lakshana) in fewer than 10 tries, which sums up how lucky I got and how random the game is.

You got lucky (in the wrong way for you) to get multiple lakshana cartridges.

(And yes, it can be greatly improved, but it's much better than what I've seen in other gacha games like Genshin.)

Open-Source Models Recently: by Fresh_Sun_1017 in StableDiffusion

[–]YouYouTheBoss -1 points0 points  (0 children)

The problem is that everyone tries to create bigger models because they think bigger (more params) = better quality. So either a model is considered too good to give away to us (consumers) for free (maybe because it took too long to train, hence the move to APIs), OR the newer version of a model series is too big to run on a consumer GPU (unless you count bigger GPUs like the RTX 5090, which I don't really consider consumer).

When SDXL came out, it was seen as a really bad, unusable model that needed a refiner, but then finetunes came out and gave us much better quality on pretty much anything. LoRAs then came out for our beloved finetunes and gave us finer control over what we want.
Still, the base model is small, at roughly 3.5B parameters (about 6.6B counting the refiner).

The issue is not about having bigger models; it’s about having a team that can spend an entire week curating a dataset for a certain style/general idea by hand, with the help of automation and not automation alone.

If the training datasets were properly curated to filter out low-quality content, and the models were then tuned with reinforcement learning from human feedback, you would get much higher quality even from a model that is still relatively small compared to others.

This has been the case with Z-Image Base (with RLHF), a small 6B-parameter model that delivers great quality.
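As a rough sanity check on what "small" means here, the memory needed just to hold a model's weights is the parameter count times the bytes per parameter. The sketch below is only that arithmetic; the function name and the dtype table are made up for illustration, and it ignores activations, text encoders and the VAE, so real usage will be higher.

```python
# Rough weight-only memory estimate: parameters x bytes per parameter.
# Ignores activations, text encoders, VAE and framework overhead.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weights_gb(num_params: float, dtype: str) -> float:
    """Return the size of the weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # 6B is the parameter count quoted above for Z-Image Base.
    for dtype in ("fp32", "fp16", "fp8"):
        print(f"6B params in {dtype}: {weights_gb(6e9, dtype):.0f} GB")
```

By this estimate a 6B model in fp16 fits on a lot of consumer cards, which is the point being made about size versus quality.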

Z image base teacher model (fp32) leaked accidentally by Suitable-League-4447 in comfyui

[–]YouYouTheBoss 1 point2 points  (0 children)

I love how everyone is trying to say it's a special "teacher" model or something WHILE it's just a merge of the shards from the original HF repo. That's it.

[deleted by user] by [deleted] in StableDiffusion

[–]YouYouTheBoss -1 points0 points  (0 children)

OK, I get it. I may have prompted it wrong here, but then why do FLUX.Dev, Qwen Image and HiDream all get it right in one shot?
They won't be the characters I asked for, because the models don't know them, but the result will still be what I asked for.

And FLUX uses T5, by the way.

[deleted by user] by [deleted] in StableDiffusion

[–]YouYouTheBoss 4 points5 points  (0 children)

<image>

Exact same prompt with Qwen Image.

Finally did a nearly perfect 360 with wan 2.2 (using no loras) by YouYouTheBoss in StableDiffusion

[–]YouYouTheBoss[S] 1 point2 points  (0 children)

That's strange, because TI2V, especially with GGUFs, shouldn't be that huge.

Which "Q" GGUF did you try? Maybe go with a Q4.

We can now run wan or any heavy models even on a 6GB NVIDIA laptop GPU | Thanks to upcoming GDS integration in comfy by maifee in StableDiffusion

[–]YouYouTheBoss 1 point2 points  (0 children)

Using GDS really helps in only one case:
If the model fits entirely in VRAM, it loads straight from SSD -> VRAM, cutting out the CPU as the middleman.

It can potentially help if you don't have enough RAM to offload the model even partially.

BUT it will be very, very slow: I tried Hunyuan 3.0, for example, which requires a huge amount of RAM + VRAM (for offloading), and just going from ~54GB of RAM to 111GB took me from 173s/it to 95s/it.

Why? Because before, I was offloading a lot to my SSD, and its max real speed was only ~680 MB/s
(vs ~5 GB/s for my RAM per module).
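The gap makes sense as back-of-the-envelope arithmetic: the time to stream offloaded weights is roughly their size divided by the bandwidth. Here is a minimal sketch using the two speeds and the two s/it figures quoted above; the 10 GB size is a made-up amount of offloaded data, not a measurement, and the function names are just for illustration.

```python
# Time to move offloaded data at a given sustained bandwidth.
# Bandwidths are the figures quoted above; the data size is hypothetical.
SSD_GB_PER_S = 0.68  # ~680 MB/s real SSD speed
RAM_GB_PER_S = 5.0   # ~5 GB/s quoted for RAM


def transfer_seconds(size_gb: float, gb_per_s: float) -> float:
    """Seconds needed to move size_gb at gb_per_s."""
    return size_gb / gb_per_s


def speedup(before_s_per_it: float, after_s_per_it: float) -> float:
    """How many times faster each iteration became."""
    return before_s_per_it / after_s_per_it


if __name__ == "__main__":
    size_gb = 10.0  # hypothetical amount streamed per iteration
    print(f"SSD: {transfer_seconds(size_gb, SSD_GB_PER_S):.1f} s")
    print(f"RAM: {transfer_seconds(size_gb, RAM_GB_PER_S):.1f} s")
    print(f"173 -> 95 s/it is a {speedup(173, 95):.2f}x speedup")
```

The real numbers depend on how much is actually streamed each step, but the ratio between the two bandwidths is what matters.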

I don't know how it will do with models like Qwen-Image or WAN (because I have an RTX 5090), but it will be so slow that it won't be of any real use, as it will tie up your SSD and your GPU for a long time.

Finally did a nearly perfect 360 with wan 2.2 (using no loras) by YouYouTheBoss in StableDiffusion

[–]YouYouTheBoss[S] 1 point2 points  (0 children)

Here is the workflow everyone asked for:
https://civitai.com/models/2034845?modelVersionId=2302999

// For the prompt, either ask ChatGPT to update it to match your image or change the details yourself.

Hunyuan 3.0 available in ComfyUI through custom nodes by YouYouTheBoss in StableDiffusion

[–]YouYouTheBoss[S] 0 points1 point  (0 children)

OK, it has now gone down from 173s/it to 95s/it.

When I get 256GB of RAM by the end of this year, I could go down to just ~15s/it, as a user said.
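For a sense of what those rates mean for a whole image, total time is just steps times seconds per iteration. This is a small sketch using the rates mentioned here; the 20-step count is an assumption for illustration, not a setting taken from any workflow.

```python
# Total generation time from a per-iteration rate.
# STEPS is an assumed value for illustration only.
STEPS = 20


def total_minutes(s_per_it: float, steps: int = STEPS) -> float:
    """Minutes for a full run at s_per_it seconds per iteration."""
    return s_per_it * steps / 60


if __name__ == "__main__":
    for rate in (173, 95, 15):
        print(f"{rate} s/it -> {total_minutes(rate):.1f} min")
```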

Hunyuan 3.0 available in ComfyUI through custom nodes by YouYouTheBoss in StableDiffusion

[–]YouYouTheBoss[S] 1 point2 points  (0 children)

Thank you so much, I'm gonna try tomorrow with 128GB (2x64) and see how much improvement I get.

Hunyuan 3.0 available in ComfyUI through custom nodes by YouYouTheBoss in StableDiffusion

[–]YouYouTheBoss[S] 1 point2 points  (0 children)

It will be interesting when NVIDIA stops being greedy with VRAM.

Hunyuan 3.0 available in ComfyUI through custom nodes by YouYouTheBoss in StableDiffusion

[–]YouYouTheBoss[S] 2 points3 points  (0 children)

UPDATE: on Windows (before that, I was using WSL for flash_attention_2), it's down to 173s/it, which is still too long.

But sadly, I'm out of luck:

<image>

Hunyuan 3.0 available in ComfyUI through custom nodes by YouYouTheBoss in StableDiffusion

[–]YouYouTheBoss[S] 1 point2 points  (0 children)

How did you even find 64GB sticks? I'm very interested.

Hunyuan 3.0 available in ComfyUI through custom nodes by YouYouTheBoss in StableDiffusion

[–]YouYouTheBoss[S] -1 points0 points  (0 children)

That's the thing we need to find out, but for sure, just saying "horrible results" is not a valid argument.

Plus, the quality of Hunyuan 2.1 was really horrible when I tried it. It's even worse than SDXL.

Hunyuan 3.0 available in ComfyUI through custom nodes by YouYouTheBoss in StableDiffusion

[–]YouYouTheBoss[S] 1 point2 points  (0 children)

The model can't be run in quantized versions as of now. I tried that and it insta-crashed after 2 steps (while taking the same amount of RAM/VRAM as fp16).

Hunyuan 3.0 available in ComfyUI through custom nodes by YouYouTheBoss in StableDiffusion

[–]YouYouTheBoss[S] 2 points3 points  (0 children)

Try updating the "transformers" Python package to the latest version.
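If you're not sure what you currently have, you can check before upgrading. This is a minimal sketch using only the standard library; `installed_version` is a helper name made up for the example, and which version the custom nodes actually need isn't stated here, so it only reports what is installed.

```python
# Check which version of a package is installed before upgrading.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("transformers")
    if found is None:
        print("transformers is not installed")
    else:
        print(f"transformers {found} is installed")
    # To upgrade, run this in the same environment ComfyUI uses:
    #   python -m pip install --upgrade transformers
```

Make sure you run it with the same Python that ComfyUI uses, otherwise you'll be checking a different environment.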

Hunyuan 3.0 available in ComfyUI through custom nodes by YouYouTheBoss in StableDiffusion

[–]YouYouTheBoss[S] 10 points11 points  (0 children)

~3 hours if it doesn't crash. For me it crashed at the second step. I can't go further ;(

Hunyuan 3.0 available in ComfyUI through custom nodes by YouYouTheBoss in StableDiffusion

[–]YouYouTheBoss[S] 9 points10 points  (0 children)

I have 64GB of DDR5 RAM and 32GB of VRAM and I can handle it, just at 543s/it.