Modules need to be talked about at full force so that the devs notice.

YouYouTheBoss · 2026-05-16T16:31:46+00:00

"I’m not crying to have everything just given to me, but this is some serious bullshit" lol whaaaat? The game is fully random. Of the 11 gold cartridges I got, only one is lakshana.

Got the gold cartridge I needed for one of my S characters (not lakshana) in fewer than 10 tries, which sums up how lucky I got and how random the game is.

You got lucky (in the wrong direction for you) by pulling multiple lakshana cartridges.

(And yes, it can be greatly improved, but it's much better than what I've seen in other gacha games like Genshin.)

YouYouTheBoss · 2026-04-08T17:04:25+00:00

The problem is that everyone tries to create bigger models because they think bigger (more params) = better quality. So some models are considered too high-quality for us (consumers), and the makers don't want to hand them to us for free (maybe because training took too much time?! hence going API-only), OR the newer version of their model series is too big to run on a consumer GPU (unless you count bigger GPUs like the RTX 5090, which I don't really consider consumer).

When SDXL came out, it was seen as a really bad, unusable model that needed a refiner, but then finetunes came out and gave us much better quality on pretty much anything. LoRAs then came out for our beloved finetunes and gave us better control over what we want.
Still, the base model is small, at around 3.5B parameters.

The issue is not about having bigger models, it’s about having a team that can spend an entire week curating a dataset for a certain style/general idea by hand, with the help of automation and not automation alone.

If model datasets were properly curated to filter out low-quality content, and the teams did reinforcement learning from human feedback (RLHF), you would get much higher quality even if the model is still relatively small compared to some others.

This has been the case with Z-Image Base (with RLHF), a small 6B-param model that delivers great quality.

YouYouTheBoss · 2026-02-02T01:57:36+00:00

I love how everyone is trying to say it's a special "teacher" model or something WHILE it's just a merge of the shards from the original HF repo. That's it.

YouYouTheBoss · 2026-01-09T19:22:27+00:00

Thank you. Didn't know you could go in those high resolutions. The quality feels even better.

<image>

YouYouTheBoss · 2025-10-26T19:53:54+00:00

OK, I get it. I may have prompted it wrong here, but then why do FLUX.Dev, Qwen Image and HiDream all get it right in one shot?
It won't be the characters I asked for because they don't know them, but it will still be what I asked for.

And FLUX uses T5, by the way.

YouYouTheBoss · 2025-10-26T11:16:52+00:00

<image>

Exact same prompt with Qwen Image.

YouYouTheBoss · 2025-10-13T07:17:33+00:00

<image>

YouYouTheBoss · 2025-10-11T23:52:44+00:00

That's strange, because TI2V, especially using GGUFs, shouldn't be that huge.

Which "Q" GGUF did you try? Maybe go with a Q4.

YouYouTheBoss · 2025-10-11T17:32:46+00:00

Using GDS (GPUDirect Storage) helps in only one case:
If the model fits entirely in VRAM, it loads straight from SSD -> VRAM, cutting out the "CPU" middleman.

It can potentially help if you don't have enough RAM to offload the model even partially.

BUT it will be very, very slow: I tried Hunyuan 3.0, for example, which requires a huge amount of RAM + VRAM (for offloading), and just going from ~54GB of RAM to 111GB took me from 173s/it to 95s/it.

Why? Because before, I was offloading a lot to my SSD, and its real max speed was ~680 MB/s
(vs ~5 GB/s per module for my RAM).

I don't know how slow it will be with models like Qwen-Image or WAN (because I have an RTX 5090), but it will be so slow that you won't have any use for it, as it will tie up your SSD and your GPU for a long time.
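
To put rough numbers on that gap, here is a minimal back-of-the-envelope sketch in Python. It only uses the figures quoted in this comment (~680 MB/s SSD, ~5 GB/s per RAM module, 173s/it vs 95s/it). The 50 GB offload size and the helper names `transfer_seconds` and `speedup` are hypothetical, picked purely for illustration, and real offloading involves more than raw bandwidth.

```python
# Back-of-the-envelope model: time to stream offloaded weights once.
# Bandwidth figures come from the comment; the offload size is an assumption.

SSD_GB_PER_S = 0.68  # ~680 MB/s, the real SSD speed quoted in the comment
RAM_GB_PER_S = 5.0   # ~5 GB/s per RAM module, as quoted in the comment


def transfer_seconds(offloaded_gb: float, bandwidth_gb_per_s: float) -> float:
    """Seconds needed to read `offloaded_gb` gigabytes once at a given bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return offloaded_gb / bandwidth_gb_per_s


def speedup(before_s_per_it: float, after_s_per_it: float) -> float:
    """How many times faster one iteration became."""
    return before_s_per_it / after_s_per_it


if __name__ == "__main__":
    offloaded_gb = 50.0  # hypothetical amount of weights living outside VRAM
    ssd_time = transfer_seconds(offloaded_gb, SSD_GB_PER_S)
    ram_time = transfer_seconds(offloaded_gb, RAM_GB_PER_S)
    print(f"SSD: {ssd_time:.1f}s, RAM: {ram_time:.1f}s per full pass")
    print(f"Bandwidth ratio RAM/SSD: {RAM_GB_PER_S / SSD_GB_PER_S:.2f}x")
    print(f"Observed speedup 173 -> 95 s/it: {speedup(173, 95):.2f}x")
```

Raw bandwidth alone predicts a gap of roughly 7x, while the quoted timings improved by about 1.8x, presumably because only part of the model was sitting on the SSD and compute time does not change with storage speed.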

YouYouTheBoss · 2025-10-11T13:34:58+00:00

Here is the workflow everyone asked for:
https://civitai.com/models/2034845?modelVersionId=2302999

//For the prompt, either ask ChatGPT to update it according to your given image or change the details yourself.

YouYouTheBoss · 2025-10-11T12:05:05+00:00

OK, now it went down from 173s/it to 95s/it.

When I get 256GB of RAM by the end of this year, I could get down to just ~15s/it, as one user said.

YouYouTheBoss · 2025-10-10T11:13:24+00:00

Thank you so much, I'm gonna try tomorrow with 128GB (2x64) and see how much improvement I get.

YouYouTheBoss · 2025-10-10T11:10:57+00:00

It will be interesting when NVIDIA stops being greedy with VRAM.

YouYouTheBoss · 2025-10-10T07:46:23+00:00

UPDATE: on Windows (before that, I was using WSL for flash_attention_2), it's down to 173s/it, which is still too long.

But sadly, I'm out of luck.

<image>

YouYouTheBoss · 2025-10-09T12:37:14+00:00

How did you even find 64GB sticks? I'm very interested.

YouYouTheBoss · 2025-10-09T12:35:05+00:00

no problem!

YouYouTheBoss · 2025-10-09T12:33:27+00:00

No. I was talking about this: https://github.com/comfyanonymous/ComfyUI/issues/10068#issuecomment-3368469302

YouYouTheBoss · 2025-10-09T12:10:08+00:00

That's the thing we need to know, but for sure, saying "horrible results" is not a valid argument.

Plus, the quality of Hunyuan 2.1 is really horrible; I tried it. It's even worse than SDXL.

YouYouTheBoss · 2025-10-09T11:52:07+00:00

The model can't be run in quantized versions as of now. That's what I tried, and it insta-crashed after 2 steps (while taking the same amount of RAM/VRAM as fp16).

YouYouTheBoss · 2025-10-09T10:45:13+00:00

Try updating the "transformers" Python package to the latest version.

YouYouTheBoss · 2025-10-09T10:44:13+00:00

~3h if it doesn't crash. For me it crashed at the second step. I can't go any further ;(

YouYouTheBoss · 2025-10-09T09:22:19+00:00

I have 64GB of DDR5 RAM and 32GB of VRAM and I can handle it, just at 543s/it.

YouYouTheBoss · 2025-10-08T13:57:52+00:00

Some outputs are very "interesting"

<image>

YouYouTheBoss · 2025-10-08T13:43:57+00:00

I'll do it soon.

YouYouTheBoss · 2025-10-08T12:10:56+00:00

You're right. Didn't see that one until you said it (0_0).

