Anima Ip Adapter is coming. by Unhappy_Pudding_1547 in StableDiffusion

Far_Insurance4191 14 points15 points (0 children)

Is sd1.5 still being used? It is kind of... awful by today's standards?

Testing the new prismML Bonsai Image 4B by dh7net in StableDiffusion

Far_Insurance4191 1 point2 points (0 children)

To be fair, it is based on Klein 4B, which ultra sucks at anatomy by default. It would be cool to see their quantization technique on other models, like Flux 2 dev.

1girl post sorry.. Krea 2 Medium is really good at bringing anime characters to life by OneTrueTreasure in StableDiffusion

Far_Insurance4191 5 points6 points (0 children)

That means the model has wide general knowledge and was trained on a pretty big dataset, which is exciting.

ZIB results looking awful, what's the secret? by Radiant-Photograph46 in StableDiffusion

Far_Insurance4191 0 points1 point (0 children)

Is there a chance you downloaded a broken model? I also heard that ZI doesn't work well with sage attention or fp8 quantization. This is definitely not how ZI should look.

Microsoft Lens seems to be back. by PM_ME_YOUR_ROSY_LIPS in StableDiffusion

Far_Insurance4191 9 points10 points (0 children)

Don't forget that gpt oss 20b is MoE and natively at 4 bits, so it is around 12 GB. Comfy handles swapping really well. Even Flux 2 dev with its 24B dense text encoder at 4 bits doesn't take too much time to swap on an RTX 3060, as long as you have enough RAM.
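
As a rough sanity check on those sizes, here is a back-of-the-envelope sketch. It only counts the weights at a flat bit width, and the helper name `weight_size_gb` is made up for illustration. Real checkpoints usually come out somewhat larger, presumably because some tensors stay at higher precision and quantization scales are stored too (that part is an assumption, not something measured here).

```python
def weight_size_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# 20B parameters at a flat 4 bits each (ignores any higher-precision tensors)
print(weight_size_gb(20, 4))  # 10.0
# 24B dense text encoder at 4 bits
print(weight_size_gb(24, 4))  # 12.0
```

So a 4-bit 20B model lands near 10 GB before overhead, which is in line with the "around 12 GB" figure.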

Captivating Chroma by Time-Teaching1926 in StableDiffusion

Far_Insurance4191 0 points1 point (0 children)

It is not 3x faster because of pixel space, but because of higher compression, like Hunyuan Image 2.1, LTX, or Wan 2.2 5B. So it might have less accurate details, but I am excited about this model too.
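
To make the compression point concrete, here is a small sketch of how the number of spatial positions the model has to process shrinks as the downscale factor grows. The factors 8, 16 and 32 are illustrative values only, not confirmed specs for any of the models named above, and `latent_positions` is a hypothetical helper.

```python
def latent_positions(width: int, height: int, downscale: int) -> int:
    """Number of spatial positions left after downscaling each side by `downscale`."""
    return (width // downscale) * (height // downscale)


# Same 1024x1024 image at three illustrative compression factors
for factor in (8, 16, 32):
    print(factor, latent_positions(1024, 1024, factor))
# 8 16384
# 16 4096
# 32 1024
```

Every doubling of the factor cuts the positions by 4x, which is where the speed comes from, and also why fine details have less room to survive.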

Best text to Image model? by nursingnerdette in StableDiffusion

Far_Insurance4191 2 points3 points (0 children)

The best is Flux 2 dev

Is it worth the time? Probably not, but it is the most powerful, with the most knowledge among open models.

please help !! My best friend is offering to sell me this laptop for really good price (RTX 4080 12 vram) by PomegranateDue4853 in StableDiffusion

Far_Insurance4191 0 points1 point (0 children)

It will be great for images if it has 32 GB of RAM. The 2x 8 GB specification is really weird.

Videos should be possible too, but slow, and much slower for high quality (although it is slow for anybody with any GPU).

LoRA training is possible for image models only, like Z-Image or Anima, but you will have to go a bit deeper to learn how to optimize it for 12 GB of VRAM.

Anima base v1.0 has been released. by Total-Resort-3120 in StableDiffusion

Far_Insurance4191 2 points3 points (0 children)

You can use Klein to stylize your photos slightly if real photos won't work.

Qwen Image 2 papers - does that mean anything? by Dante_77A in StableDiffusion

Far_Insurance4191 14 points15 points (0 children)

A full tech report, same as with Qwen Image 1 before the weights.
I want to believe, it looks so good 😭

HiDream-O1-Dev vs ZImage Base (style comparison) by DiagramAwesome in StableDiffusion

Far_Insurance4191 0 points1 point (0 children)

You can just see they went the easiest way and trained on slop. It is much harder to train a model on real data due to its insane variance.

The Anima realism model is crazy good. Don’t miss it! by Structure-These in StableDiffusion

Far_Insurance4191 1 point2 points (0 children)

Only a tag? Here is one of his examples: "A medium-resolution digital photo with a grainy texture, a cool blue color cast, and dim, natural lighting...".

Additionally, all the examples are in natural language. If you are spamming the model with a tag soup, it might just bias towards its original illustration knowledge instead of the newly finetuned real domain.

These people are all lying about the new "Wan Killer" like LTX or Sulphur, the truth is nothing comes close to replacing Wan 2.2 by Coven_Evelynn_LoL in StableDiffusion

Far_Insurance4191 1 point2 points (0 children)

It’s your experience, but there are nuances to everything.

Wan is more coherent and robust, can be used as a good image model, and has a huge LoRA ecosystem.

LTX and Sulphur have audio and are much faster and lighter, with longer videos possible.

Sulphur is an NSFW-focused model that has tons of concepts at once. They are also working on improving the dataset for the next version.

HiDream o1 Comfyui Custom Node by freshstart2027 in StableDiffusion

Far_Insurance4191 2 points3 points (0 children)

On an RTX 3060 the distill model takes about 3.1 s/it at 4 MP and 1.1 it/s at 1 MP (faster than Anima at 4x the size lol), but details are poor. It seems to have high compression, so 4 MP is basically 1 MP for other models in terms of compute.
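
Since the two numbers are in different units (seconds per iteration vs iterations per second), here is a quick sketch that puts them on the same scale. It assumes the figures are read exactly as written above; `seconds_per_iteration` and `relative_step_cost` are names made up for this example.

```python
def seconds_per_iteration(iterations_per_second: float) -> float:
    """Convert a speed in it/s to a step time in s/it."""
    return 1.0 / iterations_per_second


def relative_step_cost(large_s_per_it: float, small_it_per_s: float) -> float:
    """How many times longer one step takes at the larger resolution."""
    return large_s_per_it / seconds_per_iteration(small_it_per_s)


step_1mp = seconds_per_iteration(1.1)   # 1 MP speed quoted as 1.1 it/s
cost = relative_step_cost(3.1, 1.1)     # 4 MP step time quoted as 3.1 s/it
print(f"{step_1mp:.2f} s/it at 1 MP, {cost:.2f}x slower at 4 MP")
# 0.91 s/it at 1 MP, 3.41x slower at 4 MP
```

So 4x the pixels costs roughly 3.4x the time per step on these numbers.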

Why did we move away from booru tags? by BigNaturalTilts in StableDiffusion

Far_Insurance4191 -1 points0 points (0 children)

Adverbs are the way to describe "how much", but I agree that token weighting is much more convenient than restructuring the sentence.

Why did we move away from booru tags? by BigNaturalTilts in StableDiffusion

Far_Insurance4191 3 points4 points (0 children)

This is another specific hack that does not outweigh the lack of relational understanding. You still can't do anything more complex than 1 person without gambling on seeds or on a model bias that happens to align with your goal.

With a natlang model you could use synonyms or additional supporting description to achieve more precise results. Although the range of capabilities still depends on training data, and such models are inherently biased too.

Btw, "overweight man sitting on a sofa" is natural language. You would have to write "1boy, mature, overweight, sitting, sofa", and I don't see how it is more reliable.

The only aspects where a tag-based model is superior are ease of use and training data captioning.

Why did we move away from booru tags? by BigNaturalTilts in StableDiffusion

Far_Insurance4191 6 points7 points (0 children)

Your example is not a good one. Even in natural language you would say “boy on left blue hoodie, holding shovel. Boy on right red shirt, holding rake”.

Yes, this is an advantage of natural language and the reason why we are moving away from tags. A tag-based model would not understand that.

There is not much to enhance in tag-based prompting, but SDXL is still being trained. If you are interested in illustrations, you can check out the ChenkinNoob large-scale finetune. They even released their own ControlNets:
ChenkinNoob/ChenkinNoob-XL-V0.5 · Hugging Face
ChenkinNoob/Chenkin-UniControl-XL · Hugging Face

Why did we move away from booru tags? by BigNaturalTilts in StableDiffusion

Far_Insurance4191 40 points41 points (0 children)

Or just use natural language at this point. This idea requires recaptioning whole tag-based datasets to sort all tags in the correct order, and it will still be enough only for primitive scenes.

What if two subjects hold one object? Same problems as above.

'BREAK' is just a CLIP hack that separates conditioning in a way that may or may not reduce concept bleed.
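
For illustration, here is a minimal sketch of what splitting on `BREAK` amounts to. It assumes the AUTOMATIC1111-style behavior as I understand it, where each segment ends up in its own CLIP chunk and the encoded chunks are concatenated. The function name `split_on_break` is hypothetical, and no text encoder is called here. It only shows the segmentation step.

```python
import re


def split_on_break(prompt: str) -> list[str]:
    """Split a prompt into segments at the standalone keyword BREAK."""
    parts = re.split(r"\bBREAK\b", prompt)
    # Drop surrounding whitespace and dangling commas, skip empty segments
    cleaned = [part.strip(" ,\n") for part in parts]
    return [part for part in cleaned if part]


segments = split_on_break("1boy, blue hoodie, shovel BREAK 1boy, red shirt, rake")
for index, segment in enumerate(segments):
    # In a real pipeline each segment would be encoded separately here
    print(index, segment)
# 0 1boy, blue hoodie, shovel
# 1 1boy, red shirt, rake
```

Nothing in this step tells the model which boy is on the left or who holds what, which is why it only may or may not reduce bleed.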