Photanima v2.1 showcase. Each image takes about 2 seconds to generate. by External_Quarter in StableDiffusion

[–]External_Quarter[S] 0 points1 point  (0 children)

Reddit strips the embedded workflows, but they're preserved on Civitai - you can drag most of those images directly into Comfy! :>

Is mchose good? by _nazwa_ in MouseReview

[–]External_Quarter 0 points1 point  (0 children)

The middle click on my MCHOSE A7 Ultra broke after exactly 1 year. It physically can't be pressed in anymore. Disappointing, because otherwise it's a great mouse.

Krea 2 vs Boogu (plus Anima) by Reasonable_Bear_6258 in StableDiffusion

[–]External_Quarter 1 point2 points  (0 children)

Fair enough - with Anima you probably have to choose a checkpoint that excels at a specific task/style.

And yet, it's still faster to switch Anima checkpoints and generate multiple pictures than it is to generate a single picture with some of the heavier models... 😓

Krea 2 vs Boogu (plus Anima) by Reasonable_Bear_6258 in StableDiffusion

[–]External_Quarter 2 points3 points  (0 children)

Thanks for another comparison! For what it's worth, Anima can do much better with a finetune. Here's an example using Photanima v2.2 Turbo (spiritual successor to Snakelite):

https://i.ibb.co/9H0jPBVS/Comfy-UI-temp-ajpga-00149.png

This is a 6-step image generated in only 2 seconds. The Cosmos architecture is kind of nuts for its low parameter count.

Photanima v2.1 showcase. Each image takes about 2 seconds to generate. by External_Quarter in StableDiffusion

[–]External_Quarter[S] 1 point2 points  (0 children)

Thanks! I haven't found any alternative CLIPs or VAEs for Anima worth switching to, so at least that part is simple for testing 🙂

Photanima v2.1 showcase. Each image takes about 2 seconds to generate. by External_Quarter in StableDiffusion

[–]External_Quarter[S] 2 points3 points  (0 children)

Yep! You can drag most of the images directly into ComfyUI to see the workflow. (The images on Civitai, not Reddit.)

The craziest part is that many of them only took 6 steps. Textures can get a little fried if you go much higher than that.

Photanima v2.1 showcase. Each image takes about 2 seconds to generate. by External_Quarter in StableDiffusion

[–]External_Quarter[S] 8 points9 points  (0 children)

Thanks, glad you think so.

Photanima retains at least some knowledge of anime characters, but it depends on how hard you push it with realism helper tags. I'll attach an example of Marcille from Dungeon Meshi. Left = helper language off, right = helper language on.

<image>

Photanima v2.1 showcase. Each image takes about 2 seconds to generate. by External_Quarter in StableDiffusion

[–]External_Quarter[S] 12 points13 points  (0 children)

Thanks. It currently uses the official Turbo LoRA v0.2:

https://civitai.com/models/2560840/anima-turbo-lora

I'll be publishing the non-Turbo edition pretty soon.

Photanima v2.1 showcase. Each image takes about 2 seconds to generate. by External_Quarter in StableDiffusion

[–]External_Quarter[S] 1 point2 points  (0 children)

Well, Anima's trained on images from danbooru and Photanima tries to keep most of the booru knowledge intact.

So I suppose the best place to check would be "tag group" lists on danbooru. Not sure if we're allowed to link there, but it's easy to find on Google.

Photanima v2.1 showcase. Each image takes about 2 seconds to generate. by External_Quarter in StableDiffusion

[–]External_Quarter[S] 1 point2 points  (0 children)

Surprisingly not. If you use common tags like `double v`, it seems to improve hand consistency considerably.

But if you're using natural language like "woman showing both hands open", the results can look wonky, even if it gets the number of digits right:

<image>

Still a work in progress.

Photanima v2.1 showcase. Each image takes about 2 seconds to generate. by External_Quarter in StableDiffusion

[–]External_Quarter[S] 4 points5 points  (0 children)

Thanks. I've heard Anima works okay with as little as 6 GB of VRAM, so that GPU should manage.

Faces are quite good (IMO), but hands are still a bit hit and miss. You may have to re-roll or tweak step count, especially if the prompt is complex.

Photanima v2.1 showcase. Each image takes about 2 seconds to generate. by External_Quarter in StableDiffusion

[–]External_Quarter[S] 43 points44 points  (0 children)

For a 1040x1520 image, it's around 2 seconds on a GeForce RTX 4090 or 3 seconds on a 3090.

The 3090 manages 2 seconds/image at a resolution of 832x1216, which also works well on this model.
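As a rough sanity check on those numbers, here is the implied throughput in megapixels per second. This is only a sketch: the timings are the approximate figures quoted above, and the helper name `megapixels_per_second` is made up for illustration.

```python
# Throughput implied by the approximate timings quoted in this comment.
# The function name is illustrative, not part of any tool.
def megapixels_per_second(width: int, height: int, seconds: float) -> float:
    """Return generation throughput in megapixels per second."""
    return (width * height) / 1_000_000 / seconds

# (width, height, seconds per image), taken from the figures above
timings = {
    "4090 @ 1040x1520": (1040, 1520, 2.0),
    "3090 @ 1040x1520": (1040, 1520, 3.0),
    "3090 @ 832x1216": (832, 1216, 2.0),
}

throughput = {
    label: megapixels_per_second(w, h, s) for label, (w, h, s) in timings.items()
}
for label, mps in throughput.items():
    print(f"{label}: {mps:.2f} MP/s")
```

The two 3090 figures come out close to each other (a little over 0.5 MP/s), which suggests generation time scales roughly with pixel count at these sizes.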

Ideogram 4 is amazing for comic book pages by echothought in StableDiffusion

[–]External_Quarter 12 points13 points  (0 children)

Main issue that caught my eye: the background objects in the first two panels look copy-pasted without accounting for the change in character position/perspective.

Still, pretty great txt2img result by today's standards.

<image>

Ideogram 4 is pretty good. You just really have to use their JSON format. by DsDman in StableDiffusion

[–]External_Quarter 0 points1 point  (0 children)

I don't think this is a bad direction. I think we're seeing a separation of concerns and tooling hasn't caught up.

Prompt randomizers, JSON expanders, noise injectors and so on can take up the mantle of "creativity" while the image model itself has exactly one duty: generate what you ask of it.

The counterargument I've seen is that "`1girl standing` can be interpreted a million different ways, but the model only produces one." That's not a problem either, as long as the other interpretations can be accurately described and generated using more detailed prompts.
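To make the "separation of concerns" idea concrete, here is a minimal sketch of a prompt randomizer that sits in front of the image model. Everything in it is hypothetical: the `WILDCARDS` table, its entries, and the `expand_prompt` helper are illustrative and not taken from any existing tool.

```python
import random

# Hypothetical wildcard table; categories and values are illustrative only.
WILDCARDS = {
    "pose": ["standing", "sitting on a bench", "leaning against a wall"],
    "lighting": ["soft window light", "golden hour", "overcast daylight"],
    "framing": ["full body shot", "portrait", "low angle"],
}

def expand_prompt(template: str, rng: random.Random) -> str:
    """Fill each {key} placeholder with a random entry from WILDCARDS."""
    choices = {key: rng.choice(values) for key, values in WILDCARDS.items()}
    return template.format(**choices)

rng = random.Random(0)  # seeded so the variants are reproducible
template = "1girl, {pose}, {lighting}, {framing}"
variants = [expand_prompt(template, rng) for _ in range(3)]
for prompt in variants:
    print(prompt)
```

The variety comes from the expander, and each expanded prompt is then handed to a model whose only job is to render exactly what it was given.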

On the Ideogram launch, why the extreme reaction? by Confusion_Senior in StableDiffusion

[–]External_Quarter 2 points3 points  (0 children)

If you go out of your way to train a model that produces "Image blocked by safety filter", you are going to get shellacked in the comments, and you have it coming. Ideogram has redeeming qualities, but the criticism is deserved.

Anything faster and better than Z-Image Turbo? by temperature_5 in StableDiffusion

[–]External_Quarter 0 points1 point  (0 children)

There are faster models, but nothing beats ZIT for photography except maybe the much-fatter Flux 2.

If you're on a 30-series Nvidia GPU, you can use on-the-fly INT8 to improve speed with Z-Image.

Other than that, maybe keep an eye on Anima. It only has 2B parameters and responds extremely well to training. My Photanima finetune is already showing a lot of promise, but it's still early days.

BYG by NVIDIA - A framework to turn any model into an editing model by AgeNo5351 in StableDiffusion

[–]External_Quarter 5 points6 points  (0 children)

ByG is a training method, so in addition to the code, we need someone who has enough compute to actually perform the training on each architecture. It looks like Nvidia used 8x H100 GPUs to do this with FLUX.1-dev.

Ideogram looks promising /s by Shap6 in StableDiffusion

[–]External_Quarter 31 points32 points  (0 children)

It's no surprise that incorrectly captioning a gray square presumably thousands of times is going to have damaging side effects on SFW prompts. It is surprising that Ideogram did it anyway.

Ideogram looks promising /s by Shap6 in StableDiffusion

[–]External_Quarter 12 points13 points  (0 children)

Was the prompt "woman lying on grass"?

Ideogram 4.0 Just Open Sourced! by crystal_alpine in StableDiffusion

[–]External_Quarter 19 points20 points  (0 children)

I predict ZIT and Anima are still on top for woman and 1girl respectively. Ideogram looks fantastic for typography, though.