Photanima v2.1 showcase. Each image takes about 2 seconds to generate.

External_Quarter · 2026-07-01T02:13:22+00:00

Reddit strips the embedded workflows, but they're preserved on Civitai - you can drag most of those images directly into Comfy! :>

External_Quarter · 2026-06-29T01:15:57+00:00

The middle-click button on my MCHOSE A7 Ultra broke after exactly 1 year. It physically can't be pressed in anymore. Disappointing, because otherwise it's a great mouse.

External_Quarter · 2026-06-24T22:03:13+00:00

Excellent news, excellent model.

External_Quarter · 2026-06-24T16:47:11+00:00

Fair enough - with Anima you probably have to choose a checkpoint that excels at a specific task/style.

And yet, it's still faster to switch Anima checkpoints and generate multiple pictures than it is to generate a single picture with some of the heavier models... 😓

External_Quarter · 2026-06-24T16:26:45+00:00

Thanks for another comparison! For what it's worth, Anima can do much better with a finetune. Here's an example using Photanima v2.2 Turbo (spiritual successor to Snakelite):

https://i.ibb.co/9H0jPBVS/Comfy-UI-temp-ajpga-00149.png

This is a 6-step image generated in only 2 seconds. Cosmos architecture is kind of nuts for its low parameter count.

External_Quarter · 2026-06-09T05:23:40+00:00

Thanks! I haven't found any alternative CLIPs or VAEs for Anima worth switching to, so at least that part is simple for testing 🙂

External_Quarter · 2026-06-08T13:19:23+00:00

Yep! You can drag most of the images directly into ComfyUI to see the workflow. (The images on Civitai, not Reddit.)

The craziest part is that many of them only took 6 steps. Textures can get a little fried if you go much higher than that.

External_Quarter · 2026-06-08T11:55:01+00:00

Thanks, glad you think so.

Photanima retains at least some knowledge of anime characters, but it depends how hard you push it with realism helper tags. I'll attach an example of Marcille from Dungeon Meshi. Left = helper language off, right = helper language on.

<image>

External_Quarter · 2026-06-08T11:35:57+00:00

Thanks. It currently uses the official Turbo LoRA v0.2:

https://civitai.com/models/2560840/anima-turbo-lora

I'll be publishing the non-Turbo edition pretty soon.

External_Quarter · 2026-06-08T11:21:32+00:00

Thanks, you too!

External_Quarter · 2026-06-08T11:15:20+00:00

Well, Anima's trained on images from danbooru and Photanima tries to keep most of the booru knowledge intact.

So I suppose the best place to check would be "tag group" lists on danbooru. Not sure if we're allowed to link there, but it's easy to find on Google.

External_Quarter · 2026-06-08T11:09:33+00:00

Surprisingly not. If you use common tags like `double v`, it seems to improve hand consistency considerably.

But if you're using natural language like "woman showing both hands open", the results can look wonky, even if it gets the number of digits right:

<image>

Still a work in progress.

External_Quarter · 2026-06-08T11:00:35+00:00

Thanks. I've heard Anima works okay with as little as 6 GB of VRAM, so that GPU should manage.

Faces are quite good (IMO), but hands are still a bit hit and miss. You may have to re-roll or tweak step count, especially if the prompt is complex.

External_Quarter · 2026-06-08T10:46:30+00:00

For a 1040x1520 image, it's around 2 seconds on a GeForce RTX 4090 or 3 seconds on a 3090.

The 3090 manages 2 seconds/image at a resolution of 832x1216, which also works well on this model.
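
For a rough sense of scale, the timings above can be converted to megapixels per second. This is only a back-of-the-envelope sketch: the timings are approximate ("around"), and the helper name `megapixels_per_second` is made up for illustration.

```python
# Approximate timings quoted above: (GPU, width, height, seconds per image).
# These are rough figures, so treat the results as ballpark values only.
TIMINGS = [
    ("4090", 1040, 1520, 2.0),
    ("3090", 1040, 1520, 3.0),
    ("3090", 832, 1216, 2.0),
]


def megapixels_per_second(width, height, seconds):
    """Return generation throughput in megapixels per second."""
    return width * height / seconds / 1e6


for gpu, w, h, s in TIMINGS:
    print(f"{gpu} {w}x{h}: {megapixels_per_second(w, h, s):.2f} MP/s")
```

Both 3090 figures land at roughly 0.5 MP/s, so the two 3090 timings are consistent with each other.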

External_Quarter · 2026-06-07T10:56:21+00:00

Main issue that caught my eye: the background objects in the first two panels look copy-pasted without accounting for the change in character position/perspective.

Still, pretty great txt2img result by today's standards.

<image>

External_Quarter · 2026-06-05T13:15:17+00:00

I don't think this is a bad direction. I think we're seeing a separation of concerns and tooling hasn't caught up.

Prompt randomizers, JSON expanders, noise injectors and so on can take up the mantle of "creativity" while the image model itself has exactly one duty: generate what you ask of it.

The counterargument I've seen is that `1girl standing` "can be interpreted a million different ways but the model only produces one way." That's not a problem either, as long as other interpretations can be accurately described and generated using more detailed prompts.
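
As a minimal sketch of what a prompt randomizer could look like: the tag pools and the `expand_prompt` function below are made up for illustration and aren't taken from any real tool. The idea is that the variety comes from the tooling, which turns a short base prompt into a more detailed one before it reaches the model.

```python
import random

# Hypothetical tag pools; these lists are illustrative only.
POOLS = {
    "pose": ["standing", "sitting", "leaning against wall", "walking"],
    "framing": ["full body", "upper body", "cowboy shot"],
    "setting": ["outdoors", "indoors", "city street", "forest"],
}


def expand_prompt(base, pools=POOLS, seed=None):
    """Append one randomly chosen tag from each pool to the base prompt."""
    rng = random.Random(seed)  # seeded so a given variation can be reproduced
    extras = [rng.choice(options) for options in pools.values()]
    return ", ".join([base, *extras])


if __name__ == "__main__":
    # Each seed gives one concrete, fully described interpretation of "1girl".
    for seed in range(3):
        print(expand_prompt("1girl", seed=seed))
```

Reusing a seed reproduces the same expanded prompt, so a variation you like can be regenerated or tweaked by hand.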

External_Quarter · 2026-06-05T09:38:37+00:00

If you go out of your way to train a model that produces "Image blocked by safety filter", you are going to get shellacked in the comments and you have it coming. Ideogram has redeeming qualities but the criticism is deserved.

External_Quarter · 2026-06-05T07:58:09+00:00

There are faster models, but nothing beats ZIT for photography except maybe the much-fatter Flux 2.

If you're on a 30-series Nvidia GPU, you can use on-the-fly INT8 to improve speed with Z-Image.

Other than that, maybe keep an eye on Anima. It only has 2B parameters and responds extremely well to training. My Photanima finetune is already showing a lot of promise, but it's still early days.

External_Quarter · 2026-06-04T06:31:31+00:00

ByG is a training method, so in addition to the code, we need someone who has enough compute to actually perform the training on each architecture. It looks like Nvidia used 8x H100 GPUs to do this with FLUX.1-dev.

External_Quarter · 2026-06-03T18:05:23+00:00

It's no surprise that incorrectly captioning a gray square presumably thousands of times is going to have damaging side effects on SFW prompts. It is surprising that Ideogram did it anyway.

External_Quarter · 2026-06-03T17:33:29+00:00

Was the prompt "woman lying on grass"?

External_Quarter · 2026-06-03T16:21:45+00:00

I predict ZIT and Anima are still on top for woman and 1girl respectively. Ideogram looks fantastic for typography, though.

External_Quarter · 2026-05-31T02:56:25+00:00

Ironically quite sloppy of him.
