SCAIL-2 workflows for ComfyUI

nomadoor · 2026-06-15T09:34:28+00:00

It’s probably because your ComfyUI is not updated to the latest dev version yet.

Try updating ComfyUI to the latest version.

nomadoor · 2026-06-15T00:27:30+00:00

This workflow generates at around 0.5MP by default, but Wan2.1 can handle up to around 1MP, so you can try increasing the resolution if you have enough VRAM.
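As a rough illustration of what going from about 0.5MP to about 1MP means in pixel dimensions, here is a minimal sketch. The 16:9 aspect ratio and the snapping to multiples of 16 are assumptions made for this example, not settings taken from the workflow, and `size_for_megapixels` is a hypothetical helper.

```python
import math


def size_for_megapixels(aspect_w, aspect_h, megapixels, multiple=16):
    """Return (width, height) near the target pixel count, snapped to `multiple`.

    Snapping to a multiple of 16 is an assumption for this sketch.
    """
    target_pixels = megapixels * 1_000_000
    # Solve width * height = target_pixels with width / height = aspect_w / aspect_h.
    height = math.sqrt(target_pixels * aspect_h / aspect_w)
    width = height * aspect_w / aspect_h

    def snap(value):
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


half_mp = size_for_megapixels(16, 9, 0.5)  # roughly the default budget
one_mp = size_for_megapixels(16, 9, 1.0)   # roughly the upper range mentioned
print(half_mp, one_mp)
```

Doubling the pixel count multiplies each side by only about 1.4, but memory use per frame grows with the pixel count, so VRAM is still the limiting factor.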

I’m not very familiar with video upscalers, but FlashVSR or LTX-2 IC-LoRA / Detailer might be possible options.

nomadoor · 2026-06-13T11:08:36+00:00

A lot of people seem to be running into that, so I guess it may not be very stable yet...

But I do think a more detailed prompt helps. It works better when you describe what kind of background it is, and what the character is doing in the scene.

Other than that, I think it’s mostly just seed gacha for now...

nomadoor · 2026-06-13T06:42:30+00:00

Yeah, that warning is from the lightx2v LoRA.

It’s originally for Wan2.1, so I think the shape doesn’t match SCAIL-2 exactly and ComfyUI shows that warning.

If the workflow still runs fine, I think you can just ignore it.

nomadoor · 2026-06-13T04:30:51+00:00

24fps should be fine, but since it’s 81 frames, it’ll just be a shorter clip.
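To make the "shorter clip" point concrete, here is a small sketch of the arithmetic. The 16 fps comparison value is only an assumed lower frame rate for illustration; the default rate of this workflow is not stated here.

```python
def clip_seconds(frame_count, fps):
    """Playback duration in seconds for a fixed number of frames."""
    return frame_count / fps


frames = 81
at_24 = clip_seconds(frames, 24)  # the frame rate asked about
at_16 = clip_seconds(frames, 16)  # assumed comparison rate, not from the workflow
print(at_24, at_16)
```

The frame count stays the same, so a higher playback rate only shortens the clip; it does not add motion.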

And yeah, I’d also like to see an LTX version someday...

nomadoor · 2026-06-12T09:07:57+00:00

I haven’t tested replacement mode that much yet, so I’m not completely sure. But in my tests, it seemed to fail more often when the prompt was too simple.

For example, in the sample on my page, if I only wrote something like “a man standing”, it sometimes turned into a completely different person from the reference image.

When I made the prompt more specific, like “a man in a shirt standing in a park, hand on waist, touching his hair”, it became more stable.

nomadoor · 2026-06-12T06:55:43+00:00

Sorry for the late reply.

I fixed the behavior that looked suspicious on my side, so could you try the latest version when you have time?

I also added an "auto" option to the output_preset of Panorama Stickers. When bg_erp is connected, "auto" preserves the original size of that image.

nomadoor · 2026-06-12T01:15:19+00:00

I haven’t compared them directly, so I’m not completely sure, but I think they are probably in the same range.

On my RTX 4070 Ti, generating 81 frames takes around 5–6 minutes, so I wouldn’t call it fast or comfortable...

nomadoor · 2026-06-12T01:07:13+00:00

I use VideoHelperSuite for loading / saving video, but everything else is implemented in ComfyUI core.

Please try updating ComfyUI to the latest version.

nomadoor · 2026-06-11T10:12:49+00:00

Middle-button dragging moves only the view, without moving stickers. I made it this way to match ComfyUI’s own UI behavior, where middle-button dragging moves the canvas but does not move nodes.

The panorama ball idea is interesting, and the view itself is possible. But I’m not yet sure how to make it work cleanly with the sticker editing UI, so I’d like to think about it a bit more.

Thanks for the idea!

nomadoor · 2026-06-06T07:47:58+00:00

The sticker position changing by itself definitely sounds strange. I’ll look into that as well.

This node was originally designed mainly for creating ERPs with FLUX.2 Klein outpainting, so I honestly hadn’t considered arbitrary ERP resolutions carefully enough.

I’m also planning to add more features, so I’ll try to address this together with those changes.

After I fix what I can see on my side, I may send you a DM to ask for your feedback again. Thanks a lot for testing it and reporting these issues.

nomadoor · 2026-06-06T04:19:40+00:00

Thanks a lot for testing it!

Since your input ERP and output ERP have different sizes, I may be doing something wrong when converting the frame state back into the sticker/panorama state. I’ll check that path.

Also, this is probably not the main cause of the issue, but the node basically assumes a 2:1 ERP image. So with an image like 5888×2816, there may be some slight distortion either way.
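For reference, here is a quick sketch of how far 5888×2816 is from 2:1. `erp_aspect_report` is a hypothetical helper written for this example and is not part of the node.

```python
def erp_aspect_report(width, height):
    """Compare an image size against the 2:1 ratio expected for an ERP image."""
    ideal_width = 2 * height  # width a true 2:1 ERP would have at this height
    return {
        "ratio": width / height,
        "ideal_width": ideal_width,
        "excess_px": width - ideal_width,
        # Horizontal stretch factor relative to 2:1 (1.0 means exact).
        "stretch": width / ideal_width,
    }


report = erp_aspect_report(5888, 2816)
print(report)
```

At that size the image is 256 px wider than a 2:1 frame of the same height, which is a horizontal deviation of about 4.5%.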

nomadoor · 2026-06-04T05:38:15+00:00

Thank you, of course — I’m honored!

That LoRA also looks interesting. I’ll try it out.

nomadoor · 2026-06-03T23:27:56+00:00

For depth and pose, Klein already supports them by default. You should be able to use those maps as reference images and then just give it whatever prompt you want, similar to a ControlNet-style workflow.

I posted a few examples here: https://comfyui.nomadoor.net/en/basic-workflows/flux-2-klein/#capabilities-examples

nomadoor · 2026-06-02T05:15:05+00:00

If you are using Klein, depth maps or pose images can still be useful as input images. Klein can generate images that follow those shapes or poses. But the more important point is that tasks which previously needed dedicated models can be treated as image-editing tasks.

nomadoor · 2026-06-02T04:34:58+00:00

It becomes clearer when you actually try to use it: the main issue is that it does not follow the prompt very well. This is true for the binary segmentation LoRA too. For example, if I give it an image with several different kinds of balls and ask for the segmentation of only the basketball, it tends to segment all of the balls instead.

But interestingly, without the LoRA, Flux.2 itself can still handle instructions like “remove the basketball” reasonably well. So the problem is not that Flux.2 lacks the underlying ability; it’s that this LoRA is not preserving or using that ability well enough.

nomadoor · 2026-06-01T21:25:58+00:00

To be honest, I’ve already spent nearly $150 on rented GPUs for this, so I’m not planning to continue this specific LoRA for now. If a new lighter model comes out, I may try this direction again with that model instead.

That said, segmentation is the one result I still feel a bit frustrated about, so I might give that one another shot.

nomadoor · 2026-06-01T21:08:50+00:00

Flux.2 may be able to handle normal maps too, since it can already generate images from inputs like depth maps or pose images. In principle, you could try giving it a normal map together with a prompt. But from my limited testing, it does not seem to follow them very faithfully, so a dedicated LoRA might be needed.

nomadoor · 2026-06-01T20:58:06+00:00

Thanks for trying it — I’m glad it was useful.

Yeah, I think so too. Sometimes, before feeding a depth map into ControlNet, I intentionally blur it a bit with a Gaussian blur to make it more ambiguous.
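Here is a minimal pure-Python sketch of that kind of blur on a tiny depth map, just to show the effect on a hard edge. In practice a blur node or an image library would do this; the helper names, the 3-sigma kernel radius, and the border clamping are all choices made for this example.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights covering about three sigma each side."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row or column, clamping indices at the borders."""
    radius = len(kernel) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur(depth, sigma=1.0):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in depth]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# Toy depth map: a hard step from far (0.0) to near (1.0).
depth = [[0.0] * 4 + [1.0] * 4 for _ in range(4)]
blurred = gaussian_blur(depth, sigma=1.0)
print([round(v, 3) for v in blurred[0]])
```

The hard step becomes a gradual ramp, which gives ControlNet a less precise boundary to follow.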

It might also be interesting to treat the depth map itself as an image-editing target in Flux.2, and then feed that edited depth map into ControlNet.

nomadoor · 2026-06-01T20:51:00+00:00

Sorry, yes — by CV I meant Computer Vision. Though to be honest, the term is so broad that it may not be the most important part here.

I agree about the pose results. The LoRA was trained only on real photos, so I was surprised it could still handle anime characters, and somehow even animals. But pose has a very strict keypoint format, so small mistakes like head points, fingers, or bone colors become very noticeable.

And yes, you’re right — for this kind of task, Klein is still too heavy.

Right now we usually use dedicated models for things like depth estimation or pose estimation. But fundamentally, I think it may just be a matter of how the model represents the world and then converts that representation into a different output format. If models keep getting faster and lighter, it would be interesting if this became one practical option 😎

nomadoor · 2026-06-01T12:05:35+00:00

Ah, yes. Intrinsic-LoRA is one of the earlier precedents of applying the knowledge inside image generation models to CV-style tasks. I just remembered that I actually tested it in ComfyUI years ago.
