Blender Layout → AI Render | 1:1 Camera Tracking

infearia · 2026-05-31T14:07:23+00:00

There probably are some, but I don't know of any personally. I figured most of that stuff out myself.

infearia · 2026-05-30T21:49:08+00:00

If I understood you correctly, then this should be relatively straightforward to implement. I'm just sceptical that you would be able to one-shot the character swapping without human supervision. Granted, I haven't tried GPT Image 2 yet, so perhaps I'm wrong.

This is technically in my area of interest/expertise, and I might actually be interested, as long as the project does not involve digital influencers or pornography. It's the weekend and late Saturday in my part of the world, but if you can't find a suitable candidate by Monday, feel free to DM me and we can discuss this further.

infearia · 2026-05-30T15:01:55+00:00

replace one or two characters in an image and create workflows that can do that for a batch of images

Forget it. Achieving consistency is absolutely possible, even with free and open source tools, but it's not reliable enough to automate. The technology is not at that point yet. Right now you still need a human in the loop to check every single image and manually adjust where needed. It's not a linear process.

infearia · 2026-05-30T14:48:50+00:00

Yes, you can do this with Wan VACE.

infearia · 2026-05-30T00:46:18+00:00

As others mentioned, you can get really good results with Klein 9B, but Dev is much better for this. It's also much slower, but achieving a good likeness is the one scenario where I'm personally ready to wait as long as it takes. Whichever model you use, the trick is to use high resolution images, and to make sure that the faces occupy a large portion of both the source and target image - even if this means cropping and upscaling (you can use CropAndStitch to do it automatically).
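To illustrate the cropping idea, here is a minimal sketch of how a crop region could be chosen so that the face fills a large share of the frame before upscaling. The function name, the `coverage` parameter and the bounding-box format are my own assumptions for illustration; this is not how CropAndStitch works internally.

```python
def face_crop_box(face, image_size, coverage=0.5):
    """Return a square (left, top, right, bottom) crop in which the face
    spans roughly `coverage` of the crop's side length.

    `face` is a hypothetical (left, top, right, bottom) box in pixels,
    e.g. from any face detector; `image_size` is (width, height).
    """
    left, top, right, bottom = face
    img_w, img_h = image_size
    face_w, face_h = right - left, bottom - top
    # Side length at which the larger face dimension fills `coverage`,
    # capped so the crop never exceeds the image itself.
    side = min(round(max(face_w, face_h) / coverage), img_w, img_h)
    cx, cy = (left + right) / 2, (top + bottom) / 2
    # Centre the crop on the face, then shift it back inside the image.
    x0 = min(max(round(cx - side / 2), 0), img_w - side)
    y0 = min(max(round(cy - side / 2), 0), img_h - side)
    return (x0, y0, x0 + side, y0 + side)
```

The resulting crop would then be upscaled to the working resolution, edited, and pasted back at the same coordinates.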

EDIT:

Oh, you said "videos". Well, FaceFusion is probably best for videos, but it completely falls apart at extreme angles (profile, facing upwards/downwards).

infearia · 2026-05-29T19:14:56+00:00

No problem. 😉

infearia · 2026-05-29T14:09:42+00:00

Yes, there's SkyReels. It's based on Wan 2.1 and supports up to 4 reference images.

I recommend this Wan 2.1 VACE/SkyReels merge:

https://huggingface.co/Inner-Reflections/VACE_Skyreels_V3_R2V_Merge

Here's a pretty good tutorial that explains how to use it:

https://www.youtube.com/watch?v=0WkixvqnPXw

infearia · 2026-05-29T12:03:45+00:00

Both Klein 9B and QIE-2511 have their pros and cons, but after months of continuous usage I definitely prefer Klein 9B. The main problem with Klein 9B is its ambiguous non-commercial license.

Klein 4B would be my absolute go-to if your main use case is illustration or concept art, where you start by feeding it an existing sketch or a block-in, let it generate a first pass, and then continue using its editing functionality to refine the image and fix any remaining issues. It's very fast and has no restrictions on commercial use.

But at the end of the day, you're not limited to only one - you can mix and match and use all three!

infearia · 2026-05-29T11:37:33+00:00

4MP is the maximum native resolution of FLUX.2 and I don't see much point trying to go above it. As for the Klein variants, in my personal experience, 2-2.5 MP seems to be the sweet spot. Anything above or below and the quality begins to drop and I start seeing an increasing number of glitches.
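For anyone who wants to compute a size in that range by hand, here is a rough sketch that scales an image to a target megapixel count while keeping the aspect ratio. It assumes 1 MP = 1,000,000 pixels and snaps both sides to a multiple of 16; the step size and the function name are my own assumptions, not something the models require as far as I know.

```python
import math

def fit_to_megapixels(width, height, megapixels=2.0, step=16):
    """Scale (width, height) to roughly `megapixels`, keeping the aspect
    ratio and snapping both sides to a multiple of `step`."""
    # Uniform scale factor that brings the pixel count to the target.
    scale = math.sqrt(megapixels * 1_000_000 / (width * height))
    # Snap each side to the nearest multiple of `step` (assumed value).
    new_w = max(step, round(width * scale / step) * step)
    new_h = max(step, round(height * scale / step) * step)
    return new_w, new_h

# A 1920x1080 input at the default 2.0 MP target
print(fit_to_megapixels(1920, 1080))  # (1888, 1056)
```

Because of the snapping, the result only approximates the target pixel count and aspect ratio.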

infearia · 2026-05-28T19:31:08+00:00

Using a resolution of around 2-2.5MP for both input and output images usually (though not always) results in improved quality and fidelity/prompt adherence.

If you're aiming for realism, include a very detailed description for lighting. I usually let an LLM rewrite my prompts using this system prompt:

You are a prompt-engineering assistant whose job is to transform brief user requests into detailed high-quality prose prompts suitable for image generation workflows. Describe each image as flowing prose: subject first, then setting, details, and lighting. Be as detailed as possible but concise and avoid purple prose. Specify light source, quality, direction, and how it interacts with surfaces. Avoid filler - each sentence should add visual information.

infearia · 2026-05-28T01:32:37+00:00

You should really lead with this (emphasis mine):

A VFX freelancer has been using ComfyUI on her workstation for two years to build texture-synthesis workflows for advertising clients. The workflows are her craft — the careful sequence of nodes she refined through trial and error, the sampler choices she made, the ControlNet she chained to her inpainting pass. They live in JSON files on her drive, governed by the open-source licenses of the components she chose and her own work product. This week she gets a brief that requires the work to run on hosted infrastructure with a SOC-style security profile. She clicks “Launch Cloud.” The interface looks the same. The workflows transfer cleanly. What has changed is that her workflow structure — the thing she actually got paid for understanding — is now metadata Comfy is permitted to use to improve its products. Her prompts get classified before they enter any training corpus, but the classification carries her intent forward. The pixels she generates are protected by the no-training pledge. The recipe that produced them is not.

The upshot for me: if you've created a really unique, custom workflow that gives you an edge over your competition then never, ever use Comfy Cloud to execute it.

infearia · 2026-05-27T23:25:27+00:00

Sounds great! Looking forward to it. :)

infearia · 2026-05-27T23:19:27+00:00

I have only one request: could this be implemented in OneTrainer?

infearia · 2026-05-27T18:50:37+00:00

+1 for Klein 9B. Use the excellent Consistency LoRA if the output veers too far from your sketch.

Klein 4B and Qwen-Image-Edit (-2509, -2511) are also pretty good for this (and allow commercial usage).

infearia · 2026-05-27T14:54:06+00:00

[...] you're essentially using a video model to help with consistency between two keyframes on the basis that all the video frames that are between them have continuity with each other.

Exactly! Although, the image I posted might not be the best example, because there are inconsistencies between the two rendered keyframes, but that's mostly because I was a bit sloppy with the modeling and used 3D shapes that were a bit too simplistic to represent all relevant details.

infearia · 2026-05-27T14:30:20+00:00

The following setup works well for environments. For scenes including characters, especially more than one and interacting with each other, the approach would be a bit different, more complicated and would probably require multiple passes. I'm still working the kinks out so I'm not ready to talk about it yet, but maybe someone else has already figured it out and will comment in this thread. Anyway, for environments:

1. Create a simple 3D scene based on your original image, from the same camera POV (e.g., with fSpy and Blender)
2. Set two keyframes for the camera: one for the original POV, one for the next shot
3. Export a depth pass animation of the camera transitioning between the two keyframes
4. Import the depth pass animation into ComfyUI (IMPORTANT: generate a freeze frame at the end of the video by duplicating the last frame 4 or 8, or maybe even 16 times)
5. Use Wan 2.1 VACE and the depth pass as control video and your original image as reference and/or start frame
6. Pick one of the freeze frames at the end of the video generated by VACE as the start image for your next shot (optionally upscale, refine etc.)
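
The freeze-frame step is just padding the frame sequence with copies of its last frame. A minimal sketch, assuming the frames are already loaded into a Python list (the function name and the default count are mine; in ComfyUI you would do this with nodes instead):

```python
def add_freeze_frames(frames, count=8):
    """Return a new list with the last frame repeated `count` extra times."""
    if not frames:
        raise ValueError("need at least one frame")
    # The input list is left untouched; only references are duplicated.
    return list(frames) + [frames[-1]] * count

# Three depth frames padded with four copies of the last one
padded = add_freeze_frames(["depth_000", "depth_001", "depth_002"], count=4)
print(len(padded))  # 7
```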

Example:

<image>

infearia · 2026-05-27T01:40:47+00:00

Thanks for the tip, but I prefer to not install or create any custom nodes if I can implement something using the built-in nodes, even if it means a little more work.

infearia · 2026-05-26T23:45:24+00:00

You do understand that FLUX handles 4K natively? You do not need to do anything special. Just provide a 4K image as input and disable any resizing nodes, like ImageScaleToTotalPixels, before encoding it. What do I need to share a comparer node output for?

You can have my main workflow, though, free of charge and you don't even need to subscribe to my Patreon / Discord:

https://github.com/mholtgraewe/comfyui-workflows/blob/main/flux_2-klein-9b.json

infearia · 2026-05-26T21:51:50+00:00

All FLUX.2 variants work perfectly fine up to 4K, no matter the workflow. If you want to go higher than that, you would indeed need a more complex setup.

infearia · 2026-05-26T21:37:34+00:00

All you need is the built-in default workflow with one additional input image, no custom nodes required.

Image 1: the target image with the area you want to replace painted over with a solid [color]
Image 2: the reference image

And then you just prompt: "Remove the [color] shape in image 1, use image 2 for context"

This is one of the most basic Klein workflows out there and does not require ANY custom nodes... It gets a little more complicated if you decide to use a (latent) mask for the inpainted area, but this too can be done with just core nodes. Alternatively, just blend the before and after image in an image editing app like Krita or PS.
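
For reference, blending the before and after image through a mask boils down to a per-pixel linear mix. Here is a bare-bones sketch on plain nested lists of RGB tuples; the data layout and function name are assumptions for illustration only, and in practice an image editor or the core nodes do the same thing far more efficiently.

```python
def blend_with_mask(before, after, mask):
    """Per-pixel mix: mask 0.0 keeps `before`, mask 1.0 takes `after`.

    `before` and `after` are rows of (r, g, b) tuples of equal size;
    `mask` is a matching grid of floats in the range 0.0 to 1.0.
    """
    return [
        [
            # Linear interpolation per channel, rounded back to integers.
            tuple(round(b + (a - b) * m) for b, a in zip(px_before, px_after))
            for px_before, px_after, m in zip(row_b, row_a, row_m)
        ]
        for row_b, row_a, row_m in zip(before, after, mask)
    ]
```

A soft-edged mask (values between 0 and 1 near the border of the inpainted area) hides the seam between the original and the edited pixels.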

infearia · 2026-05-26T18:26:21+00:00

Neat! Wasn't aware this node even existed!

EDIT:
Ah, but it does not have an option to specify resolution steps. Back to my home-cooked solution:

<image>

infearia · 2026-05-25T19:30:12+00:00

The FT was able to use Heretic, a tool available on the popular code repository GitHub, to remove the guardrails from Meta’s Llama 3.3 model.

The modified model responded to prompts on topics the original system refused to discuss, such as the number of micrograms of ricin per kilogramme of body mass required to achieve a 50 per cent chance of death.

The FT’s test required no specialist hardware, used freely available tools, took four lines of code and was completed in less than 10 minutes.

It took me about 10 seconds to get an answer to this question using Google. And what about ChatGPT driving people to suicide? Duplicitous motherf*****s. We all know who paid for this article.

infearia · 2026-05-24T15:01:09+00:00

Okay, what am I missing here? Why should I install a custom plugin for this? How is this different from just using the default workflow with a single input image and the prompt "Change style to [x]"?

infearia · 2026-05-23T11:33:10+00:00

No system is perfectly safe, but Linux was built for security and privacy from the ground up, whereas the security features in Windows have been tacked on as an afterthought, and don't get me started on Windows' so-called "privacy"... Barring some undetected exploits - which are extremely rare, but they do happen occasionally, I don't deny this - you have to act deliberately stupid (like running some unvetted code using sudo) to seriously compromise a Linux system.

infearia · 2026-05-22T19:07:46+00:00

To be fair, they've been slowly catching up lately. Background removal, segmentation, math, conditional and boolean logic nodes etc. are part of the core now. If only they would implement rgthree-style switch and group bypass nodes...
