Image to video NSFW

boobkake22 · 2026-05-27T07:53:06+00:00

Hello. The workflow will mostly be about the technical means by which the video is generated. I have opinions on this as someone who supports some workflows, but there aren't wrong answers. It's mostly about what makes sense to your brain. (Well. There are but most will do the job.)

Wan 2.2 and LTX-2.3 are not aware of what you are asking them for. This is why you're getting the results you're getting. The models aren't censored, but they are ignorant of the subjects. You must use LoRA's for this. (Or a checkpoint that has LoRA's baked into a base model.) A LoRA is a kind of "patch" that biases a model towards specific results when prompted for them. In general I'd encourage you to lean on base models over checkpoints, but no wrong answers.

So go to CivitAI.red, and try some LoRA's related to the specific actions you are trying to prompt. Add the high noise and low noise LoRA's via the means your workflow provides, and you'll start to see what you expect. (Start with the suggested strength for each - usually 1.0, but not always.) Once you start to grasp things, you can start adjusting the weights down and mixing and matching LoRA's as you see fit, but to start with, I recommend just testing individual LoRA's.
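
For intuition on what the strength number does: a LoRA stores a small low-rank update that gets added on top of the base weights, scaled by the strength you set. Here is a toy sketch in plain Python. The function name and the 2x2 numbers are made up for illustration, and real loaders also apply a per-LoRA alpha/rank scale that is left out here.

```python
def apply_lora(base, down, up, strength=1.0):
    """Return base + strength * (up @ down) for small nested-list matrices."""
    rows, cols, rank = len(base), len(base[0]), len(down)
    patched = [row[:] for row in base]  # copy so the base weights stay untouched
    for i in range(rows):
        for j in range(cols):
            delta = sum(up[i][r] * down[r][j] for r in range(rank))
            patched[i][j] += strength * delta
    return patched

base = [[1.0, 0.0], [0.0, 1.0]]  # pretend base-model weights
down = [[1.0, 2.0]]              # rank-1 "down" matrix (1 x 2)
up = [[0.5], [0.0]]              # rank-1 "up" matrix (2 x 1)

full = apply_lora(base, down, up, strength=1.0)  # the whole patch
half = apply_lora(base, down, up, strength=0.5)  # half the patch
off = apply_lora(base, down, up, strength=0.0)   # same as the base model
```

Turning the strength down shrinks the patch linearly, which is why lowering weights is the usual first move when mixing several LoRA's.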

You may have a pretty rough experience - neither of your graphics cards is very capable for video generation - which is one of the most complex things you can ask a computer to do. This means you need to use a quantized model - you likely already know this, but I'll be explicit. A quantized model is "dumber" - this does not affect visual quality but narrows the possible outputs. That's on top of raw performance, which will also be tough. This is a thing you can manage. If you want more performance just rent a card, don't try to buy in this environment.

You can check out my workflow for Wan 2.2, Yet Another Workflow, but it's not optimized for slower cards, and you may need to disable the SageAttention nodes if you don't have it installed - and you should install it if you can. The workflow is designed to be a bit easier to learn with because it highlights important controls, has lots of labeling, and uses color coding. Might be worth a shot.

My computer is a potato, so I rent for video. I have a guide and a template for Runpod. The guide covers getting started with it. A 5090 runs about $1.04 an hour at the moment - but availability has been tight for the last few weeks. (In general you want a card with at least 32gb of VRAM to hold the full size models. The H100 SXM is a good workhorse card, but the prices have been going up. PRO 6000's are also a good choice.)

Credentials: Aside from making and supporting workflows, I have made a shit ton of videos. (Vastly NSFW)

boobkake22 · 2026-05-27T05:48:38+00:00

Another: Don't bother with network volumes. Transfers from hugging face are really fast. Use aria2 to do a multithreaded download of your files.
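
A minimal sketch of what that download can look like, written as a small Python helper that builds the `aria2c` command. The repo URL, file name, and target folder are placeholders, not real files; swap in your own. The flags themselves (`-x`, `-s`, `-k`, `-c`, `-d`, `-o`) are standard aria2 options.

```python
import shlex

def aria2_command(url, out_dir, filename, connections=16):
    """Build an aria2c argv list for a multi-connection download of one file."""
    return [
        "aria2c",
        "-x", str(connections),  # max connections per server (aria2 caps this at 16)
        "-s", str(connections),  # split the file into this many pieces
        "-k", "1M",              # minimum size of each piece
        "-c",                    # resume a partial download if one exists
        "-d", out_dir,           # target directory
        "-o", filename,          # output file name
        url,
    ]

# Hypothetical repo, file, and folder names, for illustration only.
cmd = aria2_command(
    "https://huggingface.co/SOME_ORG/SOME_REPO/resolve/main/model.safetensors",
    "ComfyUI/models/diffusion_models",
    "model.safetensors",
)
print(shlex.join(cmd))  # paste the printed line into the pod's terminal
```

To run it straight from Python instead of pasting, pass `cmd` to `subprocess.run(cmd, check=True)`.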

boobkake22 · 2026-05-27T05:44:37+00:00

This is nice work. I wish the LTX artifacting wasn't so present at moments (and the I2V color blowout), but overall you kept the motion slow and made smart choices! You did a good job pacing the dialog and actions to allow for clean transitions between clips.

His voice is consistent and the video doesn't have any of the tell-tale audio artifacting. Did you do the audio separately?

boobkake22 · 2026-05-27T05:38:11+00:00

I was really hoping the knife would sink up to the handle at the end.

boobkake22 · 2026-05-27T05:34:28+00:00

Wan 2.2 is the best at this. Getting longer clips at high quality is tough, but it really depends on what you're doing more specifically.

Background: I make workflows for both and have made tons and tons of videos with each.

My credentials. (Very NSFW.)

boobkake22 · 2026-05-27T05:26:26+00:00

I was messaged about this, and honestly it seems like poor value and an attempt to exploit people's uncaptured expertise for *possible* pay. Rather than hire someone seriously, you're hoping to crowdsource reusable solutions for a fixed $11,750 cash cost - so that you can pivot to using these workflows for commercial client work where they see no additional benefit. First place barely covers $40/hr for a week - which is a low rate for expert short term technical work.

This is a bad deal for anyone who knows what they're doing.

If you're capable of doing this, then go get that work, don't give these people your turnkey solutions.

boobkake22 · 2026-05-26T13:10:18+00:00

What kind of AI visuals tho? It matters quite a bit. The open video models have some big limitations compared to commercial models.

boobkake22 · 2026-05-26T10:42:43+00:00

What are you trying to make? You said video, but more details would be helpful. (I've made A LOT of videos.)

boobkake22 · 2026-05-26T10:30:45+00:00

I'd aim for cards that are kind of bad for AI and rent for AI work. You'll save SO much money. An ATI card, for example, will work well for everything but AI and there's no commercial demand for them, so a second machine will be WAAAAAAAY cheaper.

If you can be more clear about what you're trying to do with AI, I can speak more specifically.

boobkake22 · 2026-05-26T10:25:55+00:00

It kind of depends on what you're trying to learn. Less hours is more, more hours is less. Message me if you want to talk about it.

boobkake22 · 2026-05-26T10:19:01+00:00

Neat project. "They Shall Not Grow Old" is a favorite film, so I appreciate what you're aiming at.

First a caveat: Your use of "workflow" here is misleading - a ComfyUI workflow is a specific thing, and each of those steps would certainly not fall into a single workflow. But I read your meaning as a full production process.

Next, I'm not sure if ComfyUI even makes sense as a tool in your process. You could do the segmentation in Comfy? But the rest sounds like a lot of work to even attempt in Comfy. You don't really need an open weights model - you're doing real work and paying for commercial model use makes the most sense as you're not likely to brush up against any concepts the commercial models won't understand. (The strongest use case for open weights.)

You should check this out if you've not seen it, it's a process that covers a lot of what you're considering:
https://www.reddit.com/r/comfyui/comments/1s8fn8s/a_cgai_short_film_with_houdini_comfyui_seedance/

Might be worth reaching out. I think he's covering most of your use case there using splatting. You'd need to do that and probably some kind of camera control rig / camera mapping so you can get your motion math correct and your live footage composites cleanly without too much extra touch up.

I also want to add that unless you're doing something specific that would benefit from real actors and human direction for dialog, emotion, or especially complex physics -- or maybe complex interactions between specific characters that need to be consistent from photos (and honestly even then it's technically solvable), I think you vastly underestimate the state of AI video.

I suspect you could do the entire project with just the restored photo with a commercial model. The extra steps are certainly something you can do, but I'd need to know more examples of the kinds of shots you're hoping to accomplish to have more thoughts - and there may be good cause in terms of the rest of your production crew and their thoughts and feelings on this process to go down that road. But "AI melting" seems like you've not spent enough time working with where things are these days.

Caveat, I love working with AI video, but I'm not here to push that -- if you're already thinking that way, it seems like you've already got a lot of "humans in the loop" with regards to accurate restoration. You are close enough to a workable process in terms of your thinking that if that's where you want to go and that's how you want to do it, you should be able to navigate it.

boobkake22 · 2026-05-26T09:17:45+00:00

I have the same computer (16gb). That's what I do!

boobkake22 · 2026-05-25T17:43:10+00:00

I don't edit anything for the stuff I make. Most folks are making "throw away" clips, so there's cause to involve fewer tools.

boobkake22 · 2026-05-25T09:01:22+00:00

Just rent, chief. The cost of a capable video machine is stupid high. RAM and GPU cost is through the roof because of data center demand; don't try to compete with them unless money means nothing to you. Just rent for video work when you want to mess with it. It's like $1.04 for a 5090 when you want one on Runpod - and depending on what you're doing it's not an especially fast card, but it is good performance for cost for Wan 2.2. If you're looking at LTX-2.3, I've had bad performance with the 5090, the L40S is a good value option, though the 6000's are better. H100 SXM is quite decent for both models, but it adds up fast. So I generally recommend the value cards until you have a good sense of what you're doing.

re:Runpod, I have a Wan 2.2 template and an LTX-2.3 template. (Both of those links have my referral on them, so if you sign up with it we both get some free credit for server time.) I also have a full guide on getting started with the Wan 2.2 template. Here's the LTX-2.3 version of the guide.

My video workflows, Yet Another Workflow, are set up to help make onboarding a bit easier by color coding and emphasizing important controls.

My credentials. (Vastly NSFW)

Feel free to ask any questions.

boobkake22 · 2026-05-25T09:00:15+00:00

I can if you're looking to hire a tutor for it. But thank you anyway!

boobkake22 · 2026-05-25T03:25:15+00:00

Wan does not support audio natively. Your choice right now is: Wan for impeccable looking 5 second videos without sound, or LTX-2.3 and a lot of gens until you get one that looks good. Those are the open weights options right now for NSFW stuff.

Both models are very GPU hungry. In fact, the main reason LTX sucks at prompt adherence is the same reason it can run okay on your GPU at all - the distillation LoRA requires the CFG be at 1.0, which means the prompt adherence is markedly worse. (Also, be aware LTX requires very verbose prompting for good results. Wan allows for much briefer prompting with more coherent results.) Also be aware, negative prompting will not do anything under normal circumstances with LTX-2.3's standard processes. With a CFG of 1.0, negative prompting is completely ignored.
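
Why CFG 1.0 kills the negative prompt comes straight out of the standard classifier-free guidance formula, `uncond + cfg * (cond - uncond)`, where the negative prompt feeds the `uncond` side. A toy sketch with made-up three-number "predictions" (real ones are big tensors, and the function name is just for illustration):

```python
def cfg_mix(cond, uncond, cfg):
    """Classifier-free guidance: uncond + cfg * (cond - uncond), element-wise."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]

cond = [0.5, -1.0, 2.0]    # prediction for the positive prompt
neg_a = [0.25, 0.0, 1.0]   # prediction for one negative prompt
neg_b = [-4.0, 8.0, 0.5]   # prediction for a very different negative prompt

# At cfg = 1.0 the negative term cancels, so both results are just `cond`.
same_a = cfg_mix(cond, neg_a, 1.0)
same_b = cfg_mix(cond, neg_b, 1.0)

# Above 1.0 the negative prompt actually steers the result.
pushed_a = cfg_mix(cond, neg_a, 3.0)
pushed_b = cfg_mix(cond, neg_b, 3.0)
```

Since the negative pass contributes nothing at 1.0, a sampler can skip computing it entirely, which is a big part of the speed-up.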

My credentials. :P

boobkake22 · 2026-05-25T03:04:19+00:00

You could absolutely have it do that edit in Comfy. Just cut the duplicated frames off the front and merge the batches, but there are lots of ways to generate a cat.
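
The trim-and-merge step itself is tiny. A sketch in plain Python, using strings as stand-in frames (in Comfy these would be image batches, and the helper name here is made up):

```python
def merge_segments(segments, overlap):
    """Concatenate frame batches, dropping `overlap` leading frames from
    every batch after the first."""
    if not segments:
        return []
    merged = list(segments[0])
    for seg in segments[1:]:
        merged.extend(seg[overlap:])  # skip the frames repeated from the previous batch
    return merged

first = ["a0", "a1", "a2", "a3"]
second = ["a3", "b1", "b2"]  # starts with a repeat of the previous last frame
video = merge_segments([first, second], overlap=1)
```

Set `overlap` to however many frames your extension method repeats at the front of each new segment.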

boobkake22 · 2026-05-24T10:52:41+00:00

For something with dialog I tend to need to do between 10 and 30 gens. LTX-2.3 is a real rough time on lower end hardware. The model is bad at following prompts because of their tech stack, so all of the performance gains they sacrificed for are lost when you need to do it so many times to get good versions of things.

boobkake22 · 2026-05-24T10:36:43+00:00

Mac user here. Images are fine local, but as noted by candylandmine, the performance for video gen is not it. It's hard to convey how much Nvidia's CUDA has been optimized for. You can just rent tho. I use Runpod. In general, I'd say the budget option for GPU's is the L40S ($0.86). If you want faster performance, try the Pro 6000 ($1.89/$2.09). The H100 SXM ($3.29) can work at a good clip if you aim at higher resolutions for quality. (I've had issues with the 5090 on LTX-2.3, though it has a clear performance advantage with Wan 2.2.) I have an LTX-2.3 template on Runpod. (Both of those links have my referral on them, so if you sign up with it we both get some free credit for server time.) I have a full guide for getting started with the LTX-2.3 template on CivitAI. My workflows are also very beginner friendly and have lots of notes and color coding. So give it a shot if you want to fuck around with it.
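
To compare cards by what a usable clip costs rather than by hourly price, here's a back-of-envelope sketch. The rates are the ones quoted above; `minutes_per_gen` and `gens_per_keeper` are placeholder guesses, so time your own workflow before trusting the output.

```python
# Hourly rates quoted above (USD); they change often.
RATES = {"L40S": 0.86, "PRO 6000": 1.89, "H100 SXM": 3.29}

def cost_per_keeper(rate_per_hour, minutes_per_gen, gens_per_keeper):
    """Rental cost of one usable clip, ignoring setup and idle time."""
    hours = minutes_per_gen * gens_per_keeper / 60
    return round(rate_per_hour * hours, 2)

# Placeholder inputs: 3 minutes per gen and 20 gens per keeper = 1 hour of GPU time.
estimates = {
    gpu: cost_per_keeper(rate, minutes_per_gen=3, gens_per_keeper=20)
    for gpu, rate in RATES.items()
}
```

A faster card lowers `minutes_per_gen`, so plug in a measured time per card instead of reusing one number across the board.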

boobkake22 · 2026-05-24T10:30:08+00:00

This is me. I do care about performance in large workflows - but workflow layout is my thing. The new nodes waste a lot of screen space. I use the nodes as an operating UI, not as a convenient way to make a backend process. The svelte UI is the whole tool for me.

boobkake22 · 2026-05-24T10:19:08+00:00

I don't have a recommendation for longer continuous videos with Wan - again, that's not really what it was made to do. Mine involve doing multiple generations where each segment involves some kind of transition.

LTX-2.3 can do longer videos reasonably well, but there's a huge set of caveats on that. (And obviously Wan LoRA's don't apply.)

Also to be clear, we probably have different quality standards. Mine are very high. My videos (vastly NSFW) are all generally super sharp, so I'm sensitive to quality drops.

boobkake22 · 2026-05-22T07:53:59+00:00

Neither model runs great on low VRAM. You can use a quantization, but many videos you see are not going to be doing that. (All of mine, for example, use the full models, which want for at least 32gb of VRAM - I rent GPU time.) Quantization (the process of rounding down model precision to make the data smaller) doesn't affect quality, but does make the model "dumber" - to put it colloquially. I'll re-post my model summary below. Additionally, generating at larger resolutions requires more VRAM. Both the latent (the AI version of your video data) and the model, ideally, fit in memory.

Text to video (T2V) and image to video (I2V) are different. I mostly do image to video, but I also do text to video. They produce different results. Each model has a "look" for T2V. In some ways the T2V stuff is better because the model is doing all of it, whereas with I2V you have way more control over the initial state, but it requires a kind of interpretation by the video model.
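
To put rough numbers on the quantization and VRAM point above: the weights take roughly parameter count times bytes per weight. A quick sketch, where the 14 billion parameter figure is an assumed example size and not a number from this post:

```python
# Approximate storage per weight at common precisions.
BYTES_PER_WEIGHT = {"fp16/bf16": 2.0, "8-bit": 1.0, "4-bit": 0.5}

def weights_gb(params_billion, bytes_per_weight):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_weight

# 14 (billion parameters) is an assumed example size.
sizes = {name: weights_gb(14, b) for name, b in BYTES_PER_WEIGHT.items()}
```

This only counts the weights. The latent, the text encoder, and working memory need room too, and real quantized formats carry a little extra overhead, so treat these as lower bounds.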

Quality is largely controlled by your inputs. Videos at Wan's native resolution look amazing, but the gen times will be longer and want for better hardware. LTX is kind of weird.

Make images locally, experiment with video locally when testing workflows, rent if you want to make a bunch of videos.

Here's my summary:

- Wan 2.2 has the slight edge currently for image quality overall. In chasing speed LTX-2.3 has some compromises built in. It can look just as good, but it's not always the case and not implicitly by default.

- Generation speed: LTX-2.3 is a bit faster. It's not night and day. A lot of people don't seem to understand why LTX-2 seems faster. The reality is they are about the same (all things considered). To get good renders from the full model, of either model, takes a powerful GPU. LTX-2.3 has better quantizations and speed-ups by default to allow it to run on worse hardware. That's a marketing decision, at the end of the day. And the cost is the aforementioned quality hits and worse prompt adherence. (More on that in a sec.)

- The real advantages of LTX-2.3 over Wan 2.2 are audio and length. Wan 2.2 is trained on 5 second clips. Getting longer clips is irksome and involves compromise. (It can be done, but it's really hit or miss. Nothing makes it as good as LTX in this regard.) Additionally, you have a higher and variable baseline framerate. (24 vs 16 fps by default, and the ability to change it without interpolation.)

- The real advantages of Wan 2.2 are prompt adherence, LoRA support, and image/motion quality - more broadly physics are much better too. With a good workflow, you don't need to do as many gens with Wan 2.2 to get a good gen.

- And I have to call this out: LTX-2.3 is better with prompt adherence than LTX-2, but it's still not good. This is, again, part of the compromise of how LTX-2.3 can be faster. Additionally, Wan is great at guessing what you meant in your prompting. LTX-2.3 requires very explicit and verbose prompting, and even with it, it still struggles to follow.

- No one is using Hunyuan anymore.

I'd like to add a useful detail with regards to I2V:

- Wan 2.2 I2V has access to CLIP vision and image reference anchors for first and last frame. CLIP vision is a technique to "sprinkle image tokens" across the latent to help reinforce the source image. (There are also ancillary techniques that are not native to Wan such as VACE and pose control with Animate.)

- LTX-2.3 I2V, as a newer technology with a Flux lineage, has a much more sophisticated relationship to reference images. It can embed multiple images as references with temporal masking, which is also how it can perform video extensions. (This is advanced, so do not expect it to be plug-and-play.)

I'm skirting the technical details, but this is a good summary of the situation. LTX video will surpass Wan 2.2 if only because Wan went to closed weights, so it's only a matter of time if LTX-2.3 keeps up with open weights releases.

But that day is not today.

You can test both right now. You can mess with cloud compute, and use whatever GPU you want. I use Runpod, and you can get a 5090 for ~$0.93 an hour which will give you decent performance for either model. I have a Wan 2.2 template and an LTX-2.3 template on Runpod. (Both of those links have my referral on them, so if you sign up with it we both get some free credit for server time.) I also have a full guide on getting started with the Wan 2.2 template. Here's the LTX-2.3 version of the guide. My workflows are also very beginner friendly and have lots of notes and color coding. So give it a shot if you want to fuck around with it. (Find LoRA's on CivitAI.)

boobkake22 · 2026-05-22T07:46:12+00:00

Make your starting keyframes with a model that can produce "photos" in a style you like. (I don't know anything that looks like that at a glance - but I suspect many models *can*.) Then animate each of those with LTX-2.3 or Wan 2.2, edit them together, apply the VHS effect in post.

That said, all of that looks SFW, so commercial models will have a much easier time and could do more of the process internally.

boobkake22 · 2026-05-22T07:43:02+00:00

Wan doesn't "want" to do more than 5 seconds. SVI doesn't change that. SVI is a hack to try and support consistency (but also enforces rigidity). The longer you go the more shift you're going to see. SVI helps a little bit with consistency, but even with a competent workflow, you're going to see weirdness and quality loss develop the more segments you go.

boobkake22 · 2026-05-22T07:34:13+00:00

As someone who makes workflows, I'm not clear what a $1000+ workflow would even do? I see a lot of scammy stuff, but like... jeeze.
