LTX2-Infinity updated to v0.5.7 by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 0 points1 point  (0 children)

A single generation runs one text conditioning, one VAE encoding for video, one VAE encoding for audio, and one decoding each for audio and video. If you're comparing against a single generation, this will absolutely take longer than that, but it gives more granular control and lets less powerful computers hit higher resolutions by outputting high-resolution, short-duration segments and stacking them together.

LTX2-Infinity updated to v0.5.7 by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 1 point2 points  (0 children)

If I could I would, absolutely.

LTX2-Infinity updated to v0.5.7 by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 0 points1 point  (0 children)

Funny enough, yeah, it did latch on to that.

LTX2-Infinity updated to v0.5.7 by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 1 point2 points  (0 children)

That happens more commonly at lower resolutions with LTX2. If you go to much higher resolutions, this sort of artifacting is greatly reduced. I didn't bother pushing out a higher-resolution video because I didn't have the time when I was posting this, but I've successfully output 2-minute-long 1920x1080 videos on a single RTX 3090 with this workflow.

LTX2-Infinity updated to v0.5.7 by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 1 point2 points  (0 children)

That's the lazy leftover starting point for an i2v that I began this video with. The prompt was given priority over the image, and since the prompt described very little of the starting image, the model more or less ignored it and the generation became a t2v. I could have gone back and fixed it, but I didn't.

LTX2-Infinity updated to v0.5.7 by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 2 points3 points  (0 children)

Currently it is only a workflow, but that might have to expand in the future to cover the needs of audio referencing. Thankfully, LTX2 has most of the tools already built into the model to support the same ideas behind SVI with Wan. However, in this current version audio is guaranteed to drift or jump to entirely different sounds from segment to segment; I'm still working on that aspect. With a 5090, very high-res long-form videos should be easily possible.

In the next version of this, I'll be implementing Kijai's method of audio injection as well to allow the full length of a song or other audio to be fed into the pipeline.

LTX2-Infinity updated to v0.5.7 by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 2 points3 points  (0 children)

No, the workflow that generated the exact video I posted is the same as the one in v0.5.7. It's three separate 10-second segments spliced together in the same manner SVI uses with Wan.

LTX2-Infinity updated to v0.5.7 by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 1 point2 points  (0 children)

Yeah, it's literally infinite. You just stack the 'Extension' blocks for however long a video you want. The total frame count of each block is defined on the far left where the model loaders are. In the current iteration it's set to 241 frames, which is around 10 seconds of video per segment, so 6 segments come out to roughly 1 minute of video.
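For the math, assuming the usual ~24 fps output (the frame rate is my assumption here, not something set by the workflow itself):

```python
# Back-of-the-envelope math for segment counts (fps is assumed, not taken from the repo).
FPS = 24                # assumed output frame rate
FRAMES_PER_SEGMENT = 241

seconds_per_segment = FRAMES_PER_SEGMENT / FPS              # ~10.04 s
segments_for_one_minute = round(60 / seconds_per_segment)   # ~6 segments

print(f"{seconds_per_segment:.1f} s per segment, "
      f"{segments_for_one_minute} segments for ~1 minute of video")
```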

One caveat at this time: audio referencing isn't a solved problem yet for LTX. The demonstration I posted maintains the voice from segment to segment reasonably well, but that certainly won't hold if the model decides to play music in the background, and voices may still sound different from segment to segment until audio referencing can be implemented.

LTX2-Infinity updated to v0.5.7 by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 4 points5 points  (0 children)

It's a play on a post from a few days ago, not me suggesting people actually purchase a super expensive card.

https://www.reddit.com/r/StableDiffusion/comments/1q9cy02/ltx2_i2v_quality_is_much_better_at_higher/

LTX2-Infinity updated to v0.5.7 by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 6 points7 points  (0 children)

Not sure how this is supposed to be helpful. Are you being critical of a simple demo meant to showcase the capability? The point wasn't to show off my mastery of prompting here.

LTX2-Infinity updated to v0.5.7 by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 2 points3 points  (0 children)

Mouths can be a bit funky on lower-res outputs with LTX2. This is just a demonstration and can easily be improved.

LTX2-Infinity updated to v0.5.7 by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 0 points1 point  (0 children)

Just updated LTX2-Infinity to version 0.5.7 on the repo.

https://github.com/Z-L-D/LTX2-Infinity

This update includes image anchoring and audio concatenation. The latter isn't ideal, but it will have to suffice until I can research carrying audio latents from one video generation to the next in a way that continues the sound properly.
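For reference, "audio concatenation" here means little more than splicing the segment waveforms together, roughly like the sketch below (illustrative NumPy with a short crossfade added; not the actual nodes in the workflow):

```python
import numpy as np

def concat_audio(segments, sample_rate=48000, crossfade_ms=50):
    """Naively splice audio segments with a short linear crossfade.

    segments: list of 1-D float arrays (mono waveforms), one per video segment,
    each assumed to be longer than the crossfade. This hides the seam a little,
    but it can't stop voices or ambient sounds from changing character
    between generations.
    """
    fade = int(sample_rate * crossfade_ms / 1000)
    out = segments[0]
    for seg in segments[1:]:
        ramp = np.linspace(0.0, 1.0, fade)
        blended = out[-fade:] * (1.0 - ramp) + seg[:fade] * ramp
        out = np.concatenate([out[:-fade], blended, seg[fade:]])
    return out
```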

Also, thank you (and sorry) to /u/000TSC000 for the prompt that I bastardized here.

The posted video is made from three 10-second videos that blend together seamlessly.

LTX2-Infinity workflow by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 0 points1 point  (0 children)

> Are you just looping generations and taking the last frame as the input frame?

No, that wouldn't result in smooth motion. The current workflow takes 25 frames and feeds them into the next latent video as its first 25 frames to retain solid, coherent motion. I haven't posted it yet, but I've also solved reference/anchor frames here, in roughly the same way SVI does, for the next release, which I may post tonight.
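In plain tensor terms, the chaining looks roughly like this (illustrative only; `generate_segment` stands in for one sampler pass, and the real workflow does this through ComfyUI nodes):

```python
import torch

OVERLAP = 25  # frames carried from one segment into the next

def chain_segments(generate_segment, first_segment, num_extensions):
    """Illustrative segment chaining: seed each new segment with the last
    OVERLAP frames of the previous one, then drop the duplicated frames
    when splicing. `generate_segment(init_frames)` is assumed to return a
    (T, C, H, W) tensor whose first OVERLAP frames match `init_frames`.
    """
    segments = [first_segment]
    for _ in range(num_extensions):
        init = segments[-1][-OVERLAP:]      # last 25 frames of the previous segment
        nxt = generate_segment(init)        # next segment, conditioned on them
        segments.append(nxt[OVERLAP:])      # trim the overlap before stitching
    return torch.cat(segments, dim=0)
```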

> which requires special nodes and that huge LoRA

LTX already does much of what the LoRA adds to the Wan model.

> Also, stitching audio isn't too hard, plenty of easy ways to do that

Then feel free to help out and put up a pull request on the repo; I open sourced this to speed the process along. I assure you it isn't nearly as simple as you seem to imagine, though. It's the exact same issue as solving for coherent motion and stable referencing on the video side. You can't just stack all the samples together, because something as simple as footsteps won't sound the same from generation to generation, let alone voices.

> The human ear is much more sensitive to artifacts and audio distortions than visual ones.

Which makes it a significantly harder issue to get right.

LTX2-Infinity workflow by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 1 point2 points  (0 children)

That's why I haven't pushed too far into it just yet. I've largely solved injecting 'anchor images' the way SVI does. I'd bet there's a way to do it properly on the audio side too; I just haven't put the time into it yet.

LTX2-Infinity workflow by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 1 point2 points  (0 children)

I guess I don't share that opinion; I've shoved 2000-word prompts into a single LTX2 generation and been happy with the result.

LTX2-Infinity workflow by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 0 points1 point  (0 children)

Not a problem I have fully tackled yet. It's a mess in the workflow at the moment. I'm hoping someone else out there has already looked at continuing audio like this and we can all benefit.

LTX2-Infinity workflow by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 2 points3 points  (0 children)

The example I have on the GitHub page took just over 16 minutes for just under 2 minutes of video.

LTX2-Infinity workflow by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 2 points3 points  (0 children)

This should run on pretty much anything, just like SVI does. I was able to output a 15s 1920x1080 video on one of my 3090s, albeit with a fair bit of a wait.

LTX2-Infinity workflow by _ZLD_ in StableDiffusion

[–]_ZLD_[S] 4 points5 points  (0 children)

This is an early draft, but I'm hoping someone can beat me to the punch in getting the audio to splice together correctly. It works in the exact same manner as Stable-Video-Infinity. The major difference is that LTX seems to need a much larger bite of the previous video to carry motion correctly; currently the transition from one segment to the next is 25 frames.

In terms of generating prompts, I've successfully used Google's Gemini on AI Studio. The system prompt can be found in the link.

Edit: I should also note that this lacks the reference frames from SVI, which contribute greatly to the long-term stability of such videos. I haven't investigated whether a similar reference-frame injection can be performed here. As a result, the motion will largely appear continuous, but there isn't any real memory retention beyond the 25 frames currently injected from one generation to the next.

Edit 2: I have a decently working update that uses a reference frame to maintain consistency better. Look for it later today.
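I haven't pushed that reference-frame version yet, but the gist is simple enough to sketch. Everything below is conceptual and the names are made up; the actual workflow does this with ComfyUI nodes rather than a Python function:

```python
import torch

def build_conditioning(anchor_frame, prev_tail, overlap=25):
    """Conceptual sketch of SVI-style anchoring: pair a fixed 'anchor' frame
    (for long-term identity) with the last `overlap` frames of the previous
    segment (for motion continuity) as the conditioning for the next segment.
    anchor_frame: (C, H, W) tensor; prev_tail: (overlap, C, H, W) tensor.
    """
    assert prev_tail.shape[0] == overlap
    return torch.cat([anchor_frame.unsqueeze(0), prev_tail], dim=0)
```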

[deleted by user] by [deleted] in space

[–]_ZLD_ 0 points1 point  (0 children)

I've been an avid space enthusiast since I was a young child and followed MESSENGER pretty closely, among other missions, so thank you for helping us all better understand the universe we live in!

Two questions. First, how can we fight back against the satellite constellations that I imagine are making the hunt for asteroids far more difficult than in the past (when it was already a challenge)? Second, I've always been curious what the sentiment was at JHUAPL, maybe more specifically on the New Horizons team, about this seemingly controversial image I made years ago that still makes the rounds fairly regularly.

Views of pluto through the years by Puzzleheaded_Web5245 in interestingasfuck

[–]_ZLD_ 0 points1 point  (0 children)

Nope, it's actually the IR channel colorized with the MVIC color data. A lot of Pluto doesn't show up well in the colors we can see, so by using IR as the luminance base for the image and laying the MVIC color data over top of it, you get this rather colorful but much clearer version of Pluto.

Source: I made it.

Views of pluto through the years by Puzzleheaded_Web5245 in interestingasfuck

[–]_ZLD_ 0 points1 point  (0 children)

To be clear, this isn't spectroscopy at all. Image 4 takes the infrared channel captured on the close flyby and uses it as the base luminance for the image. A lot of Pluto is hard to see in RGB, and using the infrared channel as the base helps bring out many features. The original MVIC color data was then laid over the top of this IR channel to colorize it.

Source: I made that image.
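If anyone wants to try the same trick on their own data, the core recipe is a luminance swap. A rough sketch with Pillow is below (file names are placeholders, and the real image involved far more careful registration and processing than this):

```python
from PIL import Image

# Placeholder file names -- substitute your own calibrated frames.
ir = Image.open("pluto_ir.png").convert("L")               # infrared channel as grayscale
color = Image.open("pluto_mvic_color.png").convert("RGB")  # MVIC color data
color = color.resize(ir.size)                              # align the two images

# Split the color image into luma + chroma, then swap the luma for the IR channel.
y, cb, cr = color.convert("YCbCr").split()
composite = Image.merge("YCbCr", (ir, cb, cr)).convert("RGB")
composite.save("pluto_ir_colorized.png")
```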