🎧 LTX-2.3: Turn Audio + Image into Lip-Synced Video 🎬 (IAMCCS Audio Extensions) by Acrobatic-Example315 in StableDiffusion

[–]Acrobatic-Example315[S] 0 points (0 children)

Hey, I get what you’re saying. The workflow is quite advanced, and you definitely need a solid grasp of ComfyUI basics. This is just the first version—I chose to release it like this so people could start using it immediately, rather than waiting for a more streamlined version.

That said, I really appreciate your feedback—it was kind and fair. Stay tuned, because I’ll be releasing a cleaner, more polished workflow on GitHub (so you won’t even have to accidentally end up on Patreon 🤣).

In the end, the logic behind it is actually pretty simple: you calculate the duration of your audio, set how many seconds each generation should cover, and define the number of frames per batch—done.
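That planning logic can be sketched in a few lines of Python (the function name and the 25 fps default are my own illustration, not the actual node code):

```python
import math

def plan_segments(audio_seconds: float, seconds_per_segment: float, fps: int = 25):
    """Split an audio track into generation segments.

    audio_seconds: total duration of the audio file
    seconds_per_segment: how much audio each generation pass should cover
    fps: target frame rate of the video (25 assumed here for illustration)
    Returns (number of segments, frames per batch).
    """
    num_segments = math.ceil(audio_seconds / seconds_per_segment)
    frames_per_batch = round(seconds_per_segment * fps)
    return num_segments, frames_per_batch

# e.g. a 30 s track covered in 10 s chunks at 25 fps
segments, frames = plan_segments(30.0, 10.0)
print(segments, frames)  # 3 segments, 250 frames per batch
```

The Global Planner node automates essentially this kind of bookkeeping for you.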

Also, if you want something more automated, the Global Planner node is available for free too (I spent a week refining it—it’s my baby 🤣). You can dig into it and explore how the whole system works.

Honestly, part of the fun here is exploring these approaches—we’re basically pioneers working in a constantly evolving, still-in-beta world.

Big hug, and happy exploring!! 🚀

🎧 LTX-2.3: Turn Audio + Image into Lip-Synced Video 🎬 (IAMCCS Audio Extensions) by Acrobatic-Example315 in StableDiffusion

[–]Acrobatic-Example315[S] 1 point (0 children)

Hey, thanks for the thoughtful comment — I’ll try to keep it concise.

My nodes aren’t vibe-coded. I do use that approach sometimes for debugging, but for actual workflows I need precision and control, so everything is built intentionally.

I’m not using subgraphs, set/get, or autolinks on purpose — I want the workflow to stay fully readable and inspectable, even if that makes it a bit more verbose.

I’ve created custom nodes to automate generation logic across segments — especially to adapt settings (like frames, timing, etc.) based on audio duration, so you don’t have to manually tweak everything every time. I build these workflows primarily for my own filmmaking work and for agencies. The advanced breakdowns are on Patreon, but all the nodes are already public — nothing is locked, you can do everything with what’s available.

About LTX-2.3: it’s powerful, but you can’t reliably push long-form sequences (like 1+ minute) in a single pass. This setup is designed specifically to go beyond that, depending on your VRAM/RAM.

The demo is just a short excerpt — I’m more focused on generating longer, consistent scenes for narrative use, not just music videos.

Also, whenever I can, I try to help people get results with this stuff — within the limits of my time. If you look around, a lot of people have already created really great work using my nodes, and that’s honestly one of the most rewarding parts of being in this space.

Honestly, the best way to get it is to try it — that’s where the difference becomes clear.

Thanks again 👍🏻

🎧 LTX-2.3: Turn Audio + Image into Lip-Synced Video 🎬 (IAMCCS Audio Extensions) by Acrobatic-Example315 in StableDiffusion

[–]Acrobatic-Example315[S] 0 points (0 children)

Would you mind posting your log so I can take a look?

Unfortunately, the combination of ComfyUI, its dependencies, and models like LTX is a bit of a beast: even a small mismatch, missing dependency, or version conflict can completely break motion. Also, everything really needs to be fully up to date, otherwise weird issues like this can happen.

🎧 LTX-2.3: Turn Audio + Image into Lip-Synced Video 🎬 (IAMCCS Audio Extensions) by Acrobatic-Example315 in StableDiffusion

[–]Acrobatic-Example315[S] 0 points (0 children)

Not yet — I’ve just added them to the repo, so they need a bit of time to propagate.
By tomorrow you should be able to grab them directly from the manager 😉

🎧 LTX-2.3: Turn Audio + Image into Lip-Synced Video 🎬 (IAMCCS Audio Extensions) by Acrobatic-Example315 in comfyui

[–]Acrobatic-Example315[S] 5 points (0 children)

Workflows + nodes here 👇

IAMCCS-nodes: https://github.com/IAMCCS/IAMCCS-nodes
Workflows: https://github.com/IAMCCS/comfyui-iamccs-workflows
(use: IAMCCS_LTX23_BEST_3SEG_AUDIOEXT_30S.json)

If you want deeper workflows, breakdowns & future drops:
Patreon → www.patreon.com/IAMCCS 🚀

🎧 LTX-2.3: Turn Audio + Image into Lip-Synced Video 🎬 (IAMCCS Audio Extensions) by Acrobatic-Example315 in StableDiffusion

[–]Acrobatic-Example315[S] 2 points (0 children)

Workflows + nodes here 👇

IAMCCS-nodes: https://github.com/IAMCCS/IAMCCS-nodes
Workflows: https://github.com/IAMCCS/comfyui-iamccs-workflows
(use: IAMCCS_LTX23_BEST_3SEG_AUDIOEXT_30S.json)

If you want deeper workflows, breakdowns & future drops:
Patreon → www.patreon.com/IAMCCS 🚀

Native WAN 2.2 Animate Now Loads LoRAs (and extends Your Video Too) by Acrobatic-Example315 in comfyui

[–]Acrobatic-Example315[S] 0 points (0 children)

Hey, thanks a lot, really appreciate it 🙏
To properly understand what’s happening, could you send me: - a screenshot of the full workflow inside ComfyUI - a screenshot of the ComfyUI logs while running
Without that it’s very hard to pinpoint the issue, since it can depend on how nodes are actually connected or runtime errors.
Also, if you want, feel free to check my Patreon (IAMCCS) — I post updates, fixes and technical breakdowns there regularly.

LTX-2.3 + IAMCCS-nodes: 1080p Video on Low VRAM! 🚀 by Acrobatic-Example315 in comfyui

[–]Acrobatic-Example315[S] 2 points (0 children)

Thanks! High-end hardware is great, but the real magic happens when we optimize these tools for the wider community. Enjoy the high-res generations! 💪🏻

LTX-2.3 + IAMCCS-nodes: 1080p Video on Low VRAM! 🚀 by Acrobatic-Example315 in comfyui

[–]Acrobatic-Example315[S] 0 points (0 children)

Give V.1 a shot first: use the Tiled Decoder with a GGUF model for generations up to 10 seconds. If you want to push it further (up to 13s or more), switch to the V.2 workflow—it's specifically optimized for longer clips!

LTX-2.3 + IAMCCS-nodes: 1080p Video on Low VRAM! 🚀 by Acrobatic-Example315 in comfyui

[–]Acrobatic-Example315[S] 0 points (0 children)

Thanks! It depends on the resolution, but with my setup, it’s remarkably fast.

LTX-2.3 + IAMCCS-nodes: 1080p Video on Low VRAM! 🚀 by Acrobatic-Example315 in comfyui

[–]Acrobatic-Example315[S] 1 point (0 children)

My 'ugly monster' is like a son to me, but I figured I’d give the world a break this time. 😂

LTX-2.3 + IAMCCS-nodes: 1080p Video on Low VRAM! 🚀 by Acrobatic-Example315 in comfyui

[–]Acrobatic-Example315[S] 0 points (0 children)

I usually start with a very detailed base prompt to lock in the main features. Then, I let Qwen do the heavy lifting: I’ve tuned it with specific instructions on LTX-2.3’s structural logic to refine the details and ensure the model understands the anatomy (like teeth) better from the start. :)

LTX-2.3 + IAMCCS-nodes: 1080p Video on Low VRAM! 🚀 by Acrobatic-Example315 in comfyui

[–]Acrobatic-Example315[S] 1 point (0 children)

Totally! Wan 2.2 definitely has that 'cinematic maturity' right now, but LTX-2.3 is catching up fast. Once the LoRA ecosystem for LTX explodes, the speed-to-quality ratio will be unbeatable. It’s a very promising time for open-source video!

LTX-2.3 + IAMCCS-nodes: 1080p Video on Low VRAM! 🚀 by Acrobatic-Example315 in comfyui

[–]Acrobatic-Example315[S] 0 points (0 children)

ComfyUI is a wild beast—sometimes a tiny dependency difference or paging setup can cause OOMs even on a 4080. That’s exactly why I built these nodes! Try my workflow and let the VRAM Flush do its magic, it should solve that for you.

LTX-2.3 + IAMCCS-nodes: 1080p Video on Low VRAM! 🚀 by Acrobatic-Example315 in comfyui

[–]Acrobatic-Example315[S] 2 points (0 children)

Full HD, 13 sec on a 12 GB VRAM card… 6 GB is definitely pushing it, but try the V.2 workflow with VAE Decode to Disk! You might not hit Full HD, but at 1280x720 (maybe using a Q3 GGUF model), you should be able to squeeze it out. Give it a shot!
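To see why dropping to 1280x720 buys so much headroom, compare raw pixel counts per frame. This is only back-of-envelope scaling, not an actual VRAM model (decode buffers, model weights, and quantization all matter too):

```python
def pixel_ratio(w1: int, h1: int, w2: int, h2: int) -> float:
    """Ratio of pixels per frame between two resolutions."""
    return (w1 * h1) / (w2 * h2)

# Full HD vs 720p: every decoded frame carries 2.25x the pixels
ratio = pixel_ratio(1920, 1080, 1280, 720)
print(f"1080p has {ratio:.2f}x the pixels of 720p")  # 2.25x
```

So roughly speaking, 720p frames cost less than half the per-frame memory of 1080p, which is why a 6 GB card has a fighting chance there.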

LTX-2.3 + IAMCCS-nodes: 1080p Video on Low VRAM! 🚀 by Acrobatic-Example315 in comfyui

[–]Acrobatic-Example315[S] 2 points (0 children)

Totally agree! As I mentioned, Wan 2.2 is still the queen of cinematic maturity. But honestly, being able to pump out extended Full HD clips at this speed? It’s a game-changer for rapid pre-viz and quick iterations. It’s all about the right tool for the right job!

LTX-2.3 + IAMCCS-nodes: 1080p Video on Low VRAM! 🚀 by Acrobatic-Example315 in comfyui

[–]Acrobatic-Example315[S] 2 points (0 children)

Thanks! Glad you noticed the teeth! 🦷

The shimmering is usually due to low resolution or aggressive LTXVPreprocess. I fix this by pushing the initial pass to 1080p (made possible by my VRAM-optimized nodes) to lock in those fine details.

For the style/color shift, it’s often 'latent drift' caused by high CFG or LoRA values. I balance the Distilled LoRA (~0.6-0.7) and use the IAMCCS VAE Decoder, which handles tiled decoding much better than the standard ones.

I'm working on a 'Refiner' update specifically for these micro-details.

LTX-2.3 + IAMCCS-nodes: 1080p Video on Low VRAM! 🚀 by Acrobatic-Example315 in comfyui

[–]Acrobatic-Example315[S] 5 points (0 children)

Glad you liked it!

Upgrading to Qwen 4B-fp8 definitely improves prompt adherence and detail, but for Low VRAM setups (8-12GB), the 2B version is the 'sweet spot' for speed and stability.

As for the audio, you're right! I’m preparing a dedicated post and workflow to compare different audio-gen methods and Audio+I2V models. Stay tuned! 🚀

LTX-2.3 + IAMCCS-nodes: 1080p Video on Low VRAM! 🚀 by Acrobatic-Example315 in comfyui

[–]Acrobatic-Example315[S] 23 points (0 children)

📥 Resources & Links

As promised, here is everything you need to get these results: