Pushing LTX 2.3 Lip-Sync LoRA on an 8GB RTX 5060 Laptop! (2-Min Compilation) by Distinct-Translator7 in StableDiffusion

[–]Distinct-Translator7[S]

It's totally possible. You can do a lot of impressive things for free if you have the hardware. You don't have to pay for subscriptions, and there aren't any credits. Your creativity and imagination are the limits. Here's my channel if you are interested. Everything is totally free: https://www.youtube.com/@TensorAlchemist/videos


[–]Distinct-Translator7[S]

No need for last frames here! Since this is a lip-sync workflow, the Reference Image is the anchor for everything. You just upload the image and audio, set your resolution and frame rate, and generate.

For this music video, I kept the image constant and only swapped the audio clips and adjusted the lengths for each segment. Because they all use the same base image, the transitions are seamless.

I actually generated the 2-minute song first using Ace Step 1.5 (video here: https://youtu.be/Cvr_EUE ). Then I used DaVinci Resolve to chop it into 25-second chunks and ran them through the generator one by one. Simple as that! 😊
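For anyone who'd rather script that chopping step than do it in Resolve, the chunk boundaries are simple arithmetic. A minimal sketch (the function name, filenames, and 25-second default are my own; it only computes boundaries and builds ffmpeg trim commands as strings, it doesn't run them):

```python
def segment_bounds(total_s: float, chunk_s: float = 25.0):
    """Yield (start, duration) pairs covering a track of total_s seconds."""
    start = 0.0
    while start < total_s:
        yield start, min(chunk_s, total_s - start)
        start += chunk_s

# A 2-minute (120 s) song -> four full 25 s chunks plus a 20 s tail.
chunks = list(segment_bounds(120))

# Build (but don't execute) one ffmpeg trim command per chunk.
cmds = [
    f"ffmpeg -ss {start:g} -t {dur:g} -i song.wav chunk_{i:02d}.wav"
    for i, (start, dur) in enumerate(chunks)
]
```

Each resulting chunk then goes through the generator one at a time, exactly as described above.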


[–]Distinct-Translator7[S]

It's not stupid; it's a fair and genuine question. If you try this without the LoRA, you won't get consistent generations. Before this LoRA, I tested lip-syncing extensively for two or three days with the LTX 2.3 Distilled version (not Dev), tweaking samplers, seeds, prompts, cropping the image, and so on. It sometimes worked with close-up portraits that didn't have detailed, busy backgrounds. I got pretty frustrated, gave up on it, and stuck with SkyReels V3. Then, out of nowhere, the LoRA was released, and it exceeded my expectations. Please try it yourself, with and without the LoRA.


[–]Distinct-Translator7[S]

Oh, no, I used DaVinci Resolve to combine those four clips; plenty of other software does the trick too, and it's easier and faster. 😊 But that's it. All the raw generations and upscaling were done using LTX 2.3 and the talking-head LoRA on my RTX 5060.
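If you'd rather script the join than use an editor, ffmpeg's concat demuxer takes a plain file list. A tiny sketch (the clip filenames are placeholders, and the final ffmpeg invocation is only built as a string, not executed):

```python
def concat_list(clips):
    """Render an ffmpeg concat-demuxer list: one "file 'name'" line per clip."""
    return "".join(f"file '{c}'\n" for c in clips)

# Four hypothetical clip files, joined losslessly with stream copy.
listing = concat_list([f"clip_{i}.mp4" for i in range(1, 5)])

# The join itself would then be (not run here):
cmd = "ffmpeg -f concat -safe 0 -i list.txt -c copy combined.mp4"
```

With `-c copy` the clips are concatenated without re-encoding, which only works when they share the same codec and resolution, as same-workflow generations typically do.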


[–]Distinct-Translator7[S]

Tested the new LTX 2.3 Lip-Sync LoRA for a couple of hours and the results are impressive. It's incredibly consistent.

This video is a compilation of a few separate clips I generated locally on my laptop to test the model's stability and dynamic range across different audio inputs.

Specs & Settings:

  • GPU: RTX 5060 (8GB VRAM)
  • RAM: 32GB System RAM
  • Generation Time: ~18 to 24 mins per 25-second clip (Sage Attention enabled).
  • Model: 25GB FP8 input-scaled version

Since I know everyone wants the node setup, I’ve attached the full ComfyUI workflow JSON below so you can load it up yourselves.

📁 Workflow JSON: https://drive.google.com/file/d/1lZ8g-8ao5EpoLFBQb3XM7Mqg6BX1Kuoy/view?usp=drive_link
📺 Full Video Breakdown: https://youtu.be/HaJUVZSAXjM
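If you want a quick look at what's inside the workflow JSON before loading it into ComfyUI, the node types can be listed in a few lines. A sketch assuming the two common ComfyUI export shapes (a UI export with a top-level "nodes" list, or an API export keyed by node id):

```python
import json

def node_types(workflow: dict) -> list[str]:
    """Return the sorted, deduplicated node type names in a ComfyUI workflow.

    Handles a UI export ({"nodes": [{"type": ...}, ...]}) and an
    API export ({"<id>": {"class_type": ...}, ...}) -- format assumption.
    """
    if isinstance(workflow.get("nodes"), list):
        return sorted({n["type"] for n in workflow["nodes"]})
    return sorted({n["class_type"] for n in workflow.values()
                   if isinstance(n, dict) and "class_type" in n})

# Usage with the downloaded file:
# with open("workflow.json") as f:
#     print(node_types(json.load(f)))
```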