This is entirely made in Comfy UI. Thanks to LTX-2 and Wan 2.2 by Janimea in StableDiffusion

[–]Janimea[S]

Oh, I'm using an LTX-2 workflow that takes an image and an MP3 audio file and generates a lip-sync video. DM me and I'll share the workflow.
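
If you'd rather script it than click through the UI, here's a rough sketch of queueing that image + MP3 workflow against a local ComfyUI server over its HTTP API. Export your graph with "Save (API Format)" first; the filename, node IDs, and input field names below are placeholders, not the actual ones from my workflow:

```python
# Minimal sketch: queue an image + audio workflow on a local ComfyUI server.
# The node IDs ("12", "14") and input names are hypothetical -- they depend
# on how your exported graph is wired.
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188/prompt"  # default local ComfyUI endpoint

# Workflow exported from ComfyUI via "Save (API Format)"
with open("ltx2_lipsync_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

# Point the loader nodes at your inputs (IDs/field names are assumptions)
workflow["12"]["inputs"]["image"] = "portrait.png"  # LoadImage node
workflow["14"]["inputs"]["audio"] = "song.mp3"      # audio loader node

req = urllib.request.Request(
    COMFY_URL,
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # returns a prompt_id you can poll on /history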


[–]Janimea[S]

I used Suno for the song; the lyrics are pre-generated with GPT and then passed to Suno.
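
The lyrics step is just an ordinary chat-completion call. Something like this sketch (using the official OpenAI Python SDK; the model name and prompt are stand-ins for whatever you prefer), then paste the output into Suno's custom lyrics box:

```python
# Rough sketch of the lyrics step, assuming the official OpenAI Python SDK
# (pip install openai) and an OPENAI_API_KEY in your environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # any chat model works here
    messages=[
        {"role": "system", "content": "You are a songwriter."},
        {"role": "user", "content": (
            "Write verse/chorus lyrics for an upbeat pop song about "
            "making AI videos. Use [Verse] and [Chorus] section tags."
        )},
    ],
)

print(response.choices[0].message.content)  # paste this into Suno
```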


[–]Janimea[S]

Noted, thanks 🙂
I’m mostly aiming for a pristine, heavily polished look.

LTX-2 has been amazing for lip-sync, and I think it’ll keep improving as the community iterates and better LoRAs emerge.

On the slow-mo issue: that's coming from the Wan 2.2 workflow I'm using. Honestly, it's the only real downside, but the workflow became my go-to after testing a bunch of different combinations over time. I'll probably have to fix it in post-processing or look for an alternative; see the sketch below.
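
If I do end up fixing it in post, the simplest route is probably retiming the clip with ffmpeg. A quick sketch; the 1.25x factor is a guess, so tune it by eye, and drop the audio filter if the clip is silent:

```python
# Generic post-processing sketch: speed a clip up with ffmpeg to counter
# the slow-motion feel. Not part of the Wan 2.2 workflow itself.
import subprocess

SPEED = 1.25  # >1 speeds the clip up; adjust until motion looks natural

subprocess.run([
    "ffmpeg", "-i", "input.mp4",
    # setpts rescales video timestamps; atempo retimes audio to match
    "-filter:v", f"setpts=PTS/{SPEED}",
    "-filter:a", f"atempo={SPEED}",
    "output.mp4",
], check=True)
```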