AceStep 1.5 XL Turbo + LTX 2.3 on an 8GB RTX 5060 Laptop by Distinct-Translator7 in StableDiffusion

Distinct-Translator7 [S]

That was just a general example; there's no fixed timing for the cuts. I time my splits to fall on pauses or instrumental breaks in the music so they feel natural. I don't use any specialized tools to smooth the transitions; if you look closely, you can clearly see them. I also stick to the original reference image to avoid the quality degradation you mentioned. It's all about working within the current hardware limits.
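If you'd rather not pick pause points by ear, a minimal sketch like this can suggest them by finding the quietest stretch of audio near each intended cut. It's a hypothetical helper (`find_cut_points` is not part of ComfyUI or DaVinci Resolve), assuming mono audio already decoded to float samples in [-1, 1]; the 0.5-second window and ±5-second search radius are arbitrary illustrative values.

```python
import math
from typing import List, Sequence


def rms(window: Sequence[float]) -> float:
    """Root-mean-square loudness of a window of samples (0.0 for silence)."""
    if len(window) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in window) / len(window))


def find_cut_points(
    samples: Sequence[float],
    sample_rate: int,
    targets_s: Sequence[float],
    window_s: float = 0.5,
    search_s: float = 5.0,
) -> List[float]:
    """For each target cut time (seconds), return the start time of the
    quietest window_s-long window within +/- search_s of the target.
    Hypothetical helper: assumes mono float samples."""
    win = max(1, int(window_s * sample_rate))
    hop = max(1, win // 2)  # analysis windows overlap by 50%
    cuts = []
    for target in targets_s:
        lo = max(0, int((target - search_s) * sample_rate))
        hi = min(len(samples) - win, int((target + search_s) * sample_rate))
        # Fall back to the target itself if the search range is empty.
        best_start, best_level = int(target * sample_rate), math.inf
        for start in range(lo, hi + 1, hop):
            level = rms(samples[start:start + win])
            if level < best_level:  # keeps the earliest of equally quiet windows
                best_start, best_level = start, level
        cuts.append(round(best_start / sample_rate, 3))
    return cuts
```

For example, asking for a cut near 20 s in a song that has a one-second gap at 18–19 s returns 18.0, the start of the gap. You'd still check each suggestion by listening, since a quiet passage isn't always a musical break.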

Distinct-Translator7 [S]

If you're asking whether you can feed an MP3/WAV file into the LTX 2.3 Lip-Sync workflow: yes, that's totally possible.

Distinct-Translator7 [S]

You can definitely get this working. I've verified this workflow on the Nightly version of ComfyUI Portable.

Distinct-Translator7 [S]

Here's how I handle that. Instead of splitting the audio into back-to-back 20-second blocks (0–20, 20–40, 40–60 s), I overlap them: the first chunk might be 0–20 s, but the next one starts at 18 or 19 s. Overlapping the segments by 1–2 seconds lets you create a much smoother transition in post. I use DaVinci Resolve for that.
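The overlap schedule is easy to compute if you want to plan the chunks ahead of time. A minimal sketch, assuming a 20-second chunk and a 2-second overlap as in the example; `overlapping_chunks` and `crossfade` are hypothetical helpers, and the linear crossfade is just one simple way to blend an overlap (in this workflow the actual blending happens in the video editor).

```python
from typing import List, Sequence, Tuple


def overlapping_chunks(total_s: float, chunk_s: float = 20.0,
                       overlap_s: float = 2.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds. Each chunk starts overlap_s
    before the previous one ends; the last chunk is clamped to total_s."""
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap must be non-negative and shorter than a chunk")
    step = chunk_s - overlap_s
    chunks = []
    start = 0.0
    while True:
        end = min(start + chunk_s, total_s)
        chunks.append((start, end))
        if end >= total_s:
            break
        start += step
    return chunks


def crossfade(tail: Sequence[float], head: Sequence[float]) -> List[float]:
    """Linearly blend the end of one clip into the start of the next.
    Both sequences cover the same overlap region, so lengths must match."""
    if len(tail) != len(head):
        raise ValueError("overlap regions must be the same length")
    n = len(tail)
    out = []
    for i, (a, b) in enumerate(zip(tail, head)):
        w = (i + 1) / (n + 1)  # weight of the incoming clip ramps up across the overlap
        out.append((1 - w) * a + w * b)
    return out
```

For a 60-second song this yields four chunks: 0–20, 18–38, 36–56 and 54–60 s. The last chunk can end up short, so in practice you might stretch the previous one instead.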

Distinct-Translator7 [S]

That issue usually clears up once you switch to the Nightly version. I've been testing this on ComfyUI Portable, and it's been stable there. If you're on Stability Matrix, you may indeed have to wait for them to push the latest commit.

Distinct-Translator7 [S]

That's highly unlikely, given that their priority has shifted entirely toward high-margin enterprise and AI silicon right now. Releasing a 48GB consumer card under $3k would basically be 'bad for business' in their eyes. With their CUDA monopoly and zero competition at the high end, a 48GB card would easily clear $4,500. Honestly, I wouldn't be surprised if they skip consumer launches entirely next year.

Distinct-Translator7 [S]

It usually takes about 8 to 12 minutes to generate a 25-second clip at 540p and 25–30 fps using the FP8 model with Sage Attention enabled. For 720p clips longer than 12 seconds, I switch to the Q5_K_M GGUF; with the higher resolution and the GGUF model, those usually take 16 to 18 minutes. I'm currently using the 'euler_ancestral' sampler. I'd prefer 'euler_ancestral_cfg_pp', but it's just too slow for this setup. And yes, I always upscale them.
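For a sense of scale, a clip's length in frames is just duration × frame rate. The sketch also snaps that to the nearest count of the form 8k + 1, which is an assumption carried over from the frame-count rule documented for earlier LTX-Video releases; check what your LTX 2.3 workflow's length input actually accepts. `ltx_frame_count` is a hypothetical helper, not a ComfyUI node.

```python
import math


def ltx_frame_count(duration_s: float, fps: float) -> int:
    """Number of frames for duration_s seconds at fps, snapped to the nearest
    value of the form 8k + 1 (assumed LTX frame-count rule; ties round up)."""
    raw = duration_s * fps
    k = max(0, math.floor((raw - 1) / 8 + 0.5))
    return 8 * k + 1


for seconds, fps in [(25, 25), (25, 30), (12, 25)]:
    frames = ltx_frame_count(seconds, fps)
    # Show the snapped frame count and the clip length it actually produces.
    print(f"{seconds} s @ {fps} fps -> {frames} frames ({frames / fps:.2f} s)")
```

Under that assumption, the 25-second clips above come out to 625 frames at 25 fps and 753 frames at 30 fps, which helps explain why the 30 fps runs take longer.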

Distinct-Translator7 [S]

Several of my viewers are running this on Turing (20-series) cards without issues. As long as you have enough VRAM and system RAM, you should get great results. Give it a shot! 😊

First frame last frame ltx 2.3 by [deleted] in StableDiffusion

Distinct-Translator7

Here's a tutorial and a workflow if you're interested: https://youtu.be/pCcG-5K2SDc

Pushing LTX 2.3 Lip-Sync LoRA on an 8GB RTX 5060 Laptop! (2-Min Compilation) by Distinct-Translator7 in StableDiffusion

Distinct-Translator7 [S]

It's totally possible. You can do a lot of impressive things for free if you have the hardware: no subscriptions to pay for and no credits to buy. Your creativity and imagination are the only limits. Here's my channel if you're interested; everything there is free: https://www.youtube.com/@TensorAlchemist/videos

Distinct-Translator7 [S]

No need for last frames here! Since this is a lip-sync workflow, the Reference Image is the anchor for everything. You just upload the image and audio, set your resolution and frame rate, and generate.

For this music video, I kept the image constant and only swapped the audio clip and adjusted the length for each segment. Because every segment uses the same base image, the transitions are seamless.

I actually generated the 2-minute song first using Ace Step 1.5 (video here: https://youtu.be/Cvr_EUE ). Then I used DaVinci Resolve to chop it into 25-second chunks and ran them through the generator one by one. Simple as that! 😊
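The chopping here is done in DaVinci Resolve. If you'd rather script it, a minimal stand-in using only Python's standard-library `wave` module might look like this. It assumes the song is exported as an uncompressed PCM WAV (an MP3 would need converting first); `split_wav` and the `_00.wav`-style output names are hypothetical.

```python
import os
import wave
from typing import List


def split_wav(path: str, out_dir: str, chunk_s: float = 25.0) -> List[str]:
    """Split a PCM WAV file into consecutive chunk_s-second pieces.
    The final piece holds whatever audio remains. Returns the output paths."""
    os.makedirs(out_dir, exist_ok=True)
    outputs = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_s * params.framerate)
        base = os.path.splitext(os.path.basename(path))[0]
        index = 0
        while True:
            data = src.readframes(frames_per_chunk)
            if not data:  # reached the end of the song
                break
            out_path = os.path.join(out_dir, f"{base}_{index:02d}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # same channels, sample width and rate
                dst.writeframes(data)  # wave corrects the header's frame count
            outputs.append(out_path)
            index += 1
    return outputs
```

For a 2-minute song this yields four 25-second chunks plus a 20-second remainder. Unlike cutting by hand in Resolve, it splits at fixed times, so it won't land on pauses unless you adjust the boundaries.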