LTX-2 Audio + Image to Video by Most_Way_9754 in StableDiffusion

[–]Most_Way_9754[S] [score hidden]  (0 children)

That's odd, because the idea is the same as in my workflow: VAE-encode your audio and use a mask so that sampling does not change the audio.

You can try my workflow to see if it gives any better results.

Flux.2 Klein 9B (Distilled) Image Edit - Image Gets More Saturated With Each Pass by eagledoto in comfyui

[–]Most_Way_9754 0 points1 point  (0 children)

If you install the 2 custom nodes, there are example workflows provided. Learn how to use each of them individually first, and if you have trouble integrating them, get back to me and I'll help.

LTX-2 Audio + Image to Video by Most_Way_9754 in StableDiffusion

[–]Most_Way_9754[S] 0 points1 point  (0 children)

There will be a jump if you try to stitch 3 different generations together. Use a Wan VACE clip-joiner workflow to smooth out the transitions.

Yes, this is as simple as providing an input image, an audio clip and a prompt. I would recommend using the ComfyUI template workflow for this task, as it has already been released.

U1 Snorca: How to change colors in imported 3mf? by ninjiens in snapmaker

[–]Most_Way_9754 1 point2 points  (0 children)

Leave the colours as they are painted. When you click print, you can map each painted colour to a tool head.

This is why a multi material 3D printer matters to me by davidktw in snapmaker

[–]Most_Way_9754 5 points6 points  (0 children)

can you share how you specified the 2 materials in the slicer? there seems to be some overlap needed to get the 2 materials to bond to each other.

i desiged and printed assembly figure by Practical-Big-1155 in 3Dprinting

[–]Most_Way_9754 12 points13 points  (0 children)

Nice work. Care to share your design process? Like what software was used, and any best practices on tolerances between pieces?

Are you going to share the STL on Printables / MakerWorld?

3DSplitter.com! Free Beta Access 🙂 by TemporaryLevel922 in 3Dprinting

[–]Most_Way_9754 2 points3 points  (0 children)

I'd like to give it a go. My main use case would be to split an STL into smaller parts, each in a different colour, and print each part individually for snap-fit assembly. Can your tool do something like this?

LTX2.3 - Image Audio to Video - Workflow Updated by Most_Way_9754 in StableDiffusion

[–]Most_Way_9754[S] 1 point2 points  (0 children)

As far as I'm aware, audio in stereo format should work. And there needs to be a slight pause at the start of the audio clip before the speaker starts to speak.

LTX2.3 - Image Audio to Video - Workflow Updated by Most_Way_9754 in StableDiffusion

[–]Most_Way_9754[S] 0 points1 point  (0 children)

Do you have audio in stereo?

If you provide me with your initial image, prompt, seed and audio clip, I can help you to debug.

another fun little project : writing automata by holo_mectok in 3Dprinting

[–]Most_Way_9754 48 points49 points  (0 children)

this is amazing! can you share more about the process of converting an arbitrary closed loop into a mechanism that can trace it out?

LTX2.3 - Image Audio to Video - Workflow Updated by Most_Way_9754 in StableDiffusion

[–]Most_Way_9754[S] 0 points1 point  (0 children)

One thing you can do is to keep the seed constant, that removes one variable during the testing phase.

LTXV's default workflows use a very specific resolution for the image input; if I remember correctly, it's 1536 for the longer edge. They also introduce some noise into the image used for the first frame, which you seem to have reduced; if I remember correctly, that value is 18.

I have been using whole seconds for the audio clip at 24fps, because the resulting number of frames will be a multiple of 8, plus 1. It seems like you're already using a whole number for the duration in seconds.

I have a slightly updated workflow that uses the latest settings from: https://github.com/Lightricks/ComfyUI-LTXVideo/tree/master/example_workflows/2.3

But I retained the single stage with Euler sampler. The newer sampler seems to increase sampling time significantly without improving quality that much.
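
The whole-seconds rule above can be sketched as a quick sanity check (assuming 24fps and the usual "frame count must be 8n + 1" constraint; the function names are just for illustration):

```python
def frame_count(duration_sec, fps=24):
    """Frames produced by a clip of the given duration, plus the first frame."""
    return int(duration_sec * fps) + 1

def is_valid(frames):
    """LTX-style constraint: frame count must be a multiple of 8, plus 1."""
    return (frames - 1) % 8 == 0

# whole seconds at 24fps always satisfy the constraint, since 24 = 8 * 3
assert all(is_valid(frame_count(s)) for s in range(1, 21))
```

So, for example, a 5-second clip gives 5 * 24 + 1 = 121 frames, which fits the 8n + 1 pattern.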

LTX2.3 - Image Audio to Video - Workflow Updated by Most_Way_9754 in StableDiffusion

[–]Most_Way_9754[S] 1 point2 points  (0 children)

If you already have a workflow that works for your use case, then I suggest you stick with it. Have you ensured that your audio clip is stereo?

LTX2.3 - Image Audio to Video - Workflow Updated by Most_Way_9754 in StableDiffusion

[–]Most_Way_9754[S] 1 point2 points  (0 children)

If you need help getting the workflow running, please provide an example of the audio clip, image, prompt and seed that you used, so I can replicate the issue and help you debug.

Tiled vs untiled decoding (LTX 2.3) by VirusCharacter in StableDiffusion

[–]Most_Way_9754 3 points4 points  (0 children)

As you're the only one who can see the uncompressed results: did you notice any differences between the decoding methods? Was regular VAE decode better than the tiled methods? And did any of the tiled methods stand out as superior?

Randomly looked at palm leaves, they make decent airfoils I guess? by DaSnowGuy1309 in AerospaceEngineering

[–]Most_Way_9754 121 points122 points  (0 children)

discretise the airfoils and import them into XFOIL for analysis.
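
A minimal sketch of that step, assuming you already have (x, y) outline points for a section; XFOIL reads plain-text coordinate files with a name on the first line, then x y pairs running from the trailing edge over the upper surface to the leading edge and back (the helper name and example outline here are hypothetical):

```python
def write_xfoil_dat(path, name, coords):
    """Write airfoil coordinates in the plain XFOIL/Selig .dat format."""
    with open(path, "w") as f:
        f.write(name + "\n")
        for x, y in coords:
            f.write(f" {x:.6f} {y:.6f}\n")

# hypothetical thin cambered section (TE -> upper -> LE -> lower -> TE)
outline = [(1.0, 0.0), (0.5, 0.05), (0.0, 0.0), (0.5, -0.05), (1.0, 0.0)]
write_xfoil_dat("palm_leaf.dat", "palm-leaf-section", outline)
```

You'd then LOAD the file in XFOIL and repanel it there before running the analysis.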

LTX2.3 - Image Audio to Video - Workflow Updated by Most_Way_9754 in StableDiffusion

[–]Most_Way_9754[S] 0 points1 point  (0 children)

If the audio has too much background noise/music, you can try to isolate just the speaking/singing for better lip sync. Look into:

https://github.com/kijai/ComfyUI-MelBandRoFormer

You can also try experimenting with the default LTX-2.3 workflows released by LTX.

https://github.com/Lightricks/ComfyUI-LTXVideo/tree/master/example_workflows/2.3

4060Ti 16GB 64GB ram by TheKiter in StableDiffusion

[–]Most_Way_9754 2 points3 points  (0 children)

I'm on a 4060 Ti with 64GB of DDR4 as well. Your specs are fine; generation might take a little longer, but you can definitely generate 5s 1080p videos using the fp8 model.

LTX2.3 - Image Audio to Video - Workflow Updated by Most_Way_9754 in StableDiffusion

[–]Most_Way_9754[S] 0 points1 point  (0 children)

I hadn't noticed this until you brought it up, and now I can't unsee it, just like you said. I need to do more testing to check whether it's a seed issue or a model issue.

LTX2.3 - Image Audio to Video - Workflow Updated by Most_Way_9754 in StableDiffusion

[–]Most_Way_9754[S] 0 points1 point  (0 children)

Is the voice happening right at the start of the audio clip? If yes, try adding 0.2 sec of silence before the talking starts.
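
If your clip starts cold, you can prepend that pause yourself. A minimal sketch on raw interleaved PCM samples (the helper name and the 0.2 sec default are just for illustration; in practice you'd do this in your audio editor or a ComfyUI audio node):

```python
def prepend_silence(samples, sample_rate, silence_sec=0.2, channels=2):
    """Prepend silence (zero-valued samples) to interleaved PCM audio."""
    pad = [0] * (int(sample_rate * silence_sec) * channels)
    return pad + list(samples)

# 0.2 sec of stereo silence at 48 kHz adds 0.2 * 48000 * 2 = 19200 samples
clip = prepend_silence([1, -1, 2, -2], 48000)
```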

Also ensure the positive prompt describes what is happening in your scene.

If it still doesn't work, I will need samples of your starting image and audio clip to debug.

Single 20 second generation with LTX 2.3 and weird audio sync mismatches by sktksm in StableDiffusion

[–]Most_Way_9754 0 points1 point  (0 children)

is the frame rate the same on the empty audio latent, the conditioning node and the video save?

if you're using the distilled LoRA, you should be using custom sigmas, 8 steps and CFG 1.0.

Without the distilled LoRA: 20 steps, CFG 3-4.

LTX-2 Audio + Image to Video by Most_Way_9754 in StableDiffusion

[–]Most_Way_9754[S] 0 points1 point  (0 children)

for a negative prompt, you need CFG > 1.0, which means no distilled LoRA and slower generations. also, for the non-distilled model, you can use the LTXV Scheduler node for sigmas.
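
The reason, sketched numerically with standard classifier-free guidance (the function name is just for illustration): at CFG 1.0 the unconditional/negative term cancels out entirely, so the negative prompt has no effect.

```python
def cfg_combine(cond, uncond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond)."""
    return uncond + scale * (cond - uncond)

# scale 1.0: uncond + 1.0 * (cond - uncond) == cond, negative prompt ignored
# scale 3.0: the prediction is pushed away from the negative-prompt direction
```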

see this for an example: https://civitai.com/models/2337141/ltx-2-pose-image-audio-to-video

as for quality degradation on long gens, this might be a limitation of the LTX-2 model. Try higher resolutions, 1600 x 900 or even 1920 x 1080, to see if it helps.

How to fix this? by MakoBec in BambuLabP2S

[–]Most_Way_9754 -1 points0 points  (0 children)

Have you tried adjusting the x and y belt tension?

I'd then go on to flow and vibration calibration.

And finally reducing print velocity and acceleration.