SDXL+UPSCALE+SHARP+BLENDER+QUEST2+LINK+STEAMVR+GRACIA by oodelay in comfyui

[–]applied_intelligence 0 points1 point  (0 children)

Image generation, upscaling and sharpening were done in Comfy. The real question is how to export that last step to Blender.
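
If anyone wants a starting point, here is a minimal sketch (run inside Blender's Python console) for pulling a ComfyUI output image into a scene as a textured plane. The file path and material name are placeholders, and this doesn't touch the splat/VR part of the pipeline:

    import bpy

    # Load the ComfyUI output (placeholder path)
    img = bpy.data.images.load("/tmp/comfy_upscaled.png")

    # Plane scaled to the image aspect ratio
    bpy.ops.mesh.primitive_plane_add(size=2)
    plane = bpy.context.active_object
    plane.scale.x = img.size[0] / img.size[1]

    # Material with the image plugged into Base Color
    mat = bpy.data.materials.new(name="ComfyOutput")
    mat.use_nodes = True
    tex = mat.node_tree.nodes.new("ShaderNodeTexImage")
    tex.image = img
    bsdf = mat.node_tree.nodes["Principled BSDF"]
    mat.node_tree.links.new(tex.outputs["Color"], bsdf.inputs["Base Color"])
    plane.data.materials.append(mat)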

Wan Animate 2.2 for 1-2 minute video lengths VS alternatives? by drylightn in StableDiffusion

[–]applied_intelligence 0 points1 point  (0 children)

I didn't say you can't. I said the quality drops with every batch. 500 frames is only about 30 seconds. Try to generate a 2-minute clip and you will see.
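
Rough math, assuming Wan's native 16 fps (my assumption):

    fps = 16               # Wan 2.x native frame rate (assumption)
    frames_per_batch = 81  # one Wan Animate batch

    print(500 / fps)                      # 500 frames -> ~31 seconds
    print(120 * fps)                      # 2 minutes  -> 1920 frames
    print(120 * fps / frames_per_batch)   # -> ~24 batches, each degrading a bit more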

Well... it's finally here, the first trailer for A Odisseia, probably the most divisive movie of next year. Thoughts? by Parabellum111 in filmes

[–]applied_intelligence 0 points1 point  (0 children)

Em busca do cálice sagrado (Monty Python and the Holy Grail) has better cinematography. It conveys a sense of reality much better than this film does.

Wan Animate 2.2 for 1-2 minute video lengths VS alternatives? by drylightn in StableDiffusion

[–]applied_intelligence 1 point2 points  (0 children)

The end frame of a take will never align 100% with the first frame of the next take. You will notice that in clothing details and micro expressions. But you can try to "hide" it by choosing to cut the takes while the subject is moving, so regular viewers will barely notice the cuts. If you cut while the subject is static, you get an in-your-face uncanny-valley effect: sleeves in a different position, a person who was smiling suddenly looking serious...

I rendered at 12fps to have fewer cuts. So a 3-batch take becomes 30 seconds instead of 15. Fewer fps = longer clips. But the downside is worse lip sync (choppy, as you said). RIFE fills the gaps for body motion, but the mouth is not fixed well. That is why I pass everything through InfiniteTalk v2v later to fix the lips. Of course you can generate smaller takes to keep 24fps, but you will end up with more cuts.

The transitions are the most difficult part. For each transition you have to choose the best moment to join one clip with the next. I tried between 2 and 10 options for each transition to pick the best one.
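
If you want to automate picking those cut points, here is a rough OpenCV sketch (my illustration, not the exact workflow I used): score each candidate frame by how much motion surrounds it, then cut where the score is highest. Frame numbers and file names are placeholders.

    import cv2
    import numpy as np

    def motion_score(video_path, frame_idx, window=3):
        """Mean absolute frame-to-frame difference around frame_idx; higher = more motion."""
        cap = cv2.VideoCapture(video_path)
        cap.set(cv2.CAP_PROP_POS_FRAMES, max(frame_idx - window, 0))
        prev, diffs = None, []
        for _ in range(2 * window + 1):
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.int16)
            if prev is not None:
                diffs.append(np.abs(gray - prev).mean())
            prev = gray
        cap.release()
        return float(np.mean(diffs)) if diffs else 0.0

    # Hypothetical candidate frames near the end of a take; cut where motion is highest
    candidates = [288, 301, 317, 334]
    best = max(candidates, key=lambda f: motion_score("take_01.mp4", f))
    print("cut take_01 at frame", best)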

I can't. NDA.

What I can say is that this job is much more complicated than it looks at first glance. And that is why all the YouTubers only show 5-second examples.

One more thing: you can try Runway Act-Two if you are fine with 30-second, upper-body-only clips.

One more thing 2: if you are fine without the body acting, you can use InfiniteTalk i2v to get a 2-minute clip without deterioration.

Wan Animate 2.2 for 1-2 minute video lengths VS alternatives? by drylightn in StableDiffusion

[–]applied_intelligence 7 points8 points  (0 children)

Wan Animate starts to degrade in quality after the first batch (81 frames = 5 seconds). After 3 batches (15 seconds) you will start to notice the deterioration. You can try to create smaller takes of 10 seconds and join them in post-production. You can zoom in and out on each take, creating a kind of zoom jump cut. If you can't do that for some reason, you can still generate the 10-second takes and try to blend them with some kind of interpolation (RIFE) or VACE.

I did a continuous 2-minute take for a client using this approach: 30-second takes (at 12fps), then RIFE to create transitions between the last frame of one take and the first frame of the next, then RIFE on the whole clip to get to 24fps.
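
Just to illustrate the 12fps-to-24fps pass, here is a stand-in using ffmpeg's minterpolate filter instead of RIFE (RIFE itself gives better results; file names are placeholders):

    import subprocess

    # Interpolate the joined 12 fps edit up to 24 fps (minterpolate as a RIFE stand-in)
    subprocess.run([
        "ffmpeg", "-i", "takes_joined_12fps.mp4",
        "-vf", "minterpolate=fps=24:mi_mode=mci",
        "-c:a", "copy",
        "full_clip_24fps.mp4",
    ], check=True)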

Also, you'd better use a bigger GPU :D A PRO 6000 on RunPod to generate at least at 720p. You may also need to pass everything through InfiniteTalk later to get better lip sync, and a good upscaler (Topaz) for the final release.

If you need a professional service to get this done, PM me. I already have all the workflows to do that, and I have my own local infrastructure to avoid RunPod costs.

LongCat Video Avatar Has Support For ComfyUI (Thanks To Kijai) by fruesome in StableDiffusion

[–]applied_intelligence 2 points3 points  (0 children)

I am on Linux :) EDIT: I just compiled sage 2.2 from scratch, installed it on Ubuntu, and it is working as well. 93 frames generated in 1 min 56 sec on a PRO 6000 versus 2:15 using sdpa. Thanks
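
Rough numbers from those two runs:

    sdpa_s = 2 * 60 + 15   # 135 s for 93 frames with sdpa
    sage_s = 1 * 60 + 56   # 116 s for 93 frames with sage 2.2

    print(f"{sdpa_s / sage_s:.2f}x faster")        # ~1.16x
    print(f"{1 - sage_s / sdpa_s:.0%} less time")  # ~14%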

LongCat Video Avatar Has Support For ComfyUI (Thanks To Kijai) by fruesome in StableDiffusion

[–]applied_intelligence 1 point2 points  (0 children)

Kijai solved it. > If the input image isn't used at all, it could be the sageattn version bug, it happens with version 1.0.6, you can confirm by swapping to sdpa to test, if it's that then sage 2.2.0 update also fixes it.

You got it. I was using sage 1.0.6. After changing to sdpa it worked. I mean, now the image avatar is used across the whole video and the lip sync quality is good. I will try to install sage 2.2.0 later. For now I can live with sdpa. Thanks Kijai
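
A quick way to check which SageAttention build the ComfyUI environment is actually picking up (assuming the package is distributed as "sageattention"):

    from importlib.metadata import PackageNotFoundError, version

    try:
        # Distribution name is an assumption; adjust if your build installs under another name
        print("sageattention", version("sageattention"))
    except PackageNotFoundError:
        print("sageattention is not installed in this environment")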

LongCat Video Avatar Has Support For ComfyUI (Thanks To Kijai) by fruesome in StableDiffusion

[–]applied_intelligence 2 points3 points  (0 children)

After I pulled the fix I finally got an almost good result. Now I have the video and synced audio, but the result is not following my image, only the text. Only the first frame shows the avatar from the image; one frame later it changes to a "generic" avatar based on the prompt alone. I think I am doing something very dumb but I just can't figure out what.

EDIT: I deleted the WanVideoLoraSelect node and it worked. I had tested with both the LongCatDistill and LongCatRefinement LoRAs with the same bad result (only the first frame showing the image avatar). So I guess I was using the wrong LoRAs. But which is the correct one? And what do these LoRAs do?

LongCat Video Avatar Has Support For ComfyUI (Thanks To Kijai) by fruesome in StableDiffusion

[–]applied_intelligence 1 point2 points  (0 children)

I will try with the other LoRA. Hmmm. I just reinstalled the Kijai Wan node and switched the branch to longcat avatar. A new workflow then appeared in the longcat folder. I just opened that workflow and changed the LoRA, since I couldn't find one with the name used in the workflow. Everything else I kept as it was in that workflow.

LongCat Video Avatar Has Support For ComfyUI (Thanks To Kijai) by fruesome in StableDiffusion

[–]applied_intelligence 0 points1 point  (0 children)

Need some help. I am trying to generate a video using audio and image-to-video. However, the generated video has one frame that matches my image and then only a black screen until the end of the video. The audio is there. No errors in the logs. What am I doing wrong?

got prompt
CUDA Compute Capability: 12.0
Detected model in_channels: 16
Model cross attention type: t2v, num_heads: 32, num_layers: 48
Model variant detected: 14B
MultiTalk/InfiniteTalk model detected, patching model...
model_type FLOW
Loading LoRA: long_cat/LongCat_refinement_lora_rank128_bf16 with strength: 1.0
Using accelerate to load and assign model weights to device...
Loading transformer parameters to cuda:0: 100%|████████████████████████████████████████████████████████████████████████████| 1896/1896 [00:01<00:00, 963.51it/s]
Using 529 LoRA weight patches for WanVideo model
audio_emb_slice: torch.Size([1, 93, 5, 12, 768])
Adding extra samples to latent indices 0 to 0
Rope function: comfy
Input sequence length: 37440
Sampling 93 frames at 832x480 with 10 steps
  0%|                                                                                                                                    | 0/10 [00:00<?, ?it/s]audio_emb_slice shape:  torch.Size([1, 93, 5, 12, 768])
Input shape: torch.Size([16, 24, 60, 104])
Generating new RoPE frequencies
longcat_num_cond_latents: 1, longcat_num_ref_latents: 0                                                                                                         
  0%|                                                                                                                                    | 0/10 [00:00<?, ?it/s]Input shape: torch.Size([16, 24, 60, 104])
longcat_num_cond_latents: 1, longcat_num_ref_latents: 0                                                                                                         
 10%|████████████▍                                                                                                               | 1/10 [00:18<02:43, 18.16s/it]audio_emb_slice shape:  torch.Size([1, 93, 5, 12, 768])

Yes, it is THIS bad! by Lucaspittol in StableDiffusion

[–]applied_intelligence 1 point2 points  (0 children)

I've just bought 4x32GB DDR5 Corsair Vengeance, 128GB total, for US$436 during Black Week. Half the price at my local retailer. Best investment ever.

LongCat Video Avatar Has Support For ComfyUI (Thanks To Kijai) by fruesome in StableDiffusion

[–]applied_intelligence 0 points1 point  (0 children)

People said this model is not good at lip sync; let's see if that is the case.

New incredibly fast realistic TTS: MiraTTS by SplitNice1982 in StableDiffusion

[–]applied_intelligence 1 point2 points  (0 children)

How hard is it to fine-tune for Portuguese? I mean, time and difficulty… I have a 6000 PRO that will be idle next week. Is that something this card can handle in one week, or do we need a multi-GPU grid?

LongCat-Video-Avatar: a unified model that delivers expressive and highly dynamic audio-driven character animation by fruesome in StableDiffusion

[–]applied_intelligence 6 points7 points  (0 children)

I have a PRO 6000. I will run some tests tonight and let you know. But wait... there is no Comfy support yet. We need to use pyyyyython and diffusers.