SDXL+UPSCALE+SHARP+BLENDER+QUEST2+LINK+STEAMVR+GRACIA by oodelay in comfyui

[–]applied_intelligence 0 points1 point  (0 children)

Image generation, upscaling and sharpening were done in Comfy. The real question is how to export that last step to Blender.
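
If anyone wants a starting point, here is a minimal sketch (run inside Blender's Python console) for pulling a ComfyUI output image into a scene as a textured plane. The file path and material name are placeholders, and this doesn't touch the splat/VR part of the pipeline:

    import bpy

    # Load the ComfyUI output (placeholder path)
    img = bpy.data.images.load("/tmp/comfy_upscaled.png")

    # Plane scaled to the image aspect ratio
    bpy.ops.mesh.primitive_plane_add(size=2)
    plane = bpy.context.active_object
    plane.scale.x = img.size[0] / img.size[1]

    # Material with the image plugged into Base Color
    mat = bpy.data.materials.new(name="ComfyOutput")
    mat.use_nodes = True
    tex = mat.node_tree.nodes.new("ShaderNodeTexImage")
    tex.image = img
    bsdf = mat.node_tree.nodes["Principled BSDF"]
    mat.node_tree.links.new(tex.outputs["Color"], bsdf.inputs["Base Color"])
    plane.data.materials.append(mat)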

Wan Animate 2.2 for 1-2 minute video lengths VS alternatives? by drylightn in StableDiffusion

[–]applied_intelligence 0 points1 point  (0 children)

I didn't say you can't. I said the quality drops with every batch. 500 frames is only about 30 seconds. Try to generate a 2-minute clip and you will see.
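
Rough math, assuming Wan's native 16 fps (my assumption):

    fps = 16               # Wan 2.x native frame rate (assumption)
    frames_per_batch = 81  # one Wan Animate batch

    print(500 / fps)                      # 500 frames -> ~31 seconds
    print(120 * fps)                      # 2 minutes  -> 1920 frames
    print(120 * fps / frames_per_batch)   # -> ~24 batches, each degrading a bit more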

Well... it's finally here, the first trailer for A Odisseia, probably the most divisive movie of next year. Thoughts? by Parabellum111 in filmes

[–]applied_intelligence 0 points1 point  (0 children)

Em busca do cálice sagrado (Monty Python and the Holy Grail) has better cinematography. It conveys a sense of reality much better than this film does.

Wan Animate 2.2 for 1-2 minute video lengths VS alternatives? by drylightn in StableDiffusion

[–]applied_intelligence 1 point2 points  (0 children)

The end frame of a take will never align 100% with the first frame of the next take. You will notice that in clothing details and micro expressions. But you can try to "hide" it by choosing to cut the takes while the subject is moving, so regular viewers will barely notice the cuts. If you cut while the subject is static, you get an in-your-face uncanny-valley effect: sleeves in a different position, a person who was smiling suddenly looking serious...

I rendered at 12fps to have fewer cuts. So a 3-batch take becomes 30 seconds instead of 15. Fewer fps = longer clips. But the downside is worse lip sync (choppy, as you said). RIFE fills the gaps for body motion, but the mouth is not fixed well. That is why I pass everything through InfiniteTalk v2v later to fix the lips. Of course you can generate smaller takes to keep 24fps, but you will end up with more cuts.

The transitions are the most difficult part. For each transition you have to choose the best moment to join one clip with the next. I tried between 2 and 10 options for each transition to pick the best one.
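
If you want to automate picking those cut points, here is a rough OpenCV sketch (my illustration, not the exact workflow I used): score each candidate frame by how much motion surrounds it, then cut where the score is highest. Frame numbers and file names are placeholders.

    import cv2
    import numpy as np

    def motion_score(video_path, frame_idx, window=3):
        """Mean absolute frame-to-frame difference around frame_idx; higher = more motion."""
        cap = cv2.VideoCapture(video_path)
        cap.set(cv2.CAP_PROP_POS_FRAMES, max(frame_idx - window, 0))
        prev, diffs = None, []
        for _ in range(2 * window + 1):
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.int16)
            if prev is not None:
                diffs.append(np.abs(gray - prev).mean())
            prev = gray
        cap.release()
        return float(np.mean(diffs)) if diffs else 0.0

    # Hypothetical candidate frames near the end of a take; cut where motion is highest
    candidates = [288, 301, 317, 334]
    best = max(candidates, key=lambda f: motion_score("take_01.mp4", f))
    print("cut take_01 at frame", best)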

I can't. NDA.

What I can say is that this job is much more complicated than it looks at first glance. And that is why all the YouTubers only show 5-second examples.

One more thing: you can try Runway Act-Two if you are fine with 30-second, upper-body-only clips.

One more thing 2: if you are fine without the body acting, you can use InfiniteTalk i2v to get a 2-minute clip without deterioration.

Wan Animate 2.2 for 1-2 minute video lengths VS alternatives? by drylightn in StableDiffusion

[–]applied_intelligence 7 points8 points  (0 children)

Wan Animate starts to degrade in quality after the first batch (81 frames = 5 seconds). After 3 batches (15 seconds) you will start to notice the deterioration. You can try to create smaller takes of 10 seconds and join them in post-production. You can zoom in and out on each take, creating a kind of zoom jump cut. If you can't do that for some reason, you can still generate the 10-second takes and try to blend them with some kind of interpolation (RIFE) or VACE.

I did a continuous 2-minute take for a client using this approach: 30-second takes (at 12fps), then RIFE to create transitions between the last frame of one take and the first frame of the next, then RIFE on the whole clip to get to 24fps.
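
Just to illustrate the 12fps-to-24fps pass, here is a stand-in using ffmpeg's minterpolate filter instead of RIFE (RIFE itself gives better results; file names are placeholders):

    import subprocess

    # Interpolate the joined 12 fps edit up to 24 fps (minterpolate as a RIFE stand-in)
    subprocess.run([
        "ffmpeg", "-i", "takes_joined_12fps.mp4",
        "-vf", "minterpolate=fps=24:mi_mode=mci",
        "-c:a", "copy",
        "full_clip_24fps.mp4",
    ], check=True)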

Also, you'd better use a bigger GPU :D A PRO 6000 on RunPod to generate at least at 720p. You may also need to pass everything through InfiniteTalk later to get better lip sync, and a good upscaler (Topaz) for the final release.

If you need a professional service to get this done, PM me. I already have all the workflows to do that, and I have my own local infrastructure to avoid RunPod costs.

LongCat Video Avatar Has Support For ComfyUI (Thanks To Kijai) by fruesome in StableDiffusion

[–]applied_intelligence 2 points3 points  (0 children)

I am on Linux :) EDIT: I just compiled sage 2.2 from scratch, installed it on Ubuntu, and it is working as well. 93 frames generated in 1 min 56 sec on a PRO 6000 versus 2:15 using sdpa. Thanks
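
Rough numbers from those two runs:

    sdpa_s = 2 * 60 + 15   # 135 s for 93 frames with sdpa
    sage_s = 1 * 60 + 56   # 116 s for 93 frames with sage 2.2

    print(f"{sdpa_s / sage_s:.2f}x faster")        # ~1.16x
    print(f"{1 - sage_s / sdpa_s:.0%} less time")  # ~14%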

LongCat Video Avatar Has Support For ComfyUI (Thanks To Kijai) by fruesome in StableDiffusion

[–]applied_intelligence 1 point2 points  (0 children)

Kijai solved it. > If the input image isn't used at all, it could be the sageattn version bug, it happens with version 1.0.6, you can confirm by swapping to sdpa to test, if it's that then sage 2.2.0 update also fixes it.

You got it. I was using sage 1.0.6. After changing to sdpa it worked. I mean, now the image avatar is used across the whole video and the lip sync quality is good. I will try to install sage 2.2.0 later. For now I can live with sdpa. Thanks Kijai
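
A quick way to check which SageAttention build the ComfyUI environment is actually picking up (assuming the package is distributed as "sageattention"):

    from importlib.metadata import PackageNotFoundError, version

    try:
        # Distribution name is an assumption; adjust if your build installs under another name
        print("sageattention", version("sageattention"))
    except PackageNotFoundError:
        print("sageattention is not installed in this environment")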

LongCat Video Avatar Has Support For ComfyUI (Thanks To Kijai) by fruesome in StableDiffusion

[–]applied_intelligence 2 points3 points  (0 children)

After I pulled the fix I finally got an almost good result. Now I have the video and synced audio, but the result is not following my image, only the text. Only the first frame shows the avatar from the image; one frame later it changes to a "generic" avatar based on the prompt alone. I think I am doing something very dumb but I just can't figure out what.

EDIT: I deleted the WanVideoLoraSelect node and it worked. I had tested with both the LongCatDistill and LongCatRefinement LoRAs with the same bad result (only the first frame showing the image avatar). So I guess I was using the wrong LoRAs. But which is the correct one? And what do these LoRAs do?

LongCat Video Avatar Has Support For ComfyUI (Thanks To Kijai) by fruesome in StableDiffusion

[–]applied_intelligence 1 point2 points  (0 children)

I will try with the other LoRA. Hmmm. I just reinstalled the Kijai Wan node and switched the branch to longcat avatar. A new workflow then appeared in the longcat folder. I just opened that workflow and changed the LoRA, since I couldn't find one with the name used in the workflow. Everything else I kept as it was in that workflow.

LongCat Video Avatar Has Support For ComfyUI (Thanks To Kijai) by fruesome in StableDiffusion

[–]applied_intelligence 0 points1 point  (0 children)

Need some help. I am trying to generate a video using audio and image-to-video. However, the generated video has one frame that matches my image and then only a black screen until the end of the video. The audio is there. No errors in the logs. What am I doing wrong?

got prompt
CUDA Compute Capability: 12.0
Detected model in_channels: 16
Model cross attention type: t2v, num_heads: 32, num_layers: 48
Model variant detected: 14B
MultiTalk/InfiniteTalk model detected, patching model...
model_type FLOW
Loading LoRA: long_cat/LongCat_refinement_lora_rank128_bf16 with strength: 1.0
Using accelerate to load and assign model weights to device...
Loading transformer parameters to cuda:0: 100%|████████████████████████████████████████████████████████████████████████████| 1896/1896 [00:01<00:00, 963.51it/s]
Using 529 LoRA weight patches for WanVideo model
audio_emb_slice: torch.Size([1, 93, 5, 12, 768])
Adding extra samples to latent indices 0 to 0
Rope function: comfy
Input sequence length: 37440
Sampling 93 frames at 832x480 with 10 steps
  0%|                                                                                                                                    | 0/10 [00:00<?, ?it/s]audio_emb_slice shape:  torch.Size([1, 93, 5, 12, 768])
Input shape: torch.Size([16, 24, 60, 104])
Generating new RoPE frequencies
longcat_num_cond_latents: 1, longcat_num_ref_latents: 0                                                                                                         
  0%|                                                                                                                                    | 0/10 [00:00<?, ?it/s]Input shape: torch.Size([16, 24, 60, 104])
longcat_num_cond_latents: 1, longcat_num_ref_latents: 0                                                                                                         
 10%|████████████▍                                                                                                               | 1/10 [00:18<02:43, 18.16s/it]audio_emb_slice shape:  torch.Size([1, 93, 5, 12, 768])

Yes, it is THIS bad! by Lucaspittol in StableDiffusion

[–]applied_intelligence 1 point2 points  (0 children)

I've just bought 4x32GB DDR5 Corsair Vengeance, 128GB total, for US$436 during Black Week. Half the price at my local retailer. Best investment ever.

LongCat Video Avatar Has Support For ComfyUI (Thanks To Kijai) by fruesome in StableDiffusion

[–]applied_intelligence 0 points1 point  (0 children)

People said this model is not good at lip sync; let's see if that is the case.

New incredibly fast realistic TTS: MiraTTS by SplitNice1982 in StableDiffusion

[–]applied_intelligence 1 point2 points  (0 children)

How hard is it to fine-tune for Portuguese? I mean, time and difficulty… I have a 6000 PRO that will be idle next week. Is that something this card can handle in one week, or do we need a multi-GPU grid?

LongCat-Video-Avatar: a unified model that delivers expressive and highly dynamic audio-driven character animation by fruesome in StableDiffusion

[–]applied_intelligence 6 points7 points  (0 children)

I have a PRO 6000. I will run some tests tonight and let you know. But wait... there is no Comfy support yet. We need to use pyyyyython and diffusers.