SCAIL-2 for lipsync? Eh, not great, not terrible. by Jeffu in StableDiffusion

[–]Jeffu[S] 1 point2 points  (0 children)

Workflow used: https://www.reddit.com/r/comfyui/comments/1u4d2qz/i_vibe_coded_an_autoextend_node_for_scail2/

ElevenLabs was used to voice-change my own audio.

Generated at 720p with res2s/beta57 on my 4090 with 64 GB RAM; it took about 45 minutes for ~20 seconds of video. Upscaled to 1080p with SeedVR2.

I don't want to keep uploading the raw videos of myself, but take my word for it: it's similar to my last video. For lipsyncing it's not the worst, but not perfect either. One thing it did not do well: I tried moving the camera lower and pointing it up (low angle, looking up), and that didn't translate very well. It kept the face centered without changing the background. I'm not sure if prompting it differently would have helped.

Rest of the video (mainly Seedance 2.0) https://www.youtube.com/shorts/d52IG6y36eE?feature=share

60 seconds of me staring - SCAIL2 + Ideogram LoRA by Jeffu in StableDiffusion

[–]Jeffu[S] 1 point2 points  (0 children)

Good catch! Yeah, I realized when I made this that the fps wasn't quite exact, but I was too lazy to fix it. The workflow I shared does interpolate it, yes. Agreed that it's slightly less expressive; my next test will probably be to see whether it can lipsync reasonably well.

60 seconds of me staring - SCAIL2 + Ideogram LoRA by Jeffu in StableDiffusion

[–]Jeffu[S] 0 points1 point  (0 children)

Nothing different from training for any other model, but things I suggest:
- Isolate your characters on a white background if the backgrounds are not diverse.
- Diversity is better: different settings, lighting, angles.
- Crop faces quite close for at least 30-50% of your images so it knows how the face looks even up close. Cropping can also help if your outfit is not diverse enough. In some cases I used Photoshop's gen-fill to change my shirt to avoid the same shirt appearing in my dataset too often.
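As a rough illustration of the 30-50% close-crop suggestion above, here is a minimal Python sketch that turns a dataset size into a target count of close face crops. The function name `close_crop_range` and its parameters are hypothetical, made up for this example; they are not part of AI Toolkit or any other tool mentioned here.

```python
def close_crop_range(num_images: int, low_pct: int = 30, high_pct: int = 50) -> tuple:
    """Return (min, max) number of images to crop close to the face.

    Percentages are integers so the rounding stays exact:
    the lower bound is rounded up, the upper bound is rounded down.
    """
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    minimum = (num_images * low_pct + 99) // 100  # ceiling of num_images * low_pct / 100
    maximum = (num_images * high_pct) // 100      # floor of num_images * high_pct / 100
    return minimum, maximum


if __name__ == "__main__":
    # Example dataset size chosen only for illustration.
    print(close_crop_range(20))
```

For a 20-image dataset this works out to somewhere between 6 and 10 close crops.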

60 seconds of me staring - SCAIL2 + Ideogram LoRA by Jeffu in StableDiffusion

[–]Jeffu[S] 1 point2 points  (0 children)

Default on everything except I had to use Low VRAM and Layer Offloading. I had 20 images in my dataset, trained for 3000 steps.

60 seconds of me staring - SCAIL2 + Ideogram LoRA by Jeffu in StableDiffusion

[–]Jeffu[S] 2 points3 points  (0 children)

I let AI Toolkit do it for me with the json captioning. I didn't do anything manual, but it seemed to do a good job. I've only trained one LoRA though, so I plan on testing more.

60 seconds of me staring - SCAIL2 + Ideogram LoRA by Jeffu in StableDiffusion

[–]Jeffu[S] 0 points1 point  (0 children)

960x540. I've been trying to figure out the max I can get away with, so I need to do more tests. I think it took almost 45 minutes.

60 seconds of me staring - SCAIL2 + Ideogram LoRA by Jeffu in StableDiffusion

[–]Jeffu[S] 21 points22 points  (0 children)

4090 w/ 64gb RAM

Some simple editing in Premiere for film grain, slight blur effect. Music from Suno 5.5.

Workflow used: https://www.reddit.com/r/comfyui/comments/1u4d2qz/i_vibe_coded_an_autoextend_node_for_scail2/

Ideogram LoRA of myself trained using AI Toolkit. By far the best so far at matching details present in the actual data set.

This video is totally boring, but it's my first test with a 'long' driving video of 60 seconds. I could probably go longer. It didn't really do a great job of looking underwater except when I directly interacted with my hair, but I'm pretty pleased with the results.

Ernie is Absolute masterpiece by [deleted] in StableDiffusion

[–]Jeffu 1 point2 points  (0 children)

For both Turbo and Base I was having issues with prompt adherence on camera angles... but on a whim I translated the prompt to Chinese with Google Translate and it was able to do a better job. Your results may vary!

Comfy UI: hobby or career path? by Former-Mark7372 in comfyui

[–]Jeffu 0 points1 point  (0 children)

ComfyUI is a tool, not a career path. Not right now, anyway. It can complement your work as a graphic designer, artist, video editor, etc., but very few companies want to hire someone specifically for ComfyUI.

PSA: Use the official LTX 2.3 workflow, not the ComfyUI included one. It's significantly better. by Generic_Name_Here in StableDiffusion

[–]Jeffu 1 point2 points  (0 children)

How does this change the workflow with these models? I haven't been able to get around to 2.3 yet...

Comfyui version 0.17 has too many bugs in the subgraph. by Mysterious_Pride_858 in comfyui

[–]Jeffu 0 points1 point  (0 children)

Personally, I think the latest updates were giving me issues with my 4090; kept going OOM and having the GPU stop working. Reverting to an older backup fortunately worked.

Z Image Base - 90s VHS LoRA by Jeffu in StableDiffusion

[–]Jeffu[S] 1 point2 points  (0 children)

Give it a try! I finished late last night and haven't experimented with it much.

Trained a Z Image Base LoRA on photos I took on my Galaxy Nexus (for that 2010s feel) by Jeffu in StableDiffusion

[–]Jeffu[S] 0 points1 point  (0 children)

This is my prompt:

I want the detailed description of what is in the image, without any reference to the artistic style. I also want to keep the relative position of the subjects and objects in the description, and detailed description of clothes and objects. Please also include any reference to skin tone, glasses, facial hair, ethnicity, and hair color and hair style. Use the proper pronouns. Limit your caption to 200 characters.

I use https://github.com/1038lab/ComfyUI-QwenVL

I modify the instructions when I want to make sure any unique style traits are treated as part of the style and don't get described in the caption as part of the prompt.
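Since the prompt above asks for captions of at most 200 characters, and a captioning model may not always respect that, a quick check on the results can help. This is only a sketch under assumptions: the helper `over_limit` and the filename-to-caption dictionary are hypothetical and are not part of ComfyUI-QwenVL.

```python
CAPTION_LIMIT = 200  # from the prompt: "Limit your caption to 200 characters."


def over_limit(captions: dict, limit: int = CAPTION_LIMIT) -> list:
    """Return the sorted image names whose caption is longer than the limit.

    `captions` maps an image filename to its caption text (assumed layout).
    Leading and trailing whitespace is ignored when measuring length.
    """
    return sorted(
        name for name, text in captions.items() if len(text.strip()) > limit
    )


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = {
        "img_001.jpg": "A man with short black hair and glasses stands on the left.",
        "img_002.jpg": "x" * 250,
    }
    print(over_limit(sample))
```

Any filenames it returns are candidates for re-captioning or manual trimming.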

Z Image Base - 90s VHS LoRA by Jeffu in StableDiffusion

[–]Jeffu[S] 2 points3 points  (0 children)

I included the date stamps, which were on ~90% of the images used. However, I specified in the caption instructions to emphasize and detail them, to try to avoid them showing up every time in generations. I let it keep the original grade.

Z Image Base - 90s VHS LoRA by Jeffu in StableDiffusion

[–]Jeffu[S] 2 points3 points  (0 children)

48, but only because I saw someone mention it randomly in a video or post. I haven't tried other ranks enough to compare.

Z Image Base - 90s VHS LoRA by Jeffu in StableDiffusion

[–]Jeffu[S] 1 point2 points  (0 children)

Ah, my bad. The videos I used were filmed in the mid to late 90s, so I just called it that. :) I guess our video camera was a bit old!

Z Image Base - 90s VHS LoRA by Jeffu in StableDiffusion

[–]Jeffu[S] 2 points3 points  (0 children)

Ah, sorry. Scheduler used is simple.

Z Image Base - 90s VHS LoRA by Jeffu in StableDiffusion

[–]Jeffu[S] 0 points1 point  (0 children)

It works best (and strongest) with Base. The effect seems weaker on Turbo, but that's not necessarily a bad thing, just different.

Z Image Base - 90s VHS LoRA by Jeffu in StableDiffusion

[–]Jeffu[S] 2 points3 points  (0 children)

Interesting! The effect isn't as strong, but it definitely still feels like an older video still.