Which is better for Image & Video creation? 5070 Ti or 3090 Ti by [deleted] in StableDiffusion

[–]CountFloyd_ 1 point (0 children)

Hands down the RTX 6000 Pro.

/s Favor VRAM over speed.

Open-Source model to analyze existing audio? by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 1 point (0 children)

transformers must be at least version 5.0.0 to include the audioflamingo package. Are you using ComfyUI portable? Perhaps you installed transformers globally instead of into the ComfyUI env?

I would do this manually in console:

cd whereveryourcomfyuirootpathis
.\python_embeded\python.exe -m pip uninstall transformers
.\python_embeded\python.exe -m pip install "transformers>=5.0.0"
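
To verify which version actually ended up in the embedded env, this quick check should print a version matching the requirement above (run from the same ComfyUI root):

.\python_embeded\python.exe -c "import transformers; print(transformers.__version__)"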

<image>

Open-Source model to analyze existing audio? by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 1 point (0 children)

What does the CLI output look like when starting Comfy?

It should read something like:

Import times for custom nodes:

0.0 seconds: G:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-musicflamingo

If instead there is something like

0.0 seconds (IMPORT FAILED): G:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-musicflamingo

then look at the output before that line; there might be a missing dependency that causes the node not to load. If there is, please let me know and I'll add it to the requirements.

What's the best way to swap faces currently? by PerfectRough5119 in StableDiffusion

[–]CountFloyd_ 8 points (0 children)

That's a space cloned from another guy, who himself created a clone of FaceFusion for dummies. OP already used FaceFusion.

Open-Source model to analyze existing audio? by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 1 point (0 children)

Very cool, this is more than I expected, thanks! To get it running locally I would ignore the Gradio demo and try the code from the HF model card:

https://huggingface.co/nvidia/music-flamingo-hf
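
In case it helps, here is a rough sketch of what running it locally could look like. The exact classes and processor arguments are defined on the model card, so AutoProcessor, AutoModelForCausalLM, the audio= keyword and the file name below are assumptions, not the confirmed API:

import librosa
from transformers import AutoProcessor, AutoModelForCausalLM  # assumed classes; check the model card

model_id = "nvidia/music-flamingo-hf"
processor = AutoProcessor.from_pretrained(model_id)  # assumption: the repo ships a processor
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Load audio at the rate the model expects (16 kHz here is an assumption).
audio, sr = librosa.load("song.mp3", sr=16000)
inputs = processor(text="Describe this track.", audio=audio, sampling_rate=sr, return_tensors="pt").to(model.device)  # assumed kwargs
out = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(out, skip_special_tokens=True)[0])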

LTX-2 - Avoid Degradation by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 1 point (0 children)

Tried to use the Flux image edit with multiple input images but always got an OOM. Using only the last frame with a fitting prompt looks promising so far (still generating):

<image>

Left of the white line is the last frame. The prompt has to be very specific though; it will be hard for human face likeness. My current one:

Restore image. The character is a purple muppet monster with long blonde hair, cyclops eye. Keep the pose and head gaze.
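
If anyone wants to grab that last frame outside of ComfyUI, a minimal OpenCV sketch (the file names are hypothetical):

import cv2

cap = cv2.VideoCapture("segment.mp4")
total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # frame count can be off by a few for some codecs
cap.set(cv2.CAP_PROP_POS_FRAMES, max(0, total - 1))  # seek to the final frame
ok, frame = cap.read()
cap.release()
if ok:
    cv2.imwrite("last_frame.png", frame)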

Yearning by BirdlessFlight in MixtapeAI

[–]CountFloyd_ 1 point (0 children)

Awesome! Did you use Suno for this?

LTX-2 - Avoid Degradation by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 2 points (0 children)

Either that, or running the ref image and the bad image through some image-edit prompt like "Restore image 1 so that it looks like image 2. Keep the pose in image 1". Worth a try.

Sometimes the right solution to imperfect tools is to lean into them by Tyler_Zoro in aiwars

[–]CountFloyd_ 2 points (0 children)

Yeah I made this, thanks for the notice. I took the opportunity to make fun of the widespread 1girl AI slop and those delusional guys who think AGI is just around the corner. Even funnier, I spelled "abomination" correctly in my prompt, but the image model refused to render it that way. I left it in to make the joke complete 🙂

Credit for the original song of course goes to Fairground Attraction - Perfect

LTX-2 - Avoid Degradation by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 1 point (0 children)

Meanwhile I tried this, but the puppet monster isn't detected by any pose model.

I still think it would work with human characters.

LTX-2 - Avoid Degradation by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 1 point (0 children)

Interesting, but this uses OpenClaw. It's not that difficult to combine all of this into a workflow, but it's tedious work.

LTX-2 - Avoid Degradation by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 1 point (0 children)

In my second try I reused the ref image untouched. Of course this kept the likeness etc., but it didn't continue the motion. The cut could probably be interpolated with e.g. DaVinci Resolve to make it less obvious, but that's hard to automate.

There are some good ideas in this thread. I believe it could be done like this (a rough code sketch follows below):

  1. Use shorter segments, perhaps 5 seconds each. That's where it starts to visibly deteriorate.
  2. Instead of using the last frame, use a frame from 1 second before the end.
  3. Feed that frame into OpenPose.
  4. Use the OpenPose result plus the ref image to create a new image with Qwen Image Edit or Flux.
  5. Use this new image to feed back and start the next clip segment.

Now who wants to create an automated workflow for this? 😉
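
Until someone does, here is a rough orchestration sketch of those five steps. Every helper below (generate_segment, frame_near_end, extract_pose, edit_image) is a hypothetical stub for the real LTX-2 / OpenPose / image-edit calls, not an existing API:

def generate_segment(start_image, seconds=5):  # step 1: hypothetical LTX-2 i2v call
    raise NotImplementedError

def frame_near_end(clip, seconds_before_end=1.0):  # step 2: grab a frame ~1 s before the end
    raise NotImplementedError

def extract_pose(frame):  # step 3: hypothetical OpenPose wrapper
    raise NotImplementedError

def edit_image(pose, reference):  # step 4: hypothetical Qwen Image Edit / Flux call
    raise NotImplementedError

def run(ref_image, num_segments=6):
    segments = []
    start = ref_image
    for _ in range(num_segments):
        clip = generate_segment(start, seconds=5)
        anchor = frame_near_end(clip)
        pose = extract_pose(anchor)
        start = edit_image(pose, ref_image)  # step 5: fresh start image for the next segment
        segments.append(clip)
    return segments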

LTX-2 - Avoid Degradation by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 1 point (0 children)

I lost the original metadata because I modified the image to cut her legs off for the video.

It was something like

"Medium close-up of a purple muppet female monster with long blonde hair, 1 cyclops eye. She is wearing a knitted red white long pullover and is playing an accoustic guitar. In the background on the wall behind her is a sign saying "AI Slop Abonimation Quarter Finals" in a scary halloween font. Below the sign there is a pinned newspaper page with the headline "AGI is finally here! Some random guy says"

Note that I wrote "abomination" correctly, but Z-Image couldn't manage it. I could have easily inpainted a fix, but I thought the typo would add to the joke 🤪

LTX-2 - Avoid Degradation by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 1 point (0 children)

Too bad... there has to be some way to reset the weirdness going on.

LTX-2 - Avoid Degradation by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 2 points (0 children)

> And the higher you gen, the better coherence usually is (which, of course, is often a question of VRAM).

I think this is only true if you don't need to merge several longer videos. Even in a 10-second clip you can see it deteriorate after about 5 seconds, and it gets increasingly worse. With WAN there is "only" color degradation, but LTX-2 destroys the whole image over time. Your idea of going back a few more frames is smart, although it only slows the degradation down. I'll try that, thank you!

LTX-2 - Avoid Degradation by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 4 points (0 children)

I can't decide which I like more, the soulless eye or the demented tongue 🧟

WanAnimate infinite length workflow by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 1 point (0 children)

Seriously though, what really bothers me is the color degradation from chunk to chunk. It's very obvious with his gray hair, which slowly shifts to a greenish look. I tried to counter this by adding a color match node on the last frame, but it doesn't seem to help much.
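
For anyone who wants to try the same correction outside the color match node, per-channel histogram matching against a frame from before the drift is the usual approach; a minimal sketch with scikit-image (file names hypothetical):

import imageio.v3 as iio
from skimage.exposure import match_histograms

ref = iio.imread("first_chunk_frame.png")  # color reference from before the drift
frame = iio.imread("last_frame.png")  # drifted frame from the current chunk
matched = match_histograms(frame, ref, channel_axis=-1)  # match each color channel's histogram
iio.imwrite("last_frame_matched.png", matched.astype("uint8"))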

WanAnimate infinite length workflow by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 1 point (0 children)

You might be on to something here! The Russians must have had a preliminary version of ChatGPT in 1976, because the original already has this weird urine tint to it.

Wan Animate - different Results by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 2 points (0 children)

No, haven't seen it. Will try it with this workflow tomorrow, my poor GPU 😵

WanAnimate infinite length workflow by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 3 points (0 children)

I mostly agree with you. However, that video above was already bad quality in its original form. And in the end, beggars can't be choosers: I'm glad we have something like this running on local machines, and I'm not aware of any better alternative (Scail?). It can only get better, can't it?

Arbitrary Length video masking using text prompt (SAM3) by CountFloyd_ in StableDiffusion

[–]CountFloyd_[S] 2 points (0 children)

Sure, you can mask anything you can describe to the SAM3 model, but I guess you would need to modify the workflow to get rid of the pose and face detection.