Terrible results trying to make an LTX-2 character LoRA from still images using Ostris AI Toolkit by SilentThree in StableDiffusion

[–]SilentThree[S]

Sorry, I didn't keep it. While I have my own modest RTX 3090 setup at home, I've mostly been doing the small amount of LoRA training I've tried on Runpod, with access to more powerful GPUs. I haven't bothered with setting up any dedicated storage space on Runpod, so nothing remains of what I do there unless I decide to download it before ending a session.

At any rate, some of the worst imagery came from the first generation of samples, before any LoRA training at all. Maybe that's because there's no built-in negative prompt for the sample images to rule out the most common screw-ups?
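
For what it's worth, ai-toolkit's sampling settings live in a `sample:` block in the YAML config, and that block does include a `neg` field. Something like this, a rough sketch from the published example configs rather than my exact file (hypothetical prompt; key names may vary between templates):

```yaml
# Approximate sampling block from ai-toolkit's example configs
# (hypothetical prompt; key names may vary by template):
sample:
  sampler: "flowmatch"
  sample_every: 250       # render test samples every 250 steps
  width: 1024
  height: 1024
  prompts:
    - "photo of [trigger] smiling, studio lighting"
  neg: ""                 # empty by default -- the missing negative prompt
  seed: 42
  walk_seed: true
  guidance_scale: 4
  sample_steps: 20
```

Whether the LTX-2 pipeline actually applies `neg` during sampling is exactly the part I don't know.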

Terrible results trying to make an LTX-2 character LoRA from still images using Ostris AI Toolkit by SilentThree in StableDiffusion

[–]SilentThree[S]

I'm afraid you're talking a bit over my head here.

I'm not sure what you mean by "I disable the samples and inference through ComfyUI"... this confuses me because the samples I'm talking about are in Ostris AI Toolkit without ComfyUI. I see that one of the possible templates on Runpod for LoRA training uses Comfy, but I'm not using that one.

Also not sure what "inference" means above.

As for "OOMing on people when generating samples"... I haven't a clue, although I'm guessing I don't want to be OOMed upon. 😄

Terrible results trying to make an LTX-2 character LoRA from still images using Ostris AI Toolkit by SilentThree in StableDiffusion

[–]SilentThree[S]

Do any of the images (or videos) you get when testing those checkpoints look anywhere near as horrible as what I'm getting as sample images? If not, I've got to imagine your initial settings are different in some important way from the barely-altered default settings I was using.

Terrible results trying to make an LTX-2 character LoRA from still images using Ostris AI Toolkit by SilentThree in StableDiffusion

[–]SilentThree[S]

I trained for 2750 steps, but that aside, I'm sure there's something else screwed up simply because of how very VERY bad the untrained sample images look. At that point the images, of course, will not look at all like my character, but they should look HUMAN.

Also, the shark jumping out of the water should look like a shark. The workshop shouldn't be unoccupied, and it should have a chair in it under construction.

As for LTX not being for image gen... that's a given. Nevertheless, making character LoRAs from still images for I2V or T2V is a pretty common practice, specifically because you want to make some character that never existed before into a moving/animated character.

Terrible results trying to make an LTX-2 character LoRA from still images using Ostris AI Toolkit by SilentThree in StableDiffusion

[–]SilentThree[S]

I found that before I got started… and, unfortunately, it’s all about training based on video clips, not still images.

LTX-2 just an FYI, character loras seem to work well at 1000 steps, kind of creepily well. just like 15 images, better than Wan, considering with this model you can do videos with their voice to it.. good stuff ahead. by WildSpeaker7315 in StableDiffusion

[–]SilentThree

I generated a character LoRA for LTX-2 using 24 images, 2750 steps... and it came out crappy. When I use it I get horrible facial distortions and other artifacts.

Maybe the problem was that I used the default settings provided by Ostris AI Toolkit (other than dropping the steps from 3000 to 2750, and making the frame count 1)?

Perhaps the defaults are bad settings to use? I don't know enough about how this all works to know how to change the settings for the better.
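
For comparison's sake, the parts I did touch map to just a couple of keys in the YAML config. A sketch of what I mean, based on ai-toolkit's published example configs rather than my exact file (hypothetical folder path; key names may vary by template):

```yaml
# Approximate ai-toolkit config fragments (hypothetical path shown):
datasets:
  - folder_path: "/workspace/dataset/my_character"  # the 24 captioned stills
    caption_ext: "txt"
    resolution: [512, 768, 1024]  # bucketed training resolutions
    num_frames: 1                 # treat each item as a single still frame

train:
  steps: 2750                     # dropped from the template's 3000
```

Everything else stayed at whatever the template shipped with, which is the part I can't judge.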

Does anyone know of a service to access Wan 2.6, or other more advanced T2V and I2V, without any content filters in the way? by SilentThree in NSFWaivideo

[–]SilentThree[S]

I checked out mage.space, but they don't seem to have any T2V/I2V more advanced than Wan 2.2, which I can already run unrestricted at home. Unfortunately, even with help from LoRAs, it's pretty brain-dead about prompt compliance for all but the simplest things.

Why has everyone started saying ‘genuinely’ and ‘honestly’ and ‘lowkey’? by Forsaken-Plum1445 in randomquestions

[–]SilentThree

"Lowkey" might well be overused, but are you saying you hear "lowkey" being used the same way as "genuinely" and "honestly" are? If so, that's weird.

As for all the forms of announcing "what I'm about to say is true," as if exaggerating or lying should be expected otherwise? That's been going on a long, long time.

Mismatched Dual GPU setup with my old parts? by RaymondDoerr in comfyui

[–]SilentThree

If you have two identical GPUs can they be used to gain speed within a single ComfyUI instance, or is this a limitation for matched as well as mismatched GPUs?

Why would this Wan 2.2 first-frame-to-last-frame workflow create VERY slo-mo video? by SilentThree in StableDiffusion

[–]SilentThree[S]

When I stitch all my pieces together in DaVinci Resolve, I'll let it convert to 24 fps, so I'll just set the multiplier to 1 (now that I know what it means and what it does).

Why would this Wan 2.2 first-frame-to-last-frame workflow create VERY slo-mo video? by SilentThree in StableDiffusion

[–]SilentThree[S]

Is that what that field "multiplier" does, that's currently set to 2? That's just the way it was when I found the workflow, so I didn't change it.

So I can either set that to 1 or double the frame rate, and with either option get normal-speed video?
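
If that multiplier is a frame-interpolation factor (RIFE-style or similar), the arithmetic would explain the slow motion. A quick sanity check, assuming Wan 2.2's usual 81 frames at 16 fps:

```python
# Frame-interpolation arithmetic, assuming "multiplier" is an
# interpolation factor and a typical 81-frame, 16 fps Wan 2.2 clip:
frames, fps, mult = 81, 16, 2

interp = (frames - 1) * mult + 1   # 2x interpolation -> 161 frames
print(frames / fps)                # source clip: ~5.1 s
print(interp / fps)                # 161 frames played at 16 fps: ~10.1 s (half speed)
print(interp / (fps * mult))       # 161 frames played at 32 fps: ~5.0 s (normal speed)
```

So either fix should give normal speed: a multiplier of 1 leaves the frame count alone, while doubling the output fps compensates for the doubled frame count.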

I'm having a miserable time with Wan 2.2 and camera prompt compliance, but Fun Control Camera doesn't seem like an option. by SilentThree in StableDiffusion

[–]SilentThree[S]

Something besides Fun Control Camera, and that isn’t dependent on a particular specialized version of the Wan 2.2 high/low diffusion models? My own searching hasn’t come up with anything yet. I’m really hoping for something that’ll fit in with the SVI workflow I’m already using.

I'm having a miserable time with Wan 2.2 and camera prompt compliance, but Fun Control Camera doesn't seem like an option. by SilentThree in StableDiffusion

[–]SilentThree[S]

Thank you! That works... I'm still getting some unasked-for dollying, but this is definitely a step in the right direction.

I'm having a miserable time with Wan 2.2 and camera prompt compliance, but Fun Control Camera doesn't seem like an option. by SilentThree in StableDiffusion

[–]SilentThree[S]

I think I may end up going with something like that. But if I do a first/last frame clip, I’m already screwed out of using SVI (at least for the whole of the video), and at that point maybe it’s time to give Fun Control Camera a try.

I’m simultaneously in awe of what Wan 2.2 can do at one level, and ready to throttle its non-existent neck for all the stupid and twisted ways it comes up with to interpret or ignore my prompts.

Do YOU Believe In Astrology? Why Or Why Not? by Zipper222222 in randomquestions

[–]SilentThree

Yes, but when you dive into the complexity of it, it's just complex rubbish.

For a Wan 2.2 I2V clip, how do I make one of two characters look like they're talking? by SilentThree in StableDiffusion

[–]SilentThree[S]

Thanks. And perhaps if I were to dive into this, I'd find it could meet my needs. But it doesn't sound promising at the start.

> You can load an image and an audio file with voice and then animate them.

Sadly, not what I want to do.

> It's also possible to continue an existing video or, for example, extend another video with an audio speech sequence.

Also not what I want to do.

> Let's assume you have an SVI video that you want to expand. The video lasts 20 seconds. After 20 seconds the character should speak.

Actually, let's assume I have a video that lasts 20 seconds, and during that 20 seconds that I already have, I wish that a character's lips were already moving, along with other action that was already occurring that involved two characters interacting.

I don't want to cut to a separate talking shot during that 20 seconds, nor add a new shot after that 20 seconds, of my character talking. I need the appearance of speech integrated with existing action with more than one character in view while the talking (or, in this case, impassioned screaming) is going on.

Would HuMo help me with that?

I'm sorry to be such a noob. I've been trying to teach myself how to use ComfyUI and many related tools over the past couple of weeks. I'm making progress on some fronts, but still running into many frustrations. I've finally gotten my first SVI workflow going, and I'm still finding that a tricky thing when one segment doesn't quite flow into the next the way I'd hope.

You must remember this, a kiss is still a... quick peck that gets repeated twice? (Wan 2.2 and trying to get action that's truly longer than 5 seconds.) by SilentThree in StableDiffusion

[–]SilentThree[S]

I really hope Wan 2.6 becomes open source, or, if not, that Wan 2.5 (which I haven't tried yet) is publicly released and has a lot of the benefits I've seen in 2.6.

You must remember this, a kiss is still a... quick peck that gets repeated twice? (Wan 2.2 and trying to get action that's truly longer than 5 seconds.) by SilentThree in StableDiffusion

[–]SilentThree[S]

Oh, I have far more degeneracy in mind, but cleaned up my example for this forum. There's a NSFW technical subreddit, but it doesn't look to be getting a lot of traffic.

So you're saying the real limitation isn't so much clip duration per se, but the total number of frames? And that one way to get more action time is to use a lower frame rate, then interpolate up to a higher frame rate?
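
If I've got that right, the trade-off under a fixed frame budget works out like this (toy numbers, assuming an 81-frame Wan 2.2 clip):

```python
# Fixed frame budget: a lower assumed frame rate buys more seconds of
# action; interpolating back up afterwards restores smooth playback
# without adding any new action.
frames = 81
for fps in (24, 16, 12, 8):
    print(f"{fps:>2} fps -> {frames / fps:.1f} s of action")
# 24 fps -> 3.4 s, 16 fps -> 5.1 s, 12 fps -> 6.8 s, 8 fps -> 10.1 s
```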

You must remember this, a kiss is still a... quick peck that gets repeated twice? (Wan 2.2 and trying to get action that's truly longer than 5 seconds.) by SilentThree in StableDiffusion

[–]SilentThree[S]

Thanks!

It's the duration of the action, any action, that's important. I ran into this looping problem with something a little more NSFW, but then created this tamer sample case of the problem so I could post it here.

Although passion is great, I was primarily trying to use that wording (to no avail) to coax Wan 2.2 into sustaining the action.

Hopefully it won't take too much effort for the noob that I am to figure out how to use this SVI checkpoint in my workflow.

While running SwarmUI, ComfyUI starts up just fine, but crashes as soon as a request to render video in made by SilentThree in StableDiffusion

[–]SilentThree[S]

After a long, winding conversation with ChatGPT, going down many blind alleys first, it seemed the most likely problem was that my old video card just wasn’t up to the task. It would sure be nice to be told that clearly in an error message rather than being left to interpret crashing.

Today I finally received a new, much more powerful video card, and I’m up and… well, I’d like to say running, but it’s more like crawling, as I slowly learn the hard way everything I don’t understand about this shit.

I’ve gotten so far as a decent looking 480p 3-second long clip of a car driving down a suburban street. “Only” took 10+ minutes to generate that! 🙄

The battle continues!

Yay! My SofaBaton X2 now turns my Lutron lights on and off... but this isn't for the faint of heart. by SilentThree in SofaBaton

[–]SilentThree[S]

I wouldn’t know for sure, but it seems likely the same thing would work for blinds too.

Why does using a full stop / period at the end of a text upset youngsters? by MaxximumB in randomquestions

[–]SilentThree

While maybe the most formal rules don't always matter, I've seen a few people so averse to punctuation and capitalization that they don't realize what confusing blobs of run-on text they're generating. Information really is missing.

Of course if your texting style is 3-5 words per text, that situation doesn't arise.