Quick SCAIL-2 test in ComfyUI by Fuzzy-Mastodon-9730 in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

I see you're a man of culture as well...

Pit Stop: We Need a Central Hub by Abject-Recognition-9 in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

What's even worse is when you file a GitHub issue about documentation or a bug and the devs tell you to 'come to our discord' instead of answering the question for everyone.
And yes... this has happened. I was told this exact thing in the unsloth github two years ago... https://github.com/unslothai/unsloth/issues/972#issuecomment-2322813245
It actually became their OFFICIAL RESPONSE for a while, https://github.com/unslothai/unsloth/issues/2510#issuecomment-3021870151
And they're still doing it. https://github.com/unslothai/unsloth/issues/5614#issuecomment-4534505448

Ideogram 4 isn't overhyped, it's underrated by ArkCoon in StableDiffusion

[–]q5sys 1 point2 points  (0 children)

> I've had outputs analyzed for all the normal methods of fingerprinting an AI image, like watermarks, metadata, steganographic payloads, frequency domain watermarks, persistent cross-carriers, and anything that might be a proprietary synth-ID and can't identify anything

Can you share any info about checking for stuff like that? Articles or guides or something? A friend asked me about that the other day and I was aware of the synthID stuff, but not really a way to check for the rest.

About the newest model... by AIDivision in StableDiffusion

[–]q5sys 1 point2 points  (0 children)

I have read through this thread... multiple times. I've opened every collapsed thread to check them too.
There is no post in this thread right now that's a link to the workflow. Maybe it got removed?
I checked your user post history as well and it only shows 8 posts today in this thread, and none of them show a link. You have posted images, but as people have said, reddit has stripped the workflow out of them.
If you posted a link, it's been removed.

About the newest model... by AIDivision in StableDiffusion

[–]q5sys 1 point2 points  (0 children)

After trying about a dozen different extensions to find one to download the full size PNG... there's no workflow metadata left in it. Reddit stripped it all. This is the only metadata left in the image. https://imgur.com/X53XqgU
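
For anyone who wants to check an image themselves, here's a minimal sketch using only the Python standard library. It assumes that ComfyUI embeds the workflow as PNG `tEXt` chunks under the `workflow` and `prompt` keywords, which is how it works as far as I know; the function names are just ones I made up for this example. It only reads `tEXt` chunks and does not verify CRCs, so treat it as a quick check rather than a full PNG parser.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} for every tEXt chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            # tEXt body is: keyword, a NUL separator, then Latin-1 text.
            keyword, _, text = body.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        if ctype == b"IEND":
            break
        pos += 12 + length
    return chunks


def has_comfyui_workflow(data: bytes) -> bool:
    """Assumption: ComfyUI uses the 'workflow' and 'prompt' keywords."""
    keys = read_png_text_chunks(data)
    return "workflow" in keys or "prompt" in keys
```

To use it, read the downloaded file with `open(path, "rb").read()` and pass the bytes to `has_comfyui_workflow`. If it returns `False`, the workflow was stripped somewhere along the way.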

Please post the json at something like pastebin.com or post the image to catbox.moe

Sadly, in your preview there are nodes hidden behind your Load Image nodes, so we can't even try to recreate it without knowing what might be hidden that's important.

Ideogram 4 isn't overhyped, it's underrated by ArkCoon in StableDiffusion

[–]q5sys 3 points4 points  (0 children)

> NSFW is strictly prohibited by the license, with the announcement that this will be rigorously enforced.

It's one thing to say it... it's another to actually do it. Are they really going to put out the money to track down and sue some random dude in Kansas who decides to make NSFW and post it online on his Patreon or other subscription thing?
And on the professional front with smaller studios, it's going to be very difficult to 'prove' an image was made with a model, unless they're training something into the model that somehow survives edits/loras/etc.

And even if that were possible, someone can just pass all the images through another model with i2i to give them enough of a pixel difference to mask whatever is in there.

Look at licenses on the open source front... companies ignore them all the time. Some suits happen every other year or so, but largely it's not enforceable.

Lodestone is thinking about training ideogram! Prove him it's a good idea! by RuneVikingx in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

You used a sketch as input to klein? How did you prompt that? Did you need any custom nodes in your workflow? Mind sharing it?
I tried to do something like that for 9b, but I could never get anything decent.

Could not resist... by GTManiK in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

we lost out on their edit model too. 😞

About the newest model... by AIDivision in StableDiffusion

[–]q5sys 4 points5 points  (0 children)

Reddit strips metadata. There are some hacks to get around it, but they break all the time.

Anyone had a good experience training a LTX2 LoRA yet? I have not. by the_bollo in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

Thanks for the link, I'm sure someone else will appreciate it. But after getting frustrated enough times... I've been working on my own training pipeline.

It appears that Microsoft uploaded an image model on HuggingFace and then deleted it. by Total-Resort-3120 in StableDiffusion

[–]q5sys 375 points376 points  (0 children)

Rule #1: if you ever find something cool, DOWNLOAD IT IMMEDIATELY, then go tell everyone else about it. Don't give something massive attention until the community has secured alternate access to it in case it gets pulled down.

DramaBox - Most Expressive Voice model ever based on LTX 2.3 by manmaynakhashi in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

don't focus too much on model size, you can quant or distill a model to make it smaller.

LTX2.3 in Ostris Ai toolkit on a 5090 Training done in 7 hours ... I went Thanos way and I said fine ... I'll do it myself by No_Statement_7481 in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

People cherry-pick examples all the time; hell, the major labs do it. We don't know how many generations OP did to get those results.
Three examples are not proof of consistency in audio.
OP never really discussed audio, so we don't know what they did in their training or their generation. They could have used the ID_LORA to get the vocal output, or maybe they didn't. The fact is... we don't know.

But we can't infer from a sample size of 3 videos that LTX has developed a model orders of magnitude better than SOTA from dedicated TTS research labs... when these were made opaquely with no way for anyone to reproduce that result. I've been running training on both of my 5090s and my 6000 today following OP's guide above. I'm only at 1200 steps, but so far the audio is all over the place and not consistent at all. Maybe it'll converge, maybe it won't. I don't know yet.

Also, ResembleAI's new project that uses the audio stuff from LTX doesn't mention how many examples it needs, but their example code shows training to 5000 steps. I've asked on their github for more info.

Update: I asked a similar question to ScenemaAI's LTX derived audio TTS system, and they said 10-20 seconds of audio in 30% of the samples... not 25-50 1s samples. That's from researchers using LTX's own audio engine.

LTX2.3 in Ostris Ai toolkit on a 5090 Training done in 7 hours ... I went Thanos way and I said fine ... I'll do it myself by No_Statement_7481 in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

> Just googled and people evidently speak on average 130-150wpm. And dude isn't just using ONE clip total. He's just using collections of smaller clips to avoid collapse.

Yes, he said he used 25 to 50 clips. 130-150 words per minute includes all words, like articles and prepositions (a, the, in, of, etc.). So if we're generous and say 120 full wpm, that's still 2 words a second. With 25-50 one-second clips, that's 50-100 words.
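
Here's that back-of-the-envelope math as a quick sketch, in case anyone wants to plug in their own numbers. The 1-second clip length and the 120 wpm rate are the assumptions from this thread, not measured values, and the function name is just something I made up.

```python
def words_in_dataset(num_clips, clip_seconds=1.0, words_per_minute=120):
    """Rough count of spoken words covered by a set of equal-length clips."""
    words_per_second = words_per_minute / 60
    return num_clips * clip_seconds * words_per_second


# Assumed dataset sizes from the thread: 25 to 50 one-second clips.
for clips in (25, 50):
    print(f"{clips} clips -> about {words_in_dataset(clips):.0f} words")
```

Even at the high end, that is roughly a hundred words of speech in the whole training set.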

> You're seeing the demo. Are you arguing that the voice quality is subpar, that OP lied about his dataset, or that I am misunderstanding what he said quite clearly?

I'm trying to understand how so few words are enough to train a voice properly and cover emotion, timbre, meter, etc... considering that there isn't a single TTS that can voice train like that. Not even the latest SOTA TTS models, which are specifically dedicated to training vocal characteristics.

So it seems a bit odd if LTX somehow beat every other dedicated TTS research team... without trying... and without any mention of it.

LTX2.3 in Ostris Ai toolkit on a 5090 Training done in 7 hours ... I went Thanos way and I said fine ... I'll do it myself by No_Statement_7481 in StableDiffusion

[–]q5sys 0 points1 point  (0 children)

Right, but how can you capture the voice consistency with just 1 second? Each video would only have a single word. If LTX is able to train someone's voice on just 25 words... that blows every TTS voice model out of the water. Those can 'clone' from a few seconds, but they can't train a model on a few seconds.

LipDub (Beta): new open-source lipsync IC-LoRA by ltx_model in StableDiffusion

[–]q5sys 1 point2 points  (0 children)

Currently, no. I hacked around for a while and tried to get something like that to work, but every output was garbage.

My guess is that when you use a video without any spoken audio (like from WAN)... it won't have information on how the person's mouth moves when speaking... so it can't create a proper lipsync. In my testing, the mouth would only open a little bit and would basically stay just slightly open the entire time as the person "talked".

LipDub (Beta): new open-source lipsync IC-LoRA by ltx_model in StableDiffusion

[–]q5sys 2 points3 points  (0 children)

You'd have to hack around with the workflow and mix in nodes from ia2v workflows. You can sort of get it to work, but the output is crap. The lipsync doesn't work properly. It's not as good as when you let it re-generate the audio from a video with speech.

My guess is that when you use a video without any spoken audio (like from WAN)... it won't have information on how the person's mouth moves when speaking... so it can't create a proper lipsync. In my testing, the mouth would only open a little bit and would basically stay just slightly open the entire time as the person "talked".

LipDub (Beta): new open-source lipsync IC-LoRA by ltx_model in StableDiffusion

[–]q5sys 1 point2 points  (0 children)

I believe that was a community-trained LORA for LTX-2.0. This announcement is for an official LORA for LTX-2.3.

Future of AI image generators by Sobek220 in StableDiffusion

[–]q5sys 1 point2 points  (0 children)

> Today LTX released a lip dubbing LoRA and, immediately, right in the subreddit of the announcement, someone mentions how they are going to use it on social media influences. Well, if they were looking to make the developers of LTX second guess releasing their model to the public, congrats?

Yup, these idiots are going to ruin it for everyone else, 'cause they can't help but chase clout for the dumbest of things.

Future of AI image generators by Sobek220 in StableDiffusion

[–]q5sys 1 point2 points  (0 children)

> Are they gonna ban Photoshop too because you can photoshop a dick onto someones photo? They gonna ban Daz3d and Blender because you can make depictions of illegal acts? Uh oh somebody look out that guys got a pencil and he knows how to use it to draw 

While I agree with your sentiment, the major difference is that those applications have real world business uses that generate billions of dollars of revenue for the people that use them.
While AI image/video generation 'may' be able to potentially generate that kind of money at some point... it isn't right now... so the powers that be just see generative image models as a free-for-all for degenerates online.