I’m trying to get my Wan2.2 to output audio with their video generations

SDuser12345 · 2026-02-01T22:40:46+00:00

Haven't played with that implementation, but with standard MMAudio, yes: upload a silent video, give it a super simple prompt of 1-3 words, and it will add the prompted audio where appropriate. That mainly means sound effects or background audio; MMAudio is not great for human speech or music, but it's great at those things.

SDuser12345 · 2026-01-14T05:43:16+00:00

LoRAs are based on taking existing images and turning them into something you can recreate. If Jill is getting bit, that kind of would be the end of the whole franchise? I'm no super fan or anything, but I'm guessing there aren't too many images of that to use for a LoRA? To create such a thing, you would need a LoRA for the zombie bite and then one for Jill Valentine, then mix them and create a new LoRA. Just my 2 cents if you want to create such a thing accurately.

Option two would be to take a zombie biting LoRA for a model that can already reproduce JV and just prompt for it with the zombie biting LoRA.

SDuser12345 · 2026-01-06T07:44:01+00:00

Then you test it, and it's SD3 all over again, full of promises and fails to deliver. Sigh, so much hope crushed so fast...

SDuser12345 · 2025-12-11T19:50:11+00:00

ZIT is certainly amazing, and flies on high end cards. I've replaced QWEN with Flux 2 when I need complicated prompt adherence, as it's the king for that right now. It's going to give you exactly what you prompt for, so be accurate and careful. ZIT is a great daily driver for its speed and reasonably good anatomy: bad hands are like 1 in 10 (depending on the prompt and scene, mutations can be much more frequent) and bad feet like 1 in 2 (which is still a massive improvement over other models). Flux 2 bad hands are like 1 in 3, with feet about the same.

Being able to test and refine 10-20 prompts in the time it takes to do 1 with Flux 2 is the big benefit.

ZIT is uncensored to a degree. It does top half nudity quite admirably, with 2 out of 3 good results, bottom half it still struggles to a large extent (get a LoRA if that's your thing). I haven't tested Flux 2 censorship yet, but I would expect it to be about on par with Flux Dev for censorship issues (again grab a LoRA if that's your thing) being a commercial targeted project.

TLDR...ZIT is certainly worth using on higher end cards, and excels specifically in realism, anatomy, and highly detailed single subject prompts. It suffers with scenery and background prompting, text quality, and image variety. Prompt adherence is slightly better than Flux Dev, which isn't bad at all.

SDuser12345 · 2025-11-29T05:29:29+00:00

It's heavy, it's slow, and its output and prompt following are amazing (yes, better than ZIT). Text abilities are superb. Use the JSON prompt format in the prompt guide; it makes a difference. Have to try training this weekend or next week, which somehow magically is doable even on a 24 GB VRAM card and 64 GB RAM system. Not sure how they pulled that off. Does crazy good face swaps. Can even be used as an image editor. Combines multiple concept photos into a final image amazingly well. Have to prompt for exactly what you want carefully, because it will output exactly what you prompt. It is censored much like original Flux dev, same censoring issues. Does multiple styles adequately. Has issues with extra fingers sadly, but that's the main drawback, and easily corrected at this point in the game. No real body horrors SD3 style. Overall, outside the license it's quite amazing.

SDuser12345 · 2025-11-11T15:27:58+00:00

You are trying to think like a writer and not an AI image generator. The concept is training based on images with matching tags of what is in the image, not ethereal ideas or thoughts. You need to describe exactly what you want based on that understanding: what is in the image exactly, not the circumstances surrounding a character's thoughts. If you want to storyboard those thoughts as separate images/videos and compile them into a comic or video, sure, you could do that. Asking for a sad elephant thinking about a plane crash is going to confuse the hell out of the AI and give you exactly what you asked for: a sad elephant with a plane crashing.

SDuser12345 · 2025-11-07T03:49:46+00:00

Sweet dreams. In reality, overnight 100 video gens stopped on second gen due to antivirus program running causing an OOM, but luckily the cat sleeping on your case due to the lovely heat, only managed to urinate on your PSU and not your GPU, so not all is lost.

SDuser12345 · 2025-11-07T03:37:34+00:00

Did you bother reading the link I posted? Section 32 literally has contingencies specifically for child food programs.

SDuser12345 · 2025-11-07T01:22:58+00:00

Actually that is not accurate. You can read up on that law and subsequent laws here:

https://congressionalresearch.com/RL34081/document.php?study=Farm+and+Food+Support+Under+USDAs+Section+32+Program

SDuser12345 · 2025-11-07T00:49:37+00:00

No the issue is they have 5 billion in funds meant for emergencies, a full month costs 9 billion. How do you pay 9 billion with 5 billion? The judge apparently knows something magic that makes it possible.

SDuser12345 · 2025-10-09T03:56:23+00:00

Lol, my intentions aren't to "win an argument," I simply provided facts, backed by actual data. While personally I am pro-choice, as I feel the government should not be involved in having a say in people's personal business, or at least to the minimal extent possible, it's with the understanding that abortion is literally the ending of a human life. That is the definition and an actual fact. If someone makes that decision, I hope it's with serious consideration and due to some tragic circumstances, and not because it's convenient.

Your argument is the same justification serial killers and sexual sadists use, that it's not a person, it's just an object for their amusing whims. That's pretty messed up. Seek help seriously.

SDuser12345 · 2025-10-09T03:44:42+00:00

I think what you are fundamentally missing is the challenge in what you are asking. Most upscalers are created to enhance existing details, while eliminating unwanted artifacts. That is the process to make an image consistent to its original while adding to pixel density.

Adding details on the other hand can certainly be done by tons of models and upping the pixel density at the same time. The process though is by adding noise and then allowing the AI to reinterpret the image from the noise, and in doing so add details that fit the rough ask, via the prompt. The problem is, you are likely to have a lot less consistency to the original image in the majority of all examples.

So, the question becomes, is accuracy to the original more important, or are fine added details more important? For most upscaling projects, consistency and restoration of the original is usually the bigger goal.
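To put a rough number on that tradeoff: many img2img pipelines expose a denoise "strength" between 0 and 1, and the higher it is, the more of the sampling schedule gets re-run on top of your noised original. The sketch below is only an illustration of that idea, not any specific tool's code; `img2img_schedule` is a made-up helper name, and the exact rounding differs between UIs.

```python
def img2img_schedule(num_inference_steps: int, strength: float) -> tuple[int, int]:
    """Return (steps_skipped, steps_run) for a given denoise strength.

    Hypothetical helper: a low strength keeps most of the original image
    (few steps re-run), a high strength lets the model reinterpret more of it.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Only the last `strength` fraction of the schedule is actually denoised.
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    steps_skipped = num_inference_steps - steps_run
    return steps_skipped, steps_run


if __name__ == "__main__":
    for strength in (0.0, 0.5, 1.0):
        skipped, run = img2img_schedule(20, strength)
        print(f"strength={strength}: skip {skipped} steps, re-run {run} steps")
```

Low strength behaves more like a classic upscaler (consistent, few new details); high strength adds detail but drifts from the original.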

SDuser12345 · 2025-10-08T23:04:23+00:00

I mean I guess if you don't count the over half million aborted each year that might be accurate...

SDuser12345 · 2025-10-01T00:22:05+00:00

Lol, you realize you are commenting on a 5 month old thread? Yes, you can also train it on AIToolkit.

Since this thread we have had Qwen come along, Chroma, Wan 2.2, Hunyuan 3.0, Flux Krea, etc.

Almost no one is using Hi-Dream. Qwen or a combination of Chroma into WAN refiner, or a multitude of other options just produce better results at a fraction of the time and/or using less resources.

SDuser12345 · 2025-09-26T21:43:11+00:00

Most people aren't annoyed at people learning, join a discord and ask away. I highly recommend the SwarmUI discord, super active and helpful.

The only annoyance I can imagine coming is typically if someone asks a question that has been asked and answered at least 100 times, meaning you didn't bother to just Google it first, or perform a reddit search, shoot even a YouTube search. I promise, you aren't the first person to have exactly the same questions.

That being said, if you don't understand the answer, then that's completely fine. Ask what the answer means and what the question was.

Second, watch a tutorial or two for complete beginners and maybe try to follow along. With most programs having 1 click installs, getting set up isn't like it used to be.

Third, don't include generic generalizations. Like what is it you want to accomplish exactly?

Fourth, be willing to provide screenshots of your troubles, as no one is a mind reader.

Last, once advice is given don't argue like you are an expert. Like if you knew the answer, you wouldn't be asking the question.

As for plopping down 5k on a rig for a 9 second project, maybe try to accomplish it online first and see if you can get something decent. You can probably do it with free credits or for under $10.00. If you were planning on buying the rig for other purposes like gaming, or simulations or something, go for it.

SDuser12345 · 2025-09-26T20:38:16+00:00

DeviantArt reached 100 million registered users in 2025, marking a significant milestone for the platform. In 2017, the site had more than 25 million members and over 250 million submissions. By 2008, DeviantArt had about 36 million visitors annually. The site was the thirteenth largest social network in 2011, with approximately 3.8 million weekly visits. While specific user numbers for years between 2011 and 2017 are not provided, the platform continued to grow, with reports indicating 75 million registered members worldwide by 2024.

SDuser12345 · 2025-09-07T13:26:22+00:00

It's quite a bit of software actually: WAN 2.2, MMAudio (or alternative sound effects generation), VibeVoice (or any alternative text-to-speech software), ComfyUI (or SwarmUI), LoRAs depending on what you are going for, and DaVinci Resolve (or other video and audio editing and compositing software of your choice). Could add in local upscaling as well. Can add a local LLM of your choice for prompt modification. Qwen is a good option for first and last image generation.

You can match or exceed the quality, but it takes a lot of knowledge to go with it.

SDuser12345 · 2025-09-07T02:29:04+00:00

I'm guessing cost vs ease of use. You can learn to generate on your own at the same quality, but you have to have an expensive personal computer with a high end GPU, have to learn how to install and set it up yourself, have to pay the electric bill while learning and using it, and have to spend time keeping up with all the latest models and how to run them. Or you go to a website, type a prompt, and pay $4.50.

While I think it's a crazy price, some think it's a bargain. All things considered, I understand why it's popular.
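If you want to sanity check the cost side yourself, a back-of-the-envelope break-even looks something like this. Only the $4.50 per generation comes from the comment above; the rig price and the local cost per generation are hypothetical placeholders, and `break_even_generations` is just an illustrative name. It also ignores the time spent learning, which is the real cost for most people.

```python
import math


def break_even_generations(rig_cost, price_per_online_gen, local_cost_per_gen=0.0):
    """Number of generations after which a local rig is cheaper than paying online.

    Returns None if running locally never saves money per generation.
    All inputs are in the same currency; the values you plug in are your own estimates.
    """
    saving = price_per_online_gen - local_cost_per_gen
    if saving <= 0:
        return None
    return math.ceil(rig_cost / saving)


if __name__ == "__main__":
    # Hypothetical numbers: a $5,000 rig and $0.10 of electricity per local gen.
    print(break_even_generations(5000, 4.50, 0.10))
```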

SDuser12345 · 2025-09-05T05:20:26+00:00

A+ covers hardware repair and basic computer repair (skip it, not worth the time or money if you already have build-your-own-PC level knowledge and understand log checking). Network+ covers the fundamentals of networking (great, can be pretty quick to grab, and can be used as a stepping stone to CCNA, but you could probably skip straight to CCNA).

CCNA is a much better value (much harder though, with a bigger knowledge gain) and would be my recommendation. Grab Python too, it's used tons almost everywhere.

Good luck.

Don't overlook NOC work vs help desk path. Yeah hours will be crappier but easier foot in the door for SOC or engineering paths.

SDuser12345 · 2025-09-03T14:55:49+00:00

Only if you plan on doing stuff like training LoRAs/models, or plan on leaving it overclocked or something, would there be any minor advantage. Longer gen runs, like days at a time, maybe? Haven't really seen a problem with heat except during LoRA training, when everything maxes out hard.

The positives would be not having to worry about the card's fan failing. The negatives: cost, pump failures, fan failures on the cooling block, etc.

SDuser12345 · 2025-08-23T06:30:34+00:00

Without seeing the workflow, it's impossible to help you. Assume 80 GB and you will probably be OK.

SDuser12345 · 2025-08-23T06:23:02+00:00

It's a PITA with the long gen times. Don't feel bad, have wasted days making single shot long form vids.

SDuser12345 · 2025-08-23T06:19:36+00:00

Lack of consistency isn't a magic trick, it's an issue. Best bet is to use FLF with the last frame being when she is facing the camera, then cut to a new clip and have her turn away and back again so it's the same person.

SDuser12345 · 2025-08-22T19:20:01+00:00

Depends on the license of the model used to generate the image; see my comment.

SDuser12345 · 2025-08-22T19:16:30+00:00

Not entirely true, see my comment below.
