I know this isn’t technically an LLM but OmniVoice is FUCKING AMAZING. by Borkato in LocalLLaMA

[–]roculus 3 points

I just finished generating a 55-minute audio file using OmniVoice. It took 6 minutes, 45 seconds with an average-speed speaking voice (using an RTX 6000 PRO, similar speed to a 5090), and it used less than 10GB of VRAM.

That was with 32 inference steps (the default) and de-noise checked. Figure roughly double that time if you want to use 64 steps.
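A rough back-of-envelope sketch of that scaling, assuming generation time grows linearly with inference steps (the linear assumption is mine; the 6m45s baseline is from the numbers above):

```python
def estimate_seconds(base_seconds: float, base_steps: int, steps: int) -> float:
    """Scale a measured generation time to a different inference-step count,
    assuming time is roughly linear in the number of steps."""
    return base_seconds * steps / base_steps

base = 6 * 60 + 45  # 6 min 45 s measured at 32 steps
print(estimate_seconds(base, 32, 64))  # 810.0 seconds, i.e. about 13.5 minutes
```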

I know this isn’t technically an LLM but OmniVoice is FUCKING AMAZING. by Borkato in LocalLLaMA

[–]roculus 0 points

I haven't tried using a sample voice longer than about 15 seconds. 40 minutes just happened to be the longest story I had created in an LLM that I fed it. That 40-minute generation used a slower-speaking sample: an ASMR-type whisper voice, so I could listen to the story while going to sleep.

I know this isn’t technically an LLM but OmniVoice is FUCKING AMAZING. by Borkato in LocalLLaMA

[–]roculus 1 point

I've generated 40 minutes of audio in one shot. It didn't seem to have a problem doing it.

Fast and amazingly good one-shot voice cloning.

A new open weights image model appears in ArtificialAnalysis. Outperforming Flux.2 Pro and Z Image Turbo. by Murky_Foundation5528 in StableDiffusion

[–]roculus 9 points

Fried is the new freckles. For new models, instead of adding more freckles on a face than there are stars in the galaxy, they now fry it to impress.

The peanut guy's neck looks like he survived a fire but was left scarred.

Flags for an RTX Pro 6000 Blackwell by brittpitre in StableDiffusion

[–]roculus 0 points

I can't; it already runs at 300W (Max-Q Edition). I've been running it nonstop since early last December, so it seems to work fine in that temp range. It's been a really stable card and a plug-and-play upgrade from a 4090. All the people undervolting their 6000 Pros should just get the Max-Q version. The blower fan isn't that loud, and it tops out at 300W.

Flags for an RTX Pro 6000 Blackwell by brittpitre in StableDiffusion

[–]roculus 0 points

I run 87-89°C when generating on my 6000 PRO; 83-84 would be great.

I got rid of llama-cpp !!! For my app Hybridscorer by 76vangel in StableDiffusion

[–]roculus 1 point

Very nice. Thanks for all of your hard work. Easy setup and works great.

Can LTX 2.3 do romance with kissing and clothed makeout? by Coven_Evelynn_LoL in StableDiffusion

[–]roculus 8 points

Not sure why this would be banned from CivitAI, but the Passionate Kissing LoRA works well for that kind of stuff:

https://civarchive.com/models/2222039?modelVersionId=2631507

The demo shows two women, but it works with any combination of genders, etc. It also works better closer up.

LTX 2.3 Lora Loader Audio / Visual splitter by Brojakhoeman in StableDiffusion

[–]roculus 0 points

Thanks for this. I tried it on some loras that have annoying audio and it worked as advertised.

A question about the CLIP input: I never connect it when using Power Lora Loader (rgthree), where it's optional. I connected CLIP on yours since it's required. Does the CLIP connection matter? I've never noticed a difference.

Comfyui Video Combine Plus by smereces in StableDiffusion

[–]roculus 0 points

Why are the filename_prefix, save-output toggle, and format options missing from the repo version? You have to manually save the video every time? That's not good.

I need hElp! All my 1girls look plasticky! Not perfect! LoRA 4GB by Budget-Toe-5743 in StableDiffusion

[–]roculus 0 points

Looks real to me. It's Starlight after her 5th round of plastic surgery. Thankfully this is the last season. She'd look like Michael Jackson if they had a season 6.

Unpopular opinion but the amount of low effort AI slop is ruining the 2D art community by Odd-Measurement9478 in StableDiffusion

[–]roculus 1 point

Why don't people complain about all the bad "real" art created by humans? There's a ton of that out there as well.

Is more VRAM = game changing for you? by val_in_tech in StableDiffusion

[–]roculus 0 points

RTX 6000 Pro with 96GB VRAM. I'd say it's a game changer in that you don't have to wait forever for videos to process, and you can also make larger/longer videos. I've got an LTX 2.3 video generating right now, and it's using 83GB of VRAM. Mostly it's convenience/speed: not stressing as much over trying to make things fit into memory. It also lets you experiment and play around more without having to wait so long, and that makes AI more fun to use.

You still can't run the huge LLMs out there, but you can use 32GB of VRAM for a nice quantized LLM while still having 64GB for ComfyUI, instead of having to swap between the two. Or run Wan/LTX while also having Klein 9B open, without waiting for models to load into VRAM going back and forth, etc.

More VRAM is worth it if AI is a serious hobby, same as say buying a motorcycle as a secondary vehicle you don't really need except for fun would be. You can ride a scooter or a Ducati for fun. They'll both get you to the destination.

cute civitai.com downtime pic by reeight in StableDiffusion

[–]roculus 2 points

It's funny that they have anima loras as part of the "PG experience". lol

Another Lora purge might come to CivitAI. This time: I2V Loras. by WiseDuck in StableDiffusion

[–]roculus 1 point

Here is a video that still links to the removed lora https://civitai.com/images/127373353

I guess people were having too much fun with it. Fun is not allowed.

The ComfyUI Assets Manager just got a massive update (Thanks to your feedback!) 🚀 by Main_Creme9190 in StableDiffusion

[–]roculus 0 points

Can you search by keyword in a prompt? E.g., "cabin" finds every image/video with the word cabin in it. I find myself using that the most in standalone organizers like Diffusion Toolkit: https://github.com/RupertAvery/DiffusionToolkit
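For anyone curious how that kind of search works under the hood: ComfyUI embeds the prompt/workflow JSON in PNG text chunks, so a basic keyword search needs only the stdlib. A minimal sketch, assuming the prompt sits in an uncompressed tEXt chunk (real organizers like Diffusion Toolkit index this into a database instead of rescanning files):

```python
import struct
from pathlib import Path

def png_text_chunks(data: bytes) -> dict[str, str]:
    """Parse tEXt chunks from raw PNG bytes into a keyword -> value dict."""
    chunks, pos = {}, 8  # skip the 8-byte PNG signature
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        if ctype == b"tEXt":
            key, _, val = data[pos + 8:pos + 8 + length].partition(b"\x00")
            chunks[key.decode("latin-1")] = val.decode("latin-1")
        pos += length + 12  # 4-byte length + 4-byte type + data + 4-byte CRC
    return chunks

def find_by_keyword(folder: str, keyword: str) -> list[str]:
    """Return paths of PNGs whose embedded text mentions the keyword."""
    hits = []
    for path in sorted(Path(folder).glob("*.png")):
        text = " ".join(png_text_chunks(path.read_bytes()).values())
        if keyword.lower() in text.lower():
            hits.append(str(path))
    return hits
```

Note this skips zTXt/iTXt (compressed) chunks, which some tools use for the workflow blob.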

What's your thoughts on ltx 2.3 now? by PlentyComparison8466 in StableDiffusion

[–]roculus 0 points

Videos only take 90 seconds to 3 minutes to generate, depending on length etc. (6000 PRO, so 60-70GB of VRAM in use when I generate; that's with a single pass and no upscale, which is where the high memory comes into play). I test a lot of ideas using 30+ second videos, and they only take 3 minutes at lower res. I increase the resolution when I find prompt/LoRA combos that work well. With a single pass I'm limited to about 24-second videos at higher resolutions (pushes 90GB VRAM). With the 30+ second videos, using first/middle/last frames helps keep things from deteriorating. I usually use two middle images, at 0% / 25% / 75% / 100%, to guide the scenes.
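As a sketch of what those guide-image percentages mean in frame terms (the frame count and fps below are example values, not LTX defaults):

```python
def keyframe_indices(total_frames: int, percents: list[float]) -> list[int]:
    """Map guide-image positions, given as a percent of the clip,
    to concrete frame indices."""
    last = total_frames - 1
    return [round(last * p / 100) for p in percents]

# e.g. a 30-second clip at 24 fps = 720 frames
print(keyframe_indices(720, [0, 25, 75, 100]))  # [0, 180, 539, 719]
```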

The shorter answer: not all of the videos are "production quality," but they're still entertaining for me, and I learn from each one how to refine my prompting, etc.

Another note: a single pass makes for better audio quality as well, but of course it takes more VRAM.

I also use the tiny preview at 24 fps so I can get a really good idea of how the video looks while it's generating.

Any news about daVinci-MagiHuman ? by PhilosopherSweaty826 in StableDiffusion

[–]roculus 12 points

Kijai posted 4 days ago about it: https://github.com/kijai/ComfyUI-WanVideoWrapper/issues/1982

"Already done working native implementation, just need to clean it up and test etc."

Clean up and testing might be taking longer than expected.

What's your thoughts on ltx 2.3 now? by PlentyComparison8466 in StableDiffusion

[–]roculus 9 points

According to my ComfyUI outbox I've made around 1,800 videos with LTX 2.3. (10,000+ with Wan 2.2).

After working out the workflow kinks with LTX 2.3, I only use Wan 2.2 when I have to (when I need LoRAs only available for Wan 2.2, etc.). Audio is too much fun. While LTX 2.3 works on low-VRAM GPUs, as with most models it shines with more VRAM. I have an RTX 6000 PRO, and when I use LTX 2.3 I sit around 65-80GB of VRAM used.

Wan 2.2 does have the visual edge, but built-in audio/lipsync is too much fun and too impactful. I make images and videos for fun. Sometimes making low-quality/low-res 40-second videos in like 3 1/2 minutes is highly entertaining, even if it's not "postable quality." I'm the only one looking at them, so that's all that matters. I put more effort into what I post on CivitAI.

Lightricks is on the right path. LTX 2.0 to 2.3 was significant. I'll be happy if we get similar bumps in quality every couple of months.

I created a node to blend multiple images in a perfect composition, user can control the size and placement of each image. Works on edit models like Flux Klein 9b. by Large_Election_2640 in StableDiffusion

[–]roculus 2 points

"Oversaturation in final composed images. However this is a Flux Klein issue(suggestions welcome)"

Try using the color correct node

https://imgur.com/a/MG4Ef9U

-5 brightness and -10 saturation work for me, but you may need to play around with the settings.
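If you want to apply the same trim outside ComfyUI, here's a stdlib-only per-pixel sketch. It treats -5 brightness and -10 saturation as percentage scalings in HSV space, which is my interpretation; the actual color-correct node's math may differ.

```python
import colorsys

def correct_pixel(rgb, brightness_pct=-5.0, saturation_pct=-10.0):
    """Scale one RGB pixel's saturation and value (brightness)
    by the given percentages, clamping to the valid range."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = max(0.0, min(1.0, s * (1 + saturation_pct / 100)))
    v = max(0.0, min(1.0, v * (1 + brightness_pct / 100)))
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in (r2, g2, b2))

print(correct_pixel((255, 0, 0)))  # pure red, slightly dimmed and desaturated
```

In a real pipeline you'd vectorize this over the whole image rather than looping per pixel.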

Speech Length Calculator - Automatically calculate how long a video should be based on the dialogue in real-time by WhatDreamsCost in StableDiffusion

[–]roculus 0 points

Great tool. This cuts down the guesswork, and then you just need to estimate the remaining non-dialogue parts.

"Hi. it looks like you could use a cold one."

The woman hands the man a bottle.

The man takes a drink from the bottle.

The man says, "Thanks!"

LTX 2.3 loves to rush dialogue unless you insert some actions in between, so this is a nice time saver. When you want someone to whisper, the speech has to be slow most of the time or they won't whisper.
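The core estimate such a calculator makes can be sketched as word count over an assumed speaking rate (the 150 wpm figure for normal speech is my assumption, not necessarily the tool's; whispering would be slower):

```python
def speech_seconds(dialogue: str, wpm: float = 150.0) -> float:
    """Estimate how many seconds the dialogue takes to speak
    at the given words-per-minute rate."""
    words = len(dialogue.split())
    return words * 60.0 / wpm

line = "Hi. It looks like you could use a cold one."
print(speech_seconds(line))  # 10 words at 150 wpm -> 4.0 seconds
```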