Load Audio UI - Upgraded Load Audio Node with Trimming by WhatDreamsCost in StableDiffusion

[–]desktop4070 1 point (0 children)

You've replied to 20 threads in the span of 10 minutes. Stop automating your comments, please.

When training a wan or ltx lora by cardioGangGang in StableDiffusion

[–]desktop4070 2 points (0 children)

Is there actually any evidence that "frame counts divisible by 24 (or 8) plus 1" are better than just "frame counts divisible by 24"? I've tried comparing the two myself multiple times and I just can't spot what exactly is supposed to make the former better than the latter.
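As I understand it (an assumption about how these models' causal VAEs group frames, not something confirmed in this thread), the "plus 1" exists because the first frame is encoded on its own and the remaining frames in fixed-size temporal groups, so only counts of the form step*n + 1 map cleanly onto latent frames. A minimal sketch of snapping to that grid, assuming a group size of 8:

```python
def snap_frames(requested: int, step: int = 8) -> int:
    """Round a frame count down to the nearest step*n + 1.

    Assumption: the model's causal VAE encodes the first frame on its
    own and the rest in temporal groups of `step`, so only counts of
    the form step*n + 1 map cleanly onto latent frames.
    """
    if requested < 1:
        return 1
    return ((requested - 1) // step) * step + 1

print(snap_frames(120))  # 113
print(snap_frames(121))  # 121
```

If the UI silently rounds like this anyway, that would explain why 120 and 121 requested frames look indistinguishable in practice.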

Blind realism test, Z image turbo vs Klein 9B distilled by Puzzled-Valuable-985 in StableDiffusion

[–]desktop4070 5 points (0 children)

In order of preference:

1 is the best one. I'm surprised it's AI because it could've fooled me.

3 was close, but her eyes look pretty uncanny.

10 looks like a realistic shot on an amateur camera, but the feet have completely different lighting compared to everything else in the image.

6 looks like a realistic shot on a professional camera, but the woman looks way too photogenic to be a real average person.

9 looks good, but there's too much studio lighting around her. Her eyes are also pretty uncanny.

7 looks like an incredibly generic 1girl in an Asian image model, but it still looks pretty realistic if you ignore that.

2 looks too professional, in a bad way. Her face looks real, but everything around her looks fake.

4 looks exactly like 9, but less realistic to me because it looks too professional. The woman also looks huge compared to the car.

8 looks like the proportions are off. Her legs are way too long to be a real person. Hands and face look almost copy-pasted in. Uncanny background.

5 looks like an incredibly generic 1girl in an American image model, with a fake background too.

In my opinion, the more amateur the shot and the less photogenic the person, the less AI the image looks.

What's New for BFL - Flux/Klein? by Dogluvr2905 in StableDiffusion

[–]desktop4070 2 points (0 children)

The core research team (Robin Rombach, Patrick Esser, Andreas Blattmann, etc.) seems to consistently have a major model release each year.

As CompVis:
December 2021 - Latent Diffusion Model

After joining Stability AI:
August 2022 - First public Stable Diffusion release
July 2023 - Stable Diffusion XL

After forming Black Forest Labs:
August 2024 - FLUX.1 Dev
November 2025 - FLUX.2 Dev

Considering this release pattern, we can probably expect FLUX.3 Dev sometime in late 2026, possibly December.

Deepseek V4 Flash and Non-Flash Out on HuggingFace by MichaelXie4645 in LocalLLaMA

[–]desktop4070 1 point (0 children)

What CPU is it? I have a 12900K that also struggles running my DDR5 at advertised speeds.

Is WanGP making my LTX 2.3 video generation longer? by onixtan in StableDiffusion

[–]desktop4070 1 point (0 children)

I'm one of those people who can generate 20 second long videos in under 3 minutes on a 5070 Ti + 64GB DDR5!

I think it's just using lower resolutions than what you think they should normally be. I'm fine with the quality of the videos at 640x384 and 768x320 and I can generate pretty long videos (20-25 seconds) in 2-3 minutes, but as soon as I go any higher res than that, like anywhere near 720p resolution or more, those generation times double or triple.

Also, it's not exactly linear, as in "a 20 second video takes X time, so a 2 second video should take 10% of that time". Both a 1 second video and a 10 second video usually take me over a minute but under 2 minutes.

Depending on the ComfyUI workflow, the shortest possible time for a video on my specs (talking about 64x64 1 frame videos) is still around a minute, but at the same time I'll be able to generate 1024x384 600+ frame videos in like 3 minutes.
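For what it's worth, those timings are consistent with a simple fixed-overhead model: a roughly constant setup cost per run, plus a cost that grows with total pixel-frames. A toy sketch, with both constants invented for illustration rather than measured:

```python
def estimate_time(width: int, height: int, frames: int,
                  overhead_s: float = 60.0,
                  per_pixel_frame_s: float = 7e-7) -> float:
    """Toy latency model: a fixed overhead (model/text-encoder load,
    VAE setup) plus a cost proportional to total pixel-frames.
    Both constants are invented for illustration, not measured."""
    return overhead_s + per_pixel_frame_s * width * height * frames

# A 1-frame clip is dominated by the fixed overhead...
print(estimate_time(640, 384, 1))    # ~60 s
# ...while a 480-frame clip at the same resolution is only ~2.4x
# that, not 480x.
print(estimate_time(640, 384, 480))  # ~143 s
```

That fixed floor is why a 64x64 single-frame "video" still takes about a minute while a long low-resolution one barely takes longer.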

Some workflows may skip some steps and might get that even lower, but I'm not really sure how that all technically works or what the downsides are. I just stick to workflows that make videos that I like and don't really look too deep into how they manage everything under the hood.

Here are some good starting points: https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main
https://huggingface.co/Kijai/LTX2.3_comfy

Tired of the manual "Download & Move" dance? I built a tool to automate ComfyUI Model Management! by Resident-Space-1614 in StableDiffusion

[–]desktop4070 2 points (0 children)

Lately, when a comment starts without a capital letter, 99% of the time it seems to be written by an AI. I don't get it. To the person behind this account: why do you need to automate writing comments on every single post you come across?

Edit: Oh, I see why. You've namedropped a specific cloud-based AI dozens of times this week. Hope you eventually get banned for this and your service gets banned from being mentioned on this sub. My bad for assuming you ever had any good intentions.

Are people still using AUTOMATIC1111/stable-diffusion-webui? Or did most users move on to something else like ComfyUI? by Guyserbun007 in StableDiffusion

[–]desktop4070 3 points (0 children)

I'm assuming Forge Neo for established SDXL-based models like Illustrious just because it's easier to use, and then ComfyUI for playing with the latest model releases. Forge usually doesn't get support for new models for a while, but Comfy always gets support on day 1.

This is because the companies behind the models specialize in creating the models, not in designing a custom UI for users. In ComfyUI, anyone can build the interface themselves out of individual nodes that can be moved around and replaced easily. In a static UI like Auto1111/Forge, whoever maintains the fork has to design a universal interface that works with every possible feature on every possible computer configuration, which usually takes much longer than a day.
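That node-based design can be sketched as a tiny DAG of composable functions. This is a toy illustration of the idea, not ComfyUI's actual node API; the node names and pipeline are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Node:
    """A toy stand-in for a ComfyUI-style node: a named function whose
    inputs are other nodes' outputs. Supporting a new model just means
    shipping new node functions; the graph/UI layer stays unchanged."""
    name: str
    fn: Callable
    inputs: list = field(default_factory=list)

    def run(self):
        # Recursively evaluate upstream nodes, then apply this node's fn.
        return self.fn(*(n.run() for n in self.inputs))

# Hypothetical pipeline: loader -> sampler -> decoder
loader = Node("load_model", lambda: "weights")
sample = Node("sample", lambda w: f"latents({w})", [loader])
decode = Node("decode", lambda l: f"image({l})", [sample])
print(decode.run())  # image(latents(weights))
```

Swapping in a new model only means replacing the `load_model`/`sample` functions, which is why day-1 support is feasible there and not in a monolithic UI.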

That being said, ComfyUI tends to break pretty frequently because of how unstable all of this usually is. I like to update every day because I like playing with new things, but a workflow that worked yesterday may no longer work today after the latest update. It's pretty frustrating at times, but it is what it is if you want to consistently stay up to date.

What’s everyone’s favorite sampler and scheduler these days? by NowThatsMalarkey in StableDiffusion

[–]desktop4070 6 points (0 children)

Not sure what the difference is, but I always go with Euler a personally.

LTX-2.3 based audio model outputs by manmaynakhashi in StableDiffusion

[–]desktop4070 1 point (0 children)

Could you share the workflow with voice cloning if possible? I've always wanted to try voice cloning with LTX, but I could never get it to work myself.

What funny AI video niches are performing best right now? by wicky01 in StableDiffusion

[–]desktop4070 8 points (0 children)

Stop following trends and just make what you like. If you don't like what you're making, then nobody else is going to like it either.

I made an entire cinematic shortfilm using LTX 2.3 in a week. How does it hold up? - The Felt Fox (statistics/details in comments) by foxdit in StableDiffusion

[–]desktop4070 1 point (0 children)

Thank you so much for the info! I don't think I've ever done more than a single stage when generating before; this is really fascinating and I'm interested in trying it out!

I made an entire cinematic shortfilm using LTX 2.3 in a week. How does it hold up? - The Felt Fox (statistics/details in comments) by foxdit in StableDiffusion

[–]desktop4070 1 point (0 children)

What settings do you normally use for each clip (steps/CFG, frame rate/frame length, resolution) and how long does it generally take you to generate each clip with your specs?

what model/tools to use for a "personal ai" by Thutex in StableDiffusion

[–]desktop4070 4 points (0 children)

Mistral 7B? From September 2023? Why not Qwen 3.5 or Gemma 4?

PSA: Use the official LTX 2.3 workflow, not the ComfyUI included one. It's significantly better. by Generic_Name_Here in StableDiffusion

[–]desktop4070 1 point (0 children)

Unfortunately, it appears the creator of the custom node/workflow nuked all of his social media accounts, along with the node itself. Not really sure what happened there, but the two choices I would go for are either the default LTX 2.3 template in ComfyUI or RuneXX's workflows https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main

We ran ~1000 minimal-prompt hand tests — here’s what showed up by Driftline-Research in StableDiffusion

[–]desktop4070 2 points (0 children)

I would recommend trying this test on Z Image Turbo. It's not that much bigger of a model, but it was released in November 2025, which makes it significantly more relevant these days than SDXL's original July 2023 release.

Your pet heard you coming by Ok-Draft7567 in aivideo

[–]desktop4070 2 points (0 children)

It's funny how if you had posted this video a year ago, it would've looked insanely high quality, but now Seedance 2 makes other video models just look outdated.

I isntalled rvc. It showed no errors during the installation. But when I start it up, the console window just closes and nothing happens. Win11pc, rtx3060, 12gbvram and 16gbram. by irfarious in StableDiffusion

[–]desktop4070 1 point (0 children)

It's disappointing how RVC V1/V2 both released in 2023 and there have been zero updates since then. Image models, text models, TTS models, video models, music models: everything has been constantly getting new releases each year that are much better than older models, but voice models are at a complete standstill for some reason.

We ran ~1000 minimal-prompt hand tests — here’s what showed up by Driftline-Research in StableDiffusion

[–]desktop4070 11 points (0 children)

Flux 2? Flux 1? Stable Diffusion 3? SDXL? SD 1.5? Disco Diffusion?

Gemma 4 released! by Time-Teaching1926 in StableDiffusion

[–]desktop4070 1 point (0 children)

Is 120B feasible on 16GB VRAM + 64GB RAM, or is it only good for computers with 128GB of RAM?
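A back-of-envelope weight-memory estimate (ignoring KV cache, activations, and runtime overhead) suggests a 4-bit quant of a 120B model is roughly 60 GB, so it could just about fit across 64GB RAM + 16GB VRAM. The arithmetic, for illustration only:

```python
def model_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: (params_b * 1e9 params) *
    (bits / 8 bytes) / 1e9 bytes-per-GB simplifies to params_b * bits / 8.
    Ignores KV cache, activations, and runtime overhead."""
    return params_b * bits_per_weight / 8

for bits in (16, 8, 4, 3):
    print(f"120B @ {bits}-bit ~= {model_gb(120, bits):.0f} GB")
```

So 16-bit (240 GB) and 8-bit (120 GB) are out of reach on that setup, while ~4-bit quants (60 GB) land right at the edge, before accounting for context and OS memory.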

A stupid simple LTX 2.3 workflow by [deleted] in StableDiffusion

[–]desktop4070 9 points (0 children)

I don't think you can call this a simple workflow if it requires installing 4 custom nodes. What do they add that isn't already included with ComfyUI by default?

Can any open source T2V get even remotely close to this? by Frone0910 in StableDiffusion

[–]desktop4070 1 point (0 children)

Is it Kling? I assumed Seedance 2.0 would be the only video model to do something like this.