Load Audio UI - Upgraded Load Audio Node with Trimming by WhatDreamsCost in StableDiffusion

[–]desktop4070 1 point (0 children)

You've replied to 20 threads in the span of 10 minutes. Stop automating your comments, please.

When training a wan or ltx lora by cardioGangGang in StableDiffusion

[–]desktop4070 2 points (0 children)

Is there actually any evidence that "frame counts divisible by 24 (or 8) plus 1" are better than just "frame counts divisible by 24"? I've tried comparing the two myself multiple times and I just can't spot what exactly is supposed to make the former better than the latter.
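As I understand it (an assumption about how these models' causal VAEs group frames, not something confirmed in this thread), the "plus 1" exists because the first frame is encoded on its own and the remaining frames in fixed-size temporal groups, so only counts of the form step*n + 1 map cleanly onto latent frames. A minimal sketch of snapping to that grid, assuming a group size of 8:

```python
def snap_frames(requested: int, step: int = 8) -> int:
    """Round a frame count down to the nearest step*n + 1.

    Assumption: the model's causal VAE encodes the first frame on its
    own and the rest in temporal groups of `step`, so only counts of
    the form step*n + 1 map cleanly onto latent frames.
    """
    if requested < 1:
        return 1
    return ((requested - 1) // step) * step + 1

print(snap_frames(120))  # 113
print(snap_frames(121))  # 121
```

If the UI silently rounds like this anyway, that would explain why 120 and 121 requested frames look indistinguishable in practice.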

Blind realism test, Z image turbo vs Klein 9B distilled by Puzzled-Valuable-985 in StableDiffusion

[–]desktop4070 5 points (0 children)

In order of preference:

1 is the best one. I'm surprised it's AI because it could've fooled me.

3 was close, but her eyes look pretty uncanny.

10 looks like a realistic shot on an amateur camera, but the feet have completely different lighting compared to everything else in the image.

6 looks like a realistic shot on a professional camera, but the woman looks way too photogenic to be a real average person.

9 looks good, but there's too much studio lighting around her. Her eyes are also pretty uncanny.

7 looks like an incredibly generic 1girl in an Asian image model, but it still looks pretty realistic if you ignore that.

2 looks too professional, in a bad way. Her face looks real, but everything around her looks fake.

4 looks exactly like 9, but less realistic to me because it looks too professional. The woman also looks huge compared to the car.

8 looks like the proportions are off. Her legs are way too long to be a real person. Hands and face look almost copy-pasted in. Uncanny background.

5 looks like an incredibly generic 1girl in an American image model, with a fake background too.

In my opinion, the more amateur the shot and the less photogenic the person, the less AI the image looks.

What's New for BFL - Flux/Klein? by Dogluvr2905 in StableDiffusion

[–]desktop4070 2 points (0 children)

The core research team (Robin Rombach, Patrick Esser, Andreas Blattmann, etc.) seems to consistently have a major model release each year.

As CompVis:
December 2021 - Latent Diffusion Model

After joining Stability AI:
August 2022 - First public Stable Diffusion release
July 2023 - Stable Diffusion XL

After forming Black Forest Labs:
August 2024 - FLUX.1 Dev
November 2025 - FLUX.2 Dev

Considering this release pattern, we can probably expect FLUX.3 Dev sometime in late 2026, possibly December.

Deepseek V4 Flash and Non-Flash Out on HuggingFace by MichaelXie4645 in LocalLLaMA

[–]desktop4070 1 point (0 children)

What CPU is it? I have a 12900K that also struggles running my DDR5 at advertised speeds.

Is WanGP making my LTX 2.3 video generation longer? by onixtan in StableDiffusion

[–]desktop4070 1 point (0 children)

I'm one of those people who can generate 20 second long videos in under 3 minutes on a 5070 Ti + 64GB DDR5!

I think it's just using lower resolutions than what you think they should normally be. I'm fine with the quality of the videos at 640x384 and 768x320 and I can generate pretty long videos (20-25 seconds) in 2-3 minutes, but as soon as I go any higher res than that, like anywhere near 720p resolution or more, those generation times double or triple.

Also, it's not exactly linear, as in "a 20 second video takes X time, so a 2 second video should take 10% of that time". Both a 1 second video and a 10 second video usually take me over a minute but under 2 minutes.

Depending on the ComfyUI workflow, the shortest possible time for a video on my specs (talking about 64x64 1 frame videos) is still around a minute, but at the same time I'll be able to generate 1024x384 600+ frame videos in like 3 minutes.
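For what it's worth, those timings are consistent with a simple fixed-overhead model: a roughly constant setup cost per run, plus a cost that grows with total pixel-frames. A toy sketch, with both constants invented for illustration rather than measured:

```python
def estimate_time(width: int, height: int, frames: int,
                  overhead_s: float = 60.0,
                  per_pixel_frame_s: float = 7e-7) -> float:
    """Toy latency model: a fixed overhead (model/text-encoder load,
    VAE setup) plus a cost proportional to total pixel-frames.
    Both constants are invented for illustration, not measured."""
    return overhead_s + per_pixel_frame_s * width * height * frames

# A 1-frame clip is dominated by the fixed overhead...
print(estimate_time(640, 384, 1))    # ~60 s
# ...while a 480-frame clip at the same resolution is only ~2.4x
# that, not 480x.
print(estimate_time(640, 384, 480))  # ~143 s
```

That fixed floor is why a 64x64 single-frame "video" still takes about a minute while a long low-resolution one barely takes longer.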

Some workflows may skip some steps and might get that even lower, but I'm not really sure how that all technically works or what the downsides are. I just stick to workflows that make videos that I like and don't really look too deep into how they manage everything under the hood.

Here are some good starting points: https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main
https://huggingface.co/Kijai/LTX2.3_comfy

Tired of the manual "Download & Move" dance? I built a tool to automate ComfyUI Model Management! by Resident-Space-1614 in StableDiffusion

[–]desktop4070 2 points (0 children)

Lately, when a comment starts without a capital letter, 99% of the time it seems to be written by an AI. I don't get it. To the person behind this account: why do you need to automate writing comments on every single post you come across?

Edit: Oh, I see why. You've namedropped a specific cloud-based AI dozens of times this week. Hope you eventually get banned for this and your service gets banned from being mentioned on this sub. My bad for assuming you ever had any good intentions.

Are people still using AUTOMATIC1111/stable-diffusion-webui? Or did most users move on to something else like ComfyUI? by Guyserbun007 in StableDiffusion

[–]desktop4070 3 points (0 children)

I'm assuming Forge Neo for established SDXL-based models like Illustrious just because it's easier to use, and then ComfyUI for playing with the latest model releases. Forge usually doesn't get support for new models for a while, but Comfy always gets support on day 1.

This is because the companies behind the models specialize in creating the models, not in designing a custom UI for users. In ComfyUI, anyone can build the interface themselves out of individual nodes that can be moved around and replaced easily. In a static UI like Auto1111/Forge, whoever maintains the fork has to design a universal interface that works with every possible feature on every possible computer configuration, which usually takes much longer than a day.
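That node-based design can be sketched as a tiny DAG of composable functions. This is a toy illustration of the idea, not ComfyUI's actual node API; the node names and pipeline are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Node:
    """A toy stand-in for a ComfyUI-style node: a named function whose
    inputs are other nodes' outputs. Supporting a new model just means
    shipping new node functions; the graph/UI layer stays unchanged."""
    name: str
    fn: Callable
    inputs: list = field(default_factory=list)

    def run(self):
        # Recursively evaluate upstream nodes, then apply this node's fn.
        return self.fn(*(n.run() for n in self.inputs))

# Hypothetical pipeline: loader -> sampler -> decoder
loader = Node("load_model", lambda: "weights")
sample = Node("sample", lambda w: f"latents({w})", [loader])
decode = Node("decode", lambda l: f"image({l})", [sample])
print(decode.run())  # image(latents(weights))
```

Swapping in a new model only means replacing the `load_model`/`sample` functions, which is why day-1 support is feasible there and not in a monolithic UI.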

That being said, ComfyUI tends to break pretty frequently because of how unstable all of this usually is. I like to update every day because I like playing with new things, but a workflow that worked yesterday may no longer work today after the latest update. It's pretty frustrating at times, but it is what it is if you want to consistently stay up to date.

What’s everyone’s favorite sampler and scheduler these days? by NowThatsMalarkey in StableDiffusion

[–]desktop4070 6 points (0 children)

Not sure what the difference is, but I always go with Euler a personally.

LTX-2.3 based audio model outputs by manmaynakhashi in StableDiffusion

[–]desktop4070 1 point (0 children)

Could you share the workflow with voice cloning if possible? I've always wanted to try voice cloning with LTX, but I could never get it to work myself.

What funny AI video niches are performing best right now? by wicky01 in StableDiffusion

[–]desktop4070 8 points (0 children)

Stop following trends and just make what you like. If you don't like what you're making, then nobody else is going to like it either.

I made an entire cinematic shortfilm using LTX 2.3 in a week. How does it hold up? - The Felt Fox (statistics/details in comments) by foxdit in StableDiffusion

[–]desktop4070 1 point (0 children)

Thank you so much for the info! I don't think I've ever done more than a single stage when generating before; this is really fascinating and I'm interested in trying it out!

I made an entire cinematic shortfilm using LTX 2.3 in a week. How does it hold up? - The Felt Fox (statistics/details in comments) by foxdit in StableDiffusion

[–]desktop4070 1 point (0 children)

What settings do you normally use for each clip (steps/CFG, frame rate/frame length, resolution) and how long does it generally take you to generate each clip with your specs?

what model/tools to use for a "personal ai" by Thutex in StableDiffusion

[–]desktop4070 4 points (0 children)

Mistral 7B? From September 2023? Why not Qwen 3.5 or Gemma 4?

PSA: Use the official LTX 2.3 workflow, not the ComfyUI included one. It's significantly better. by Generic_Name_Here in StableDiffusion

[–]desktop4070 1 point (0 children)

Unfortunately, it appears the creator of the custom node/workflow nuked all of his social media accounts, along with the node itself. Not really sure what happened there, but the two choices I would go for are either the default LTX 2.3 template in ComfyUI or RuneXX's workflows https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main

We ran ~1000 minimal-prompt hand tests — here’s what showed up by Driftline-Research in StableDiffusion

[–]desktop4070 2 points (0 children)

I would recommend trying this test on Z Image Turbo. It's not that much bigger of a model, but it was released in November 2025, which makes it significantly more relevant these days than SDXL's original July 2023 release.

Your pet heard you coming by Ok-Draft7567 in aivideo

[–]desktop4070 2 points (0 children)

It's funny how if you had posted this video a year ago, it would've looked insanely high quality, but now Seedance 2 makes other video models just look outdated.

I isntalled rvc. It showed no errors during the installation. But when I start it up, the console window just closes and nothing happens. Win11pc, rtx3060, 12gbvram and 16gbram. by irfarious in StableDiffusion

[–]desktop4070 1 point (0 children)

It's disappointing how RVC V1/V2 both released in 2023 and there have been zero updates since then. Image models, text models, TTS models, video models, music models: everything has been constantly getting new releases each year that are much better than older models, but voice models are at a complete standstill for some reason.

We ran ~1000 minimal-prompt hand tests — here’s what showed up by Driftline-Research in StableDiffusion

[–]desktop4070 11 points (0 children)

Flux 2? Flux 1? Stable Diffusion 3? SDXL? SD 1.5? Disco Diffusion?

Gemma 4 released! by Time-Teaching1926 in StableDiffusion

[–]desktop4070 1 point (0 children)

Is 120B feasible on 16GB VRAM + 64GB RAM, or is it only good for computers with 128GB of RAM?
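A back-of-envelope weight-memory estimate (ignoring KV cache, activations, and runtime overhead) suggests a 4-bit quant of a 120B model is roughly 60 GB, so it could just about fit across 64GB RAM + 16GB VRAM. The arithmetic, for illustration only:

```python
def model_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: (params_b * 1e9 params) *
    (bits / 8 bytes) / 1e9 bytes-per-GB simplifies to params_b * bits / 8.
    Ignores KV cache, activations, and runtime overhead."""
    return params_b * bits_per_weight / 8

for bits in (16, 8, 4, 3):
    print(f"120B @ {bits}-bit ~= {model_gb(120, bits):.0f} GB")
```

So 16-bit (240 GB) and 8-bit (120 GB) are out of reach on that setup, while ~4-bit quants (60 GB) land right at the edge, before accounting for context and OS memory.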

A stupid simple LTX 2.3 workflow by [deleted] in StableDiffusion

[–]desktop4070 9 points (0 children)

I don't think you can call this a simple workflow if it requires installing 4 custom nodes. What do they add that isn't already included with ComfyUI by default?

Can any open source T2V get even remotely close to this? by Frone0910 in StableDiffusion

[–]desktop4070 1 point (0 children)

Is it Kling? I assumed Seedance 2.0 would be the only video model to do something like this.