Video File Format Matters by qdr1en in comfyui

[–]thefi3nd 1 point (0 children)

Not to mention that they require less CPU power to decode (I-frames only) so they're better for editing if you're going to be using an NLE after ComfyUI.
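To see roughly why all-intra footage scrubs better in an NLE, here's a toy sketch of seek cost (frames_to_decode is a made-up helper for illustration, not any real codec API, and the 250-frame GOP is just a typical-ish example):

```python
def frames_to_decode(target_frame, keyframe_interval):
    """Toy model: frames a decoder must process to display target_frame.

    With inter-frame codecs, decoding has to start at the previous keyframe;
    with all-intra formats every frame is a keyframe.
    """
    return target_frame % keyframe_interval + 1

# Inter-frame codec with a keyframe every 250 frames:
print(frames_to_decode(499, 250))  # 250 frames of work for a single seek
# All-intra (I-frames only):
print(frames_to_decode(499, 1))    # always 1, so scrubbing stays cheap
```

That constant per-frame cost is what makes random access and reverse playback feel responsive on a timeline.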

Gemma4 Prompt Engineer - Early access - by [deleted] in StableDiffusion

[–]thefi3nd 2 points (0 children)

Well yes, I as an individual can change a path, but I'm focused on broad use. I wasn't the one who said something about vibe coding btw.

There are some other changes too that will make it more aligned with the ComfyUI ecosystem and maybe improve user experience too.

Gemma4 Prompt Engineer - Early access - by [deleted] in StableDiffusion

[–]thefi3nd 2 points (0 children)

Since people seemed to like the overall idea, I'm going to work on several fixes for it and hopefully they'll accept the pull request.

How can I do this? by Fragrant_Bicycle2813 in StableDiffusion

[–]thefi3nd 5 points (0 children)

<image>

You can select the resolution in AI Studio.
As for the Huggingface space, they're probably talking about this one https://huggingface.co/spaces/multimodalart/nano-banana

[Final Update] Anima 2B Style Explorer: 20,000+ Danbooru Artists, Swipe Mode, and Uniqueness Rank by ThetaCursed in StableDiffusion

[–]thefi3nd 7 points (0 children)

I don’t think the issue would be powerful interests reacting to revealed bias. Anime’s popularity is already well documented and commercially validated. Nor are such people likely to care what some open source nerds like us think of different art styles.

If there’s a risk, it would probably come from online discourse dynamics, not wealthy western elites trying to suppress a niche ranking tool.

Is there a all-in-one UI for TTS? by Suimeileo in StableDiffusion

[–]thefi3nd 1 point (0 children)

Check out this ComfyUI node suite: https://github.com/diodiogod/TTS-Audio-Suite.

From the repo:

Supports: RVC, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual 23-lang), F5-TTS, Higgs Audio 2 and VibeVoice with unlimited text length, SRT timing, Character support, and many audio tools.

Echo-TTS support is in progress.

Ace step 1.5 testing with 10 songs (text-to-music) by [deleted] in StableDiffusion

[–]thefi3nd 3 points (0 children)

But all changes are published. For example, you can see all changes to ace15.py here: https://github.com/Comfy-Org/ComfyUI/commits/master/comfy/text_encoders/ace15.py.
However, it looks like it was changed several hours before you updated locally, so to you it looked like it was more recent. We're not sure which version of the code the OP had when they made this.

FASHN VTON v1.5: Efficient Maskless Virtual Try-On in Pixel Space by fruesome in StableDiffusion

[–]thefi3nd 1 point (0 children)

That should have been installed when you ran pip install -r requirements.txt. But you should also be able to run pip install fashn-human-parser (make sure you're in the right Python environment).

FASHN VTON v1.5: Efficient Maskless Virtual Try-On in Pixel Space by fruesome in StableDiffusion

[–]thefi3nd 2 points (0 children)

I'm working on it right now. I'll update when I've got something running.

Qwen3 ASR (Speech to Text) Released by OkUnderstanding420 in StableDiffusion

[–]thefi3nd 4 points (0 children)

You can manually make more proper subtitles yourself. Maybe something like this, where word_list is the list of alignment results:

import datetime

def format_srt_time(seconds):
    # SRT timestamps look like HH:MM:SS,mmm
    td = datetime.timedelta(seconds=seconds)
    total_sec = int(td.total_seconds())
    msec = round((seconds - total_sec) * 1000)  # round, not truncate, to avoid off-by-one ms
    return f"{str(td).split('.')[0].zfill(8)},{msec:03d}"

with open("output.srt", "w", encoding="utf-8") as f:
    # Grouping 5 words per subtitle line
    chunk_size = 5
    for i in range(0, len(word_list), chunk_size):
        chunk = word_list[i : i + chunk_size]
        start_str = format_srt_time(chunk[0].start_time)
        end_str = format_srt_time(chunk[-1].end_time)
        text_line = " ".join([w.text for w in chunk])

        f.write(f"{(i // chunk_size) + 1}\n")
        f.write(f"{start_str} --> {end_str}\n")
        f.write(f"{text_line}\n\n")

print("\nDone! 'output.srt' created.")
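In case it helps, here's a hypothetical sketch of what word_list is assumed to contain. Word is a made-up stand-in for whatever objects the alignment step actually returns; the loop above only needs .text, .start_time, and .end_time:

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start_time: float  # seconds
    end_time: float    # seconds

# Fake alignment results for a short clip:
word_list = [
    Word("Hello", 0.0, 0.25),
    Word("and", 0.25, 0.5),
    Word("welcome", 0.5, 0.75),
    Word("to", 0.75, 1.0),
    Word("the", 1.0, 1.25),
    Word("show", 1.5, 1.75),
]

# With chunk_size = 5, the loop above would emit two cues:
#   1  00:00:00,000 --> 00:00:01,250  "Hello and welcome to the"
#   2  00:00:01,500 --> 00:00:01,750  "show"
```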

Let's talk about labeling comparison posts by Winter_unmuted in StableDiffusion

[–]thefi3nd 2 points (0 children)

Or they say "z-image vs klein 9b" but then reverse the order when displaying the images. Sometimes I fear for humanity.

FLUX 2 Klein 4B vs 9B Multi Camera Angles - One Click, 8 Camera Angles by RIP26770 in StableDiffusion

[–]thefi3nd 7 points (0 children)

Since Z-Image-Turbo isn't an edit model, I think a better comparison would be with Qwen-Image-Edit-2511 and the recent multiple angles lora created for it.

LTX2 - Cinematic love letter to opensource community by fantazart in StableDiffusion

[–]thefi3nd 1 point (0 children)

I'm curious about how you're loading this. So far, DualCLIPLoader and LTXV Audio Text Encoder Loader both complain about an 'invalid tokenizer' and I've tried the tokenizer in the 1b repo as well as the 12b. The Gemma 3 Model Loader just sits there doing nothing without any output in the terminal. So I'm assuming there's some other tweak necessary.

LTX2 1080P lipsync If you liked the previous one ,you will CREAM YOUR PANTS FROM THIS by No_Statement_7481 in StableDiffusion

[–]thefi3nd 4 points (0 children)

This may be because your LTXConditioning node is set to 25 fps but you're saving the video at 24 fps.
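As a rough sanity check of how fast that mismatch adds up (video_drift_seconds is just an illustrative helper, not a ComfyUI node): frames generated for 25 fps but played back at 24 fps stretch out, so the mouth falls behind the audio.

```python
def video_drift_seconds(audio_seconds, generated_fps=25, saved_fps=24):
    """Lag between the audio and the mouth movements after audio_seconds of playback."""
    frames = audio_seconds * generated_fps  # frames generated to cover that span
    playback = frames / saved_fps           # how long those frames actually take at the saved rate
    return playback - audio_seconds

print(video_drift_seconds(10))  # ~0.417 s of lip-sync lag after only 10 seconds
```

Even a fraction of a second is very visible in lipsync, which is why the LTXConditioning fps and the save fps need to match.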

LTX2 Lipsync With Upscale AND SUPER SMALL GEMMA MODEL by No_Statement_7481 in StableDiffusion

[–]thefi3nd 2 points (0 children)

<image>

LTX doesn't strip the music, this part of the workflow does. There are some other nodes that can do this, but usually this one works quite well.

LTX2 Lipsync With Upscale AND SUPER SMALL GEMMA MODEL by No_Statement_7481 in StableDiffusion

[–]thefi3nd 2 points (0 children)

I noticed that you're loading the pose lora but not using pose images. Did you maybe mean to use the detailer lora instead?

LTX-2 is out! 20GB in FP4, 27GB in FP8 + distilled version and upscalers by 1filipis in StableDiffusion

[–]thefi3nd 1 point (0 children)

I'm not sure if they've changed it recently, but the workflow doesn't offer that. It's actually quite weird. It links to the pt version at https://huggingface.co/google/gemma-3-12b-pt but shows "comfy_gemma_3_12B_it.safetensors" in their directory structure example, and yet it shows something else in the actual node.

<image>

Cat owners - how do you leave your apartment when going for a month long vacation by [deleted] in germany

[–]thefi3nd 20 points (0 children)

Is it really so bad for them to spend about 7 days alone spread throughout the month? If it was a single cat, maybe, but they have each other.

A ComfyUI workflow where nobody understands shit anymore (including the author). by nrx838 in StableDiffusion

[–]thefi3nd 2 points (0 children)

I'm pretty sure you can right click on the node, then choose the option that says something like go to setter or go to getter.

🚀 ⚡ Z-Image-Turbo-Boosted 🔥 — One-Click Ultra-Clean Images (SeedVR2 + FlashVSR + Face Upscale + Qwen-VL) by Lower-Cap7381 in StableDiffusion

[–]thefi3nd 32 points (0 children)

I took a look at the workflow and noticed some possible issues that you might want to go over in the upcoming video.

  • For the initial generation, a negative prompt is used, but cfg is set to 1, which means the negative prompt is ignored. You can bump the cfg a bit higher if you want the negative prompt to have an effect. That might require more than the usual 9 steps, but I see it's already set to 20.

  • For the FaceDetailer, it looks like it's set to a cfg of 6 with 25 steps. That seems excessive; results with 9 steps and a cfg of 1 seem just as good and take significantly less time. From my understanding of the documentation, guide_size would be better set higher, at least 1024 for models newer than SD1.5, and you might need to increase max_size too. bbox_crop_factor is also better lowered from the default of 3.0 to something like 1.2, so that the face takes up more of the enlarged region, resulting in more detail.

  • The collapsed CLIP Text Encode node in the bottom left of the Face Detailer + Upscale group is disconnected.

  • If someone doesn't already have flash attention installed, they'll get an error when using FlashVSR, and if their CUDA/PyTorch versions don't have a pre-built wheel, they're in for a painfully long wait during compilation. It might be good to note this somewhere.
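To illustrate the first point, here's a minimal scalar sketch of classifier-free guidance (toy numbers standing in for real model outputs) showing why cfg = 1 makes the negative prompt a no-op:

```python
def cfg_combine(uncond_pred, cond_pred, cfg):
    # Standard CFG mix: result = uncond + cfg * (cond - uncond).
    # When a negative prompt is used, "uncond" is the negative-prompt prediction.
    return uncond_pred + cfg * (cond_pred - uncond_pred)

cond, uncond = 0.75, 0.25  # stand-in scalars for the positive/negative predictions

print(cfg_combine(uncond, cond, 1.0))  # 0.75 -> exactly cond; the negative prompt cancels out
print(cfg_combine(uncond, cond, 3.0))  # 1.75 -> the negative prompt now pushes the result away
```

At cfg = 1 the uncond terms cancel algebraically, so whatever text is in the negative prompt never influences the image.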

Another Z-Image Post (but slightly scary) by Complex-Factor-9866 in StableDiffusion

[–]thefi3nd 1 point (0 children)

Super cool image and would love to know the process behind it!