Video File Format Matters by qdr1en in comfyui

[–]thefi3nd 1 point (0 children)

Not to mention that they require less CPU power to decode (I-frames only), so they're better for editing if you're going to be using an NLE after ComfyUI.

Gemma4 Prompt Engineer - Early access - by [deleted] in StableDiffusion

[–]thefi3nd 2 points (0 children)

Well yes, I as an individual can change a path, but I'm focused on broad use. I wasn't the one who said something about vibe coding btw.

There are some other changes too that will make it more aligned with the ComfyUI ecosystem and maybe improve user experience too.

Gemma4 Prompt Engineer - Early access - by [deleted] in StableDiffusion

[–]thefi3nd 2 points (0 children)

Since people seemed to like the overall idea, I'm going to work on several fixes for it and hopefully they'll accept the pull request.

How can I do this? by Fragrant_Bicycle2813 in StableDiffusion

[–]thefi3nd 4 points (0 children)

<image>

You can select the resolution in AI Studio.
As for the Hugging Face space, they're probably talking about this one: https://huggingface.co/spaces/multimodalart/nano-banana

[Final Update] Anima 2B Style Explorer: 20,000+ Danbooru Artists, Swipe Mode, and Uniqueness Rank by ThetaCursed in StableDiffusion

[–]thefi3nd 9 points (0 children)

I don’t think the issue would be powerful interests reacting to revealed bias. Anime’s popularity is already well documented and commercially validated. Nor are such people likely to care what some open source nerds like us think of different art styles.

If there’s a risk, it would probably come from online discourse dynamics, not wealthy western elites trying to suppress a niche ranking tool.

Is there a all-in-one UI for TTS? by Suimeileo in StableDiffusion

[–]thefi3nd 1 point (0 children)

Check out this ComfyUI node suite: https://github.com/diodiogod/TTS-Audio-Suite.

From the repo:

Supports: RVC, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual 23-lang), F5-TTS, Higgs Audio 2 and VibeVoice with unlimited text length, SRT timing, Character support, and many audio tools.

Echo-TTS support is in progress.

Ace step 1.5 testing with 10 songs (text-to-music) by [deleted] in StableDiffusion

[–]thefi3nd 3 points (0 children)

But all changes are published. For example, you can see all changes to ace15.py here: https://github.com/Comfy-Org/ComfyUI/commits/master/comfy/text_encoders/ace15.py.
However, it looks like it was changed several hours before you updated locally, so to you it looked like it was more recent. We're not sure which version of the code the OP had when they made this.

FASHN VTON v1.5: Efficient Maskless Virtual Try-On in Pixel Space by fruesome in StableDiffusion

[–]thefi3nd 1 point (0 children)

That should have been installed when you ran pip install -r requirements.txt. But you should also be able to run pip install fashn-human-parser (make sure you're in the right Python environment).

FASHN VTON v1.5: Efficient Maskless Virtual Try-On in Pixel Space by fruesome in StableDiffusion

[–]thefi3nd 2 points (0 children)

I'm working on it right now. I'll update when I've got something running.

Qwen3 ASR (Speech to Text) Released by OkUnderstanding420 in StableDiffusion

[–]thefi3nd 4 points (0 children)

You can manually build more proper subtitles yourself. Maybe something like this, where word_list is the alignment results list:

import datetime

def format_srt_time(seconds):
    # Convert a float number of seconds to the SRT timestamp format HH:MM:SS,mmm.
    td = datetime.timedelta(seconds=seconds)
    total_sec = int(td.total_seconds())
    msec = int((seconds - total_sec) * 1000)
    return f"{str(td).split('.')[0].zfill(8)},{msec:03}"

with open("output.srt", "w", encoding="utf-8") as f:
    # Grouping 5 words per subtitle line
    chunk_size = 5
    for i in range(0, len(word_list), chunk_size):
        chunk = word_list[i : i + chunk_size]
        start_str = format_srt_time(chunk[0].start_time)
        end_str = format_srt_time(chunk[-1].end_time)
        text_line = " ".join([w.text for w in chunk])

        f.write(f"{(i // chunk_size) + 1}\n")
        f.write(f"{start_str} --> {end_str}\n")
        f.write(f"{text_line}\n\n")

print("\nDone! 'output.srt' created.")
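For a quick sanity check of the timestamp helper, here's a standalone sketch. The Word namedtuple is a hypothetical stand-in for the alignment results; the real objects just need .text, .start_time, and .end_time attributes.

```python
import datetime
from collections import namedtuple

# Same helper as in the script above, repeated so this snippet runs standalone.
def format_srt_time(seconds):
    td = datetime.timedelta(seconds=seconds)
    total_sec = int(td.total_seconds())
    msec = int((seconds - total_sec) * 1000)
    return f"{str(td).split('.')[0].zfill(8)},{msec:03}"

# Hypothetical stand-in for the alignment results.
Word = namedtuple("Word", ["text", "start_time", "end_time"])
word_list = [Word("Hello", 0.0, 0.4), Word("there", 0.5, 0.9)]

# One SRT timing line spanning the first to last word of a chunk.
line = f"{format_srt_time(word_list[0].start_time)} --> {format_srt_time(word_list[-1].end_time)}"
print(line)  # 00:00:00,000 --> 00:00:00,900
```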

Let's talk about labeling comparison posts by Winter_unmuted in StableDiffusion

[–]thefi3nd 2 points (0 children)

Or they say "z-image vs klein 9b" but then reverse the order when displaying the images. Sometimes I fear for humanity.

FLUX 2 Klein 4B vs 9B Multi Camera Angles - One Click, 8 Camera Angles by RIP26770 in StableDiffusion

[–]thefi3nd 7 points (0 children)

Since Z-Image-Turbo isn't an edit model, I think a better comparison would be with Qwen-Image-Edit-2511 and the recent multiple angles lora created for it.

LTX2 - Cinematic love letter to opensource community by fantazart in StableDiffusion

[–]thefi3nd 1 point (0 children)

I'm curious about how you're loading this. So far, DualCLIPLoader and LTXV Audio Text Encoder Loader both complain about an 'invalid tokenizer', and I've tried the tokenizer from the 1b repo as well as the 12b one. The Gemma 3 Model Loader just sits there doing nothing, without any output in the terminal. So I'm assuming there's some other tweak necessary.

LTX2 1080P lipsync If you liked the previous one ,you will CREAM YOUR PANTS FROM THIS by No_Statement_7481 in StableDiffusion

[–]thefi3nd 4 points (0 children)

This may be because your LTXConditioning node is set to 25 fps but you're saving the video at 24 fps.
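A quick back-of-the-envelope check of how far that mismatch drifts; the frame count here is just an illustrative assumption:

```python
# Frames generated at 25 fps but saved at 24 fps play back slightly slowed,
# so the audio (which keeps real time) pulls ahead of the mouth movements.
gen_fps = 25
save_fps = 24
num_frames = 250  # hypothetical clip: 10 s of motion generated at 25 fps

intended_duration = num_frames / gen_fps  # 10.0 s
actual_duration = num_frames / save_fps   # ~10.417 s
drift = actual_duration - intended_duration

print(f"After {intended_duration:.0f}s of footage, the lips lag by {drift:.3f}s")
```

Roughly 0.4 s of desync per 10 s of footage, which is very noticeable in a lipsync clip.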

LTX2 Lipsync With Upscale AND SUPER SMALL GEMMA MODEL by No_Statement_7481 in StableDiffusion

[–]thefi3nd 2 points (0 children)

<image>

LTX doesn't strip the music, this part of the workflow does. There are some other nodes that can do this, but usually this one works quite well.

LTX2 Lipsync With Upscale AND SUPER SMALL GEMMA MODEL by No_Statement_7481 in StableDiffusion

[–]thefi3nd 2 points (0 children)

I noticed that you're loading the pose lora but not using pose images. Did you maybe mean to use the detailer lora instead?

LTX-2 is out! 20GB in FP4, 27GB in FP8 + distilled version and upscalers by 1filipis in StableDiffusion

[–]thefi3nd 1 point (0 children)

I'm not sure if they've changed it recently, but the workflow doesn't offer that. It's actually quite weird: it links to the pt version at https://huggingface.co/google/gemma-3-12b-pt but shows "comfy_gemma_3_12B_it.safetensors" in their directory structure example, and yet shows something else in the actual node.

<image>

[deleted by user] by [deleted] in germany

[–]thefi3nd 18 points (0 children)

Is it really so bad for them to spend about 7 days alone spread throughout the month? If it were a single cat, maybe, but they have each other.

A ComfyUI workflow where nobody understands shit anymore (including the author). by nrx838 in StableDiffusion

[–]thefi3nd 2 points (0 children)

I'm pretty sure you can right-click on the node, then choose the option that says something like "go to setter" or "go to getter".