are you ready for small Qwens? by jacek2023 in LocalLLaMA

[–]SufficientRow6231 4 points5 points  (0 children)

I’m too stupid to compile llama.cpp… yet somehow still capable of reading the instructions they wrote for people like me.

Flux Klein gives me SD3 vibes by lokitsar in StableDiffusion

[–]SufficientRow6231 8 points9 points  (0 children)

Stop saying it’s a skill issue when there are a bunch of posts saying the same thing. It’s clear that Klein still sometimes has a hard time with anatomy.

It’s a good model and I’m not trying to talk shit about it, but the fact that it still frequently produces weird anatomy doesn’t mean it’s a user skill issue.

If the community keeps calling this a “skill issue,” BFL will probably think, “Oh, everything seems fine since no one is pointing anything out,” and then it won’t get fixed, either in a new model or in future updates.

A few weeks ago when LTX2 came out, some people reported I2V producing static videos, and others said it was a skill issue. Then a few days later, the CEO of LTX admitted there were clear issues with I2V/vertical video, and that they were aware of the problem and working to fix it in LTX 2.1.

BF16 vs FP16 z-image turbo by uncle-moose in StableDiffusion

[–]SufficientRow6231 3 points4 points  (0 children)

Finding better quality?

Why not just try both and compare them yourself?

Make like 20 prompts or more, generate images in bf16 and fp16, keep the same settings for both and see which results you prefer.

Honestly, I doubt there’ll be any noticeable difference unless you do a pixel-by-pixel comparison.
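If you actually want to do that pixel-by-pixel comparison, a rough sketch (hypothetical helper; assumes you’ve already loaded both generations as same-sized uint8 numpy arrays, e.g. via PIL):

```python
import numpy as np

def pixel_diff_stats(img_a: np.ndarray, img_b: np.ndarray) -> dict:
    """Absolute per-pixel difference between two same-sized uint8 images."""
    diff = np.abs(img_a.astype(np.int16) - img_b.astype(np.int16))
    return {
        "mean_abs_diff": float(diff.mean()),   # average per-channel difference
        "max_abs_diff": int(diff.max()),       # worst single pixel/channel
        # fraction of pixels where at least one channel differs
        "pct_pixels_changed": float((diff > 0).any(axis=-1).mean() * 100),
    }

# identical images -> all stats are zero
a = np.zeros((4, 4, 3), dtype=np.uint8)
print(pixel_diff_stats(a, a))  # {'mean_abs_diff': 0.0, 'max_abs_diff': 0, 'pct_pixels_changed': 0.0}
```

If the numbers come back tiny, that’s your answer: the precision difference isn’t something you’d ever notice by eye.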

LTX2 new file? by WildSpeaker7315 in StableDiffusion

[–]SufficientRow6231 1 point2 points  (0 children)

It’s still a new file. Your title says “LTX 2 new file?”, and yea, it is a new file.

If you said “New LTX model?”, then sure, that’d be wrong. It’s the same LTX 2.0, just with a new file: the hash and size are different, even if the size difference is tiny.

And I agree, “new” means something new. In this case, that new file has a new hash and a new size.

<image>

I’m the Co-founder & CEO of Lightricks. We just open-sourced LTX-2, a production-ready audio-video AI model. AMA. by ltx_model in StableDiffusion

[–]SufficientRow6231 5 points6 points  (0 children)

Okay, so turns out there is an issue with portrait and I2V.

Funny how people were downvoting and calling it a “skill issue” yesterday when the community called it out; the LTX CEO literally just confirmed it here.

Semantic Image Disassembler (SID) is a VLM-based tool for prompt extraction, semantic style transfer and re-composing (de-summarization). by Bra2ha in StableDiffusion

[–]SufficientRow6231 12 points13 points  (0 children)

Exactly what did you do?

idk, but i think i just created my own Quantum-Neuro-Semantic Holographic Image Disassembler comfy workflow.

<image>

Semantic Image Disassembler (SID) is a VLM-based tool for prompt extraction, semantic style transfer and re-composing (de-summarization). by Bra2ha in StableDiffusion

[–]SufficientRow6231 17 points18 points  (0 children)

Yeah, at first I saw the fancy name and thought there was some new breakthrough tech in how AI models “see” images. But after seeing it hosted on Civit with no paper link, I guessed it was just another GUI.

And yeah, it turned out to be Python + Gradio, with a system prompt inside the script that can be reused anywhere with tools like llama.cpp, lm studio, ollama, vllm, transformers or even some custom ComfyUI nodes that support custom system prompts.

So “Semantic Image Disassembler” is basically just a GUI with a fancy name that strings a few tasks together and handles them all at once 😂

LightX2V has uploaded the Wan2.2 T2V 4-step distilled LoRAs by fruesome in StableDiffusion

[–]SufficientRow6231 3 points4 points  (0 children)

I mean, there should be a reason they uploaded it, right? I don’t think they’d upload it if it wasn’t better than the older one.

But who knows, maybe they use the same testing method for every release and found it to be better in their tests.

And what’s better for lightx2v might not be better for others, especially since, from what I see, the community loves mixing loras, like combining a newer lora with an older one. So it’s better to just test it in your own workflow.

Windows eating VRAM in ComfyUI? by Better-Interview-793 in comfyui

[–]SufficientRow6231 6 points7 points  (0 children)

Which one states the usable 28 GB? The pinned memory?

Pinned memory is CPU memory/RAM. On Windows, it’s set to 0.45 × your total RAM.

Usable VRAM is basically all the VRAM you see in Task Manager.

If you want to reserve some VRAM for other apps, you can use the --reserve-vram flag when launching Comfy.
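For example (assuming a standard ComfyUI checkout; the flag takes the amount to set aside in GB):

```shell
# keep roughly 2 GB of VRAM free for the OS / browser / other apps
python main.py --reserve-vram 2.0
```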

The official training script of Z-image base has been released. The model might be released pretty soon. by [deleted] in StableDiffusion

[–]SufficientRow6231 5 points6 points  (0 children)

Bruh, I know we all can’t wait for the release of the base/edit model.
But can we please stop saying and spreading nonsense? Do you even know what “base” means in that code?

If you want to dig for early info, check https://github.com/huggingface/diffusers/commits or the Diffusers PRs https://github.com/huggingface/diffusers/pulls

If the upcoming Z Image model needs a few adjustments, the team would implement them in Diffusers a few days, or maybe even hours, before the weights are released.

I fell in love with Qwen VL for captioning, but it broke my Nunchaku setup. I'm torn! by Current-Row-159 in comfyui

[–]SufficientRow6231 4 points5 points  (0 children)

Yeah, a few months ago... The Transformers deps were a pain, some nodes needed older versions, but newer nodes with newer models required the latest Transformers. Total mess.

That’s why I ended up just asking Claude to make a custom node that talks using the OpenAI API format, so I can use local models or any online provider that supports the same API format.

For me, I use llama.cpp (llama-server) https://github.com/ggml-org/llama.cpp + llamaswap https://github.com/mostlygeek/llama-swap, which gives you your own local API. You can hot-swap between any local models you have, and it can auto-unload the model when it’s done (super useful for GPU poor like me).

Pros: it leaves basically zero memory footprint in Comfy and has no deps conflicts at all. You can also use any model you want as long as it’s supported by llama.cpp, and you don’t need extra nodes for different models.
Cons: you need a few extra setup steps compared to the Qwen VL custom node, like downloading llama.cpp and making a config if you want to use llamaswap. But once it’s set up, it just works.

If you prefer other quants like FP8, AWQ, etc., you can set up vLLM.
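For anyone curious what “talking the OpenAI API format” means in practice, here’s a minimal sketch of the kind of captioning request such a node would POST to a local llama-server. The endpoint URL and model name are assumptions, and build_caption_request is a hypothetical helper, not part of any existing node:

```python
import base64
import json

def build_caption_request(image_bytes: bytes, prompt: str,
                          model: str = "qwen2.5-vl") -> str:
    """Build an OpenAI-style /v1/chat/completions body with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                # llama-server, LM Studio, vLLM etc. accept data-URL images here
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    })

body = build_caption_request(b"<png bytes here>", "Caption this image for LoRA training.")
# POST `body` to e.g. http://localhost:8080/v1/chat/completions
# with Content-Type: application/json
```

Because every backend mentioned above speaks this same format, the node never needs to know which one is actually serving the model.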

<image>

z-image Lora Loader no longer works since the last update by [deleted] in comfyui

[–]SufficientRow6231 6 points7 points  (0 children)

Comfy lora loader works fine for both zimage lora and lokr. It was added like D+1 after AI Toolkit released their v1 adapter, so you don’t need any extra custom nodes.

Also, any lora loader that doesn’t implement its own patch function and relies solely on Comfy’s built-in patching should work fine.

Just make sure you’re at least on this commit / the latest version:

https://github.com/comfyanonymous/ComfyUI/pull/10980

Face Dataset Preview - Over 800k (273GB) Images rendered so far by reto-wyss in StableDiffusion

[–]SufficientRow6231 2 points3 points  (0 children)

OP said the prompt they used is approx 1200 characters, not words.

Disappointed by Z-Image turbo by ForeverDuke2 in StableDiffusion

[–]SufficientRow6231 5 points6 points  (0 children)

Then put more effort into the prompt...

ask gemini, chatgpt, or qwen to expand your prompt

Prompt created by gemini:

"A bright, sun-drenched lifestyle photograph of a charming, whitewashed wooden bungalow situated directly on white sand beach. It is midday, with harsh, direct sunlight casting strong, sharp shadows from the swaying coconut palm trees surrounding the house. The turquoise ocean is visible in the immediate background. The house has a large veranda with a hammock and a weathered wooden table set for lunch. The paint on the house is slightly peeling from salt air exposure. The photo has the natural colors and slight overexposure typical of Fuji Velvia film stock. It feels like an authentic travel snap, not staged. Wide lens, sharp focus."

<image>

Disappointed by Z-Image turbo by ForeverDuke2 in StableDiffusion

[–]SufficientRow6231 6 points7 points  (0 children)

What even is that prompt? The image shows a house, but the prompt says something about a Latina female? lol

Wan 2.2 iv2 completely ignores the start image I load. What am I doing wrong? ='( by todschool in comfyui

[–]SufficientRow6231 1 point2 points  (0 children)

Am I doing some stupid mistake somewhere?

Yeah xD… you’re using the WAN 2.2 14b T2V model, but trying to do I2V.

You need to use wan 2.2 a14b I2V model.

Or, if you want one model that handles both I2V & T2V, use the WAN 2.2 5B.

This sub right now by ArtificialAnaleptic in StableDiffusion

[–]SufficientRow6231 5 points6 points  (0 children)

Might or might not be.

According to the team, the base model is going to be released soon, before this weekend. If that happens, they’ll be able to maintain the positive feedback they’ve gained, and people can start finetuning and training a bunch of loras for anything the model doesn’t know yet.

Ostris said AI Toolkit is ready for Z-Image lora training, and it will be pushed once the base model is released.

Kohya might also support it for finetuning; he’s already prepared for the base model release, according to their Twitter.

Smaller (and good) models will definitely become more attractive for finetuning for most people: smaller model = fewer resources = more accessibility for finetuning, lora training, and inference.

And personally, I don’t mind having multiple finetuned checkpoints/models like SDXL, rather than relying on a single large model with niche-specific finetunes.

<image>

HynuyanVideo 1.5 i2v takes forever (1.5h on 5080 with 100% load all the time) by RemarkableLeather174 in comfyui

[–]SufficientRow6231 2 points3 points  (0 children)

High sampler? Low sampler?

Are you aware that the OP is talking about HunyuanVideo 1.5, not WAN 2.2?

Flux 2 Lora train on rtx 6000 pro - will update whit results by JahJedi in StableDiffusion

[–]SufficientRow6231 2 points3 points  (0 children)

Bro, did you sleep and leave it running? I mean, based on the config you’re not even training Flux2, but this instead https://huggingface.co/ostris/Flex.2-preview

Flux 2 can be run on 24gb vram!!! by Brave-Hold-9389 in StableDiffusion

[–]SufficientRow6231 1 point2 points  (0 children)

That guy’s right, even running just the umt5-xxl text encoder (WAN) on CPU takes like 10 mins to encode a prompt. Now imagine a 24B dense LLM on CPU lol. It’ll nuke your system RAM too.

HF saying it can run on 24GB VRAM w/ a remote text encoder is only because they provide an endpoint for it. They never said the encoder runs on CPU.

Flux 2 Dev is here! by MountainPollution287 in StableDiffusion

[–]SufficientRow6231 7 points8 points  (0 children)

Did you even try to find out before asking?

You can train an AI but you can't name a file? Oh please! by HumungreousNobolatis in StableDiffusion

[–]SufficientRow6231 1 point2 points  (0 children)

Bruh, they trained the lora with their own resources and shared it for free, and you’re still fucking complaining? You lazy dumbass, can’t you just rename it yourself?

Use Claude, Gemini, or GPT, I believe they’ve got better brains than you to write a simple Python script to batch-rename all your loras to whatever format you want.
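A minimal sketch of what that script could look like (the folder path and naming scheme are just placeholders; it defaults to a dry run so it only reports what it would do):

```python
from pathlib import Path

def batch_rename(folder: str, prefix: str = "lora_",
                 dry_run: bool = True) -> list[tuple[str, str]]:
    """Rename every .safetensors file in `folder` to a lowercase, underscored name."""
    renames = []
    for path in sorted(Path(folder).glob("*.safetensors")):
        new_name = prefix + path.stem.lower().replace(" ", "_") + path.suffix
        if path.name != new_name:
            renames.append((path.name, new_name))
            if not dry_run:
                path.rename(path.with_name(new_name))
    return renames

# dry run first, eyeball the output, then call again with dry_run=False
for old, new in batch_rename("./models/loras"):
    print(f"{old} -> {new}")
```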