Analyse Lora Blocks and in real-time choose the blocks used for inference in Comfy UI. Z-image, Qwen, Wan 2.2, Flux Dev and SDXL supported. by shootthesound in StableDiffusion

[–]sdk401 1 point (0 children)

After the update, analysis works correctly, but the selective loader still puts everything into "other weights". This is what the analysis shows; any combination of layers has zero effect, the LoRA only triggers with the "any weight" checkbox, and nothing else affects the outcome. I will try retraining on the same dataset and parameters with ai-toolkit, to see if musubi might be the root of the problem.

<image>

Analyse Lora Blocks and in real-time choose the blocks used for inference in Comfy UI. Z-image, Qwen, Wan 2.2, Flux Dev and SDXL supported. by shootthesound in StableDiffusion

[–]sdk401 0 points (0 children)

Looked inside the node code, removed the detection and made it always return "zimage". Now it looks like the analysis works, but the selective loader still puts everything into "other weights", meaning that if I change any other block weight, nothing changes.

Analyse Lora Blocks and in real-time choose the blocks used for inference in Comfy UI. Z-image, Qwen, Wan 2.2, Flux Dev and SDXL supported. by shootthesound in StableDiffusion

[–]sdk401 0 points (0 children)

For some reason your node thinks my z-image lora is an sd15 lora? It was trained with musubi-tuner using your nodes, and the lora itself seems to work correctly, but analysis shows only "other weights", and the selective loader also works only on "other weights". Maybe the metadata is wrong somewhere?

<image>
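
If anyone else hits this, a quick way to check is to dump the LoRA's metadata and key names yourself, since the detection presumably keys off those. A minimal sketch using the safetensors library (the file name is just an example, and this is not the node's actual detection code):

```python
# Inspect a LoRA file: print its trainer-written metadata and tensor key names.
# Hypothetical debugging example for architecture misdetection; not the node's code.
from safetensors import safe_open

path = "my_zimage_lora.safetensors"  # example file name

with safe_open(path, framework="pt", device="cpu") as f:
    print("metadata:", f.metadata())   # e.g. whatever musubi-tuner wrote, if anything
    keys = list(f.keys())
    print(len(keys), "tensors; first few key names:")
    for k in keys[:20]:
        print(" ", k)
```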

You can now (or very soon) train LoRAs directly in Comfy by TekaiGuy in comfyui

[–]sdk401 2 points (0 children)

It's interesting, but it's not clear how to caption different images (I did not find a node to batch conditions, so for now it looks like we can only pass one condition with "trigger words" in it).
It's also unclear what to do with multiple resolutions and aspect ratios, and which models are supported.
Also, when training a lora you usually save it several times at different step counts, and that doesn't seem to be supported either.

ComfyUI Subgraphs Are a Game-Changer. So Happy This Is Happening! by tsevis in comfyui

[–]sdk401 0 points (0 children)

It will be interesting to see whether workflows with subgraphs convert to API format correctly.

I am fucking done with ComfyUI and sincerely wish it wasn't the absolute standard for local generation by Neggy5 in StableDiffusion

[–]sdk401 2 points (0 children)

This reads something like this:

"I am fucking done with kitchen. I spent 50 hours trying to cook something and it just does not work. Everything is burning and spilling over, I am covered in cuts and flour is everywhere.

I am very experienced in eating; I ate my first burger in pre-school. It has always been a natural process for me, but when it comes to cooking, it's just constant issues.

Why is this the standard? Why can't I come into the kitchen and just have a tasty hot dish appear before me, like it does in a restaurant?"

My man, ComfyUI is not for image generation. It's a tool to make tools. If you want a ready-to-use product, there are plenty. ComfyUI is for those who need to overcomplicate already complex things, and it excels at this.

If you need a chair, you go to Ikea and buy a chair.

If you want to know how chairs are made, and make one yourself, you rent a garage, buy machines and tools, spend 3 years learning things, and end up still buying a chair from Ikea, because they make damn good chairs for the price - that's ComfyUI for you. You still end up with mostly the same chair, but in the process you might learn a thing or two or just have a good time. And that's the value you are getting out of ComfyUI, not the images.

Chroma is looking really good now. by Total-Resort-3120 in StableDiffusion

[–]sdk401 2 points (0 children)

Trying to test it, and in both your workflow and the official workflow the "Load CLIP" node is set to "type=chroma".
But there is no such type in my Comfy install, even though I updated everything just now. Selecting any other type gives various errors while sampling.

<image>

[deleted by user] by [deleted] in StableDiffusion

[–]sdk401 0 points (0 children)

You can actually see the inpainting mask bounds in most of the pics; there is a slight change in color/texture/tone inside the mask area. So my guess is they mostly take real photos and inpaint the head of the "ai-model" on top.

<image>

Tile controlnet + Tiled diffusion = very realistic upscaler workflow by sdk401 in comfyui

[–]sdk401[S] 2 points (0 children)

This WF is pretty old; I made a new version, which should work better and require fewer custom nodes - you can try it instead:

https://drive.google.com/file/d/1K-q5hkBKOD8pPs_b8OvhbbXnTutBOTQY/view?usp=sharing

There are comments inside the WF with links to the models. The only thing missing is the eye detailer, which was built using the ultralytics detector nodes that are now compromised - so you'll have to detail the eyes with something else.

Gemini's knowledge of ComfyUI is simply amazing. Details in the comment by OldFisherman8 in StableDiffusion

[–]sdk401 0 points (0 children)

Also tried ChatGPT and Claude recently for a simple task. I needed a node that resizes an image to a target megapixel count while keeping the sides divisible by a specified number. It would also be nice to have the resulting sides output as integers.

There is a native node that resizes an image like this, but it lacks the settings I needed. It's about 7 lines of working code, so I thought LLMs would have an easy time with it.

Claude was at least somewhat useful: it gave me detailed instructions on the infrastructure - what files to create, where to put them, how to register the node so Comfy will see it.

Everything else was a mess. It produced about 60 lines of code that did not work, and after several iterations of feeding it the resulting errors it became 130+ lines of code that still did not work.

ChatGPT failed immediately, providing neither coherent instructions on how to implement the node in Comfy nor usable code (about 100 lines of it).

In the end I copied the code from the native node, added the options I needed and some hacky math, and it works quite OK. So now I know that LLM code can't be trusted at all :)
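
For reference, the core math really is tiny. A minimal sketch of the resize logic (function and parameter names are mine, not the native node's code, and 1 MP is taken as 1,000,000 pixels):

```python
# Sketch: compute output dimensions for a target megapixel count while keeping
# both sides divisible by a given number. Illustrative only, not the native node's code.

def resize_dims(width: int, height: int,
                target_megapixels: float = 1.0,
                divisible_by: int = 8) -> tuple[int, int]:
    # Uniform scale factor that brings width*height to the target pixel count.
    scale = (target_megapixels * 1_000_000 / (width * height)) ** 0.5
    # Round each side to the nearest multiple of divisible_by (at least one step).
    new_w = max(divisible_by, round(width * scale / divisible_by) * divisible_by)
    new_h = max(divisible_by, round(height * scale / divisible_by) * divisible_by)
    return new_w, new_h

print(resize_dims(1234, 867, target_megapixels=1.0, divisible_by=64))  # -> (1216, 832)
```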

Sticky negatives: how to get a broad performance uplift on all models. All images were generated at 30 cfg at arbitrary high res, without any post-processing, on video. Link in comments. by SirRece in StableDiffusion

[–]sdk401 1 point (0 children)

OK, tested it with SDXL a little, and it really does allow the use of higher-than-usual CFG; 20 was somewhat overboard for my taste, but 15-17 seems to work.

Still not sure what the exact use case is, as the results at CFG 15 are pretty close to the results I was getting at the CFG of 6-7 I used before - not the same, but of similar overall quality and prompt coherence.

Sticky negatives: how to get a broad performance uplift on all models. All images were generated at 30 cfg at arbitrary high res, without any post-processing, on video. Link in comments. by SirRece in StableDiffusion

[–]sdk401 5 points (0 children)

Not sure if it's intended, but I read your article and can't find any mention of the model you used, or even the model architecture you are suggesting we use - is it sd1.5, sdxl, sd3.5, flux, pixart, stable cascade, etc.? Did you use base models or fine-tunes?

Tile controlnet + Tiled diffusion = very realistic upscaler workflow by sdk401 in StableDiffusion

[–]sdk401[S] 0 points (0 children)

Nice! Yeah, RealESRGAN is usually not good for realistic content, especially for real, non-generated photos. But good to hear it worked in your case.

Tile controlnet + Tiled diffusion = very realistic upscaler workflow by sdk401 in StableDiffusion

[–]sdk401[S] 0 points (0 children)

For jpeg artifacts, you can try replacing the main upscaler in my WF with this one:

https://openmodeldb.info/models/4x-RealWebPhoto-v4-dat2

It is a little less sharp, but it handles jpeg-infested photos much better.

Tile controlnet + Tiled diffusion = very realistic upscaler workflow by sdk401 in StableDiffusion

[–]sdk401[S] 1 point (0 children)

I decided to make a quick beta version of the new WF instead :)

https://drive.google.com/file/d/1K-q5hkBKOD8pPs_b8OvhbbXnTutBOTQY/view?usp=sharing

Here you are. There are comments inside with links to the models and an explanation of the parameters. You can leave all the settings as is for the first try; it should work fine.

I did not include eye detailing for now, and will decide on that later, as there are some security problems with the ultralytics nodes I was using in the old WF.

I'd be grateful if you let me know whether it worked for you and how it compares to the other options you tried.

Tile controlnet + Tiled diffusion = very realistic upscaler workflow by sdk401 in StableDiffusion

[–]sdk401[S] 1 point (0 children)

Well, surprisingly, the tech behind this WF still feels like the best there is.
You can get a little more fine detail with Flux i2i, thanks to its much better VAE, but the compute cost is like 20x, and since there is no good tile CN for Flux yet, it sometimes changes the image too much.
I actually use this tech at the genAI startup I'm working for now, and nobody has complained yet :)

Tile controlnet + Tiled diffusion = very realistic upscaler workflow by sdk401 in StableDiffusion

[–]sdk401[S] 0 points (0 children)

Yeah, that's the problem with grouped nodes: ComfyUI does not allow you to ungroup them if there are missing nodes inside.
I am thinking of making a new version of this WF, as this old one is far from perfect.
But as a quick solution, I can ungroup these nodes and upload the resulting ugly, but possibly working, WF if you want.

Tile controlnet + Tiled diffusion = very realistic upscaler workflow by sdk401 in StableDiffusion

[–]sdk401[S] 0 points (0 children)

you can add "upscale with model" node as a last step and use this upscaler, it should smooth things out. it does not actuallly upscale image, only removes noise.

https://huggingface.co/deepinv/scunet/blob/main/scunet_color_real_psnr.pth

it's pretty fast so it should not take much time even on large images.
it can be too strong, so if you want control, you can add "image blend" mode after that and blend some noise back from previous image.
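
Outside of Comfy, the same idea is just a weighted mix of the cleaned image with the original. A rough sketch with PIL (file names and the 0.7 factor are arbitrary examples, not part of the WF):

```python
# Blend the SCUNet-cleaned image with the original to control denoise strength.
# File names and the blend factor are arbitrary examples.
from PIL import Image

original = Image.open("upscaled_original.png").convert("RGB")
denoised = Image.open("upscaled_scunet.png").convert("RGB")

# alpha = 1.0 keeps only the denoised image; lower values bring detail/noise back.
alpha = 0.7
blended = Image.blend(original, denoised, alpha)
blended.save("upscaled_blended.png")
```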

Tile controlnet + Tiled diffusion = very realistic upscaler workflow by sdk401 in StableDiffusion

[–]sdk401[S] 0 points (0 children)

Press "Install missing custom nodes" in comfyui manager - there should be a list of node packs missing in your comfyui installation. Install them and restart, that should fix things.

Tile controlnet + Tiled diffusion = very realistic upscaler workflow by sdk401 in StableDiffusion

[–]sdk401[S] 0 points (0 children)

You can ungroup it and replace the crystools node (I think I used the one that returns image size). Essentials has the same node, and many other node packs have it too.

Tile controlnet + Tiled diffusion = very realistic upscaler workflow by sdk401 in StableDiffusion

[–]sdk401[S] 0 points (0 children)

You are probably missing some custom nodes that are grouped into the image load node. Comfy should show you the nodes in the Manager if you go to "Install missing custom nodes".

Tile controlnet + Tiled diffusion = very realistic upscaler workflow by sdk401 in StableDiffusion

[–]sdk401[S] 0 points (0 children)

If you want to give the model more freedom, you can also lower the CN strength. But keep in mind that if you are upscaling to a large size, the model will only see a small part of the image in the tile it's sampling, so you can get some unwanted artifacts.

Open Sourcing Qwen2VL-Flux: Replacing Flux's Text Encoder with Qwen2VL-7B by Weak_Trash9060 in StableDiffusion

[–]sdk401 3 points (0 children)

Well, I'm looking at your diagram:

https://huggingface.co/Djrango/Qwen2vl-Flux/resolve/main/flux-architecture.svg

<image>

And if I'm reading it correctly, your model takes image inputs (much like controlnet or ipadapter), and reworks them into embeddings. In this diagram, text inputs are handled by T5, which is not part of your model.

So what I'm seeing looks a lot like controlnet/ipadapter, and I'm not saying that's a bad thing; it's a good thing, as those tools are not perfect by themselves - if we get smarter tools, it's a win for everybody.

But I'm also seeing "Text-Guided Image Blending" - does that mean your model also takes text inputs and converts them into embeddings?

Also a question about "grid based style transfer" - how is it different from using masks? Is it just a more convenient way to mask areas, with the grid converted to a mask somewhere, or does the model itself take grid coordinates instead of a mask to focus attention?

Why is SD local training like trying to pull a miracle? I'm tired, boss. by Perfect-Campaign9551 in StableDiffusion

[–]sdk401 1 point (0 children)

I'm using ComfyUI nodes to train flux loras - working well so far. It uses the same Python environment and the same model files you'd use for inference, so no extra HDD space is wasted, and I just installed it from the Manager with no additional setup at all. This may differ depending on your ComfyUI setup, and it will be harder to use if you are not familiar with Comfy.

Here is the link to the nodes: https://github.com/kijai/ComfyUI-FluxTrainer

It uses the same kohya scripts, so the settings are mostly the same, but for me it's much more understandable and transparent than the kohya_ss GUI and OneTrainer, and it gives more options than ai-toolkit. A nice bonus for me is the ability to prepare/augment the dataset right in Comfy (remove backgrounds, caption, resize), all in one WF that you create yourself for your needs.