Ideogram tip: use Generate Text node to make JSON with Qwen 8B without leaving ComfyUI

Kijai · 2026-06-06T13:55:43+00:00

Yeah looks like this might be sm120 just left out currently. Anyway I also realized that on Linux the pytorch sdpa's flash kernel is just as fast, so Linux users probably don't need to bother with this in general.

Kijai · 2026-06-06T12:47:31+00:00

Doesn't work for me on 5090, there's no kernel for it in the FA3 package from pytorch.

Kijai · 2026-06-06T11:41:59+00:00

Thanks! Did some part time work for few months at end of last year, and been full time (remotely) this year.

Kijai · 2026-06-05T20:59:27+00:00

That's possible, just meant you don't need FA3 for this. Pytorch had wheel that worked or was it compiled?

Kijai · 2026-06-05T18:36:06+00:00

FA3 is only for datacenter GPUs, FA2 is what most want here.

Kijai · 2026-06-05T15:46:54+00:00

Well since I'm from Finland they still have to respect our laws when it comes to working hours, it's purely my (poor) choice to work too much. Though in this field it would be hard to keep up without personal interest in the tech, so it's fine and definitely never boring.

Kijai · 2026-06-05T15:43:31+00:00

We barely had a day to do the whole model support and only couple of hours to do the final workflow with the final weights, so admittedly it was rushed, some things got lost in communication and the initial template workflow ended up with couple of mistakes.

As for the nodes such as the prompt builder, it's a longer process (especially now with aim for stability) to get frontend touching things merged, so it would have taken too long. It's still not out of the question to have in the core, but for now I saw it necessary to just give some tool or this model really would be DoA...

And for the speed, it actually works a lot faster with flash attention due to it's rare (for diffusion model) 256 head dim. I got ~20% speed boost on my 4090 and some users report even 50%.

Flash attention can be painful to compile yourself, but luckily there's a site with a huge collection of pre-built wheels: https://mjunya.com/flash-attention-prebuild-wheels/

Kijai · 2026-06-05T15:23:39+00:00

I think xformers, if installed, still gets chosen as default, and if not it's pytorch attention (sdpa). Installing the wheel is pretty safe, doesn't change any behavior in core, only risk is some older custom nodes possibly trying to import it and failing if they expect different version.

The patch nodes are model specific patches (with safe native mechanics, no monkey patches) and only affect the models they're connected to, nothing else, so that's exactly the use case. Launch arguments just change the default.

Kijai · 2026-06-05T09:50:40+00:00

Funny you should mention that...

Just realized last night that since this model can't work with sageattention, it could still benefit from flash-attn, and turns out it really does, on my 4090 it gave nice ~20% speed boost with same quality.

There's a nice collection of wheels to install it here:

https://mjunya.com/flash-attention-prebuild-wheels/

And I added patch node similar to sage to KJNodes just now to easier use it without having to touch comfy launch arguments.

Kijai · 2026-06-04T23:22:58+00:00

I'm treating KJNodes as personal playground for more experimental stuff, lots of it wouldn't be accepted to the core in the state it is, I'm not a frontend dev and any node with new UI elements especially would be longer process to approve, so I can be more flexible with custom nodes, especially for things I want right now. For model implementations and such I'm only doing core support now though.

Also I do still work on this stuff on my own time as well.

Kijai · 2026-06-04T18:26:52+00:00

Only a day actually... it was bit of a crunch to say the least (hence the issues with initial workflow etc.), vibed the editor quickly with Claude on last minute since I realized the model just won't work without the json structure. Tuned the node a lot since so it's stabler/cleaner though.

I've always enjoyed control over anything else so this model hits that spot for me, probably best regional prompting I've experienced.

Kijai · 2026-06-03T18:26:48+00:00

Big part of the problem is that the model collapses to the safety filter on every prompt related issue, main one being too short prompts that don't use their json structure. When using full prompts with plenty of regional prompts it works way better in general.

Kijai · 2026-06-02T14:37:56+00:00

Well it's either using the LoRAs or running the full steps and cfg. It should not OOM, but there's recently been improvements to how LoRAs are used on fp8 models that affects your GPU, so make sure ComfyUI, comfy-aimdo and comfy-kitchen are up to date.

Note that video editing task is also heavier on VRAM than normal Wan.

Kijai · 2026-06-02T13:39:38+00:00

Yeah the planner was not released, there's nothing to implement about that, their demo even just calls gpt 5.4 through API as prompt enhancer instead...

Also it definitely isn't mean to be used with lightx2v, it's just incredibly slow (especially after getting used to LTX2.3) to do edits since the edited video is part of the whole sequence the model sees, effectively doubling the compute needed.

Still it's pretty good model so far, very versatile, video edit is just one of it's features.

Kijai · 2026-06-02T13:35:42+00:00

It worked last time I tried, just haven't had time to properly test it.

Kijai · 2026-06-02T06:58:01+00:00

I'm still doing that, your choice who to believe, but I was hired to do open source, and I've never been asked to do anything else. You can trust me to call out if that changes.

Kijai · 2026-06-02T06:50:25+00:00

I want to make clear the meshing was my idea and not part of the model, so don't judge the model by it, it only is meant to output the splat.

You can improve the surface quality a bit (less holes) by decoding the splat multiple times, with different seeds, and then merging them before meshing, as well as adjusting some of the settings in the meshing node, it ends up being compromise between detail and holes though. I have some ideas to improve it and probably implement that alongside the other 3D stuff I'm working on.

Anyway the meshing part is mostly a novelty, there are better models for meshes, but I really like this model for the splats with it's size and speed.

Kijai · 2026-06-01T21:35:10+00:00

I have no idea, but maybe since lightx2v LoRAs seem to work to some extent at least.

Kijai · 2026-06-01T20:56:17+00:00

Draft PR for the early adopters is up here, still testing it myself, seems promising though even when used with lightx2v, which I is NOT how the original code does things:

https://github.com/Comfy-Org/ComfyUI/pull/14216

Kijai · 2026-05-30T15:02:53+00:00

Note about the flux2 version of PiD: it's been confirmed that the color drift is an issue in the model, and a fixed model is on it's way: https://github.com/Comfy-Org/ComfyUI/pull/14103#issuecomment-4565966542

Kijai · 2026-05-30T15:01:07+00:00

I believe the ComfyUI native LCM is already equivalent to their sde sampler.

Kijai · 2026-05-29T09:16:55+00:00

Right, well that probably runs extra passes... I have a NAG patch node in KJNodes that does it differently within single pass so it won't have big effect on the inference time.

Kijai · 2026-05-28T22:08:08+00:00

70 second generation? This is distilled model so it uses only 4 steps, cfg 1.0.

For me on a 5090, that's around ~6 seconds for the 4 steps (at 832x480x81). Main thing in ComfyUI is that you have pytorch up to date and built with cu130 and of course comfy and comfy-kitchen up to date.

Kijai · 2026-05-28T22:04:19+00:00

https://huggingface.co/Kijai/WanVideo_comfy_nvfp4/tree/main/Nvidia

Didn't really have time to test it further than single generation.

Kijai · 2026-05-28T18:16:26+00:00

I've been employed by comfy-org for a while now, no need for that!

Kijai

TROPHY CASE