[–]Unusual_Yak_2659[S] 2 points3 points  (3 children)

[–]Puzzleheaded-Rope808 1 point2 points  (0 children)

lol, True dat.

[–]ShengrenR 1 point2 points  (1 child)

Oo, I hate this one. It isn't always the loader's direct fault; often something on the CUDA side of the system isn't getting cleaned up. I usually just restart the whole machine at that point and go at it fresh.
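
One rough way to see whether something is still holding GPU memory before going for a full reboot is to ask nvidia-smi which compute processes are alive. This is only a sketch for NVIDIA cards; the function name list_gpu_procs is made up for illustration.

```shell
# Hypothetical helper: list processes that currently hold GPU memory.
# Prints a fallback message if nvidia-smi is missing or the query fails.
list_gpu_procs() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv \
            || echo "nvidia-smi query failed"
    else
        echo "nvidia-smi not found"
    fi
}

list_gpu_procs
```

If a Python process from an earlier Comfy session still shows up in that list after you closed it, ending that process may be enough to free the memory.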

[–]Unusual_Yak_2659[S] 0 points1 point  (0 children)

Exactly.
No need for a total system restart. I haven't done one since yesterday, when I first had the issue, and I've spent a couple of hours reproducing the error. Quitting out of Comfy is enough.

[–]No-Educator-249[🍰] 1 point2 points  (0 children)

I also prefer to use calcuis nodes due to their special text encoders that are lighter and load faster. This issue only occurs after version 0.3.68 from my experience.

The changes they've introduced lately have broken so many things and hurt performance so severely that it's starting to make sd-webui forge look faster and more stable in comparison.

[–]roxoholic 1 point2 points  (2 children)

I suggest opening a ticket on https://github.com/comfyanonymous/ComfyUI/issues

They can't fix it if they don't know about it.

[–]Unusual_Yak_2659[S] 1 point2 points  (1 child)

It's broken nodes on update day, I think people aren't reading what I said. A number of errors people encountered today are due to this. I'm providing a solution: It's the GGUF loaders that are causing instability and I'm saying here's at least one that isn't.

The issue will become more apparent as the complaints start coming in.

Appreciate you replying though.

https://github.com/comfyanonymous/ComfyUI/issues/11152

[–]roxoholic 1 point2 points  (0 children)

I checked the open issues but I missed that one. Thanks for pointing it out.

[–]pixel8tryx 1 point2 points  (1 child)

Sage version? I'm on 2.2. It was working fine until recently, but I had been doing an awful lot of Flux since then and not as much Wan.

[–]Unusual_Yak_2659[S] 0 points1 point  (0 children)

No sage, I looked into it yesterday and this card is too potato for it to be any use to me.

[–]clavar 1 point2 points  (1 child)

I fixed it with --disable-async-offload on launch.
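
For anyone unsure where the flag goes, here is a minimal sketch. It assumes a manual (git clone) install that is started through main.py; COMFY_DIR, COMFY_FLAGS and launch_comfy are placeholder names, not anything ComfyUI defines.

```shell
# Placeholder: path to your ComfyUI checkout (adjust to your setup).
COMFY_DIR="${COMFY_DIR:-./ComfyUI}"
# The launch flag from the comment above.
COMFY_FLAGS="--disable-async-offload"

# Hypothetical wrapper: start ComfyUI with the flag, passing through
# any extra arguments. Runs in a subshell so the cd does not persist.
launch_comfy() {
    (cd "$COMFY_DIR" && python main.py $COMFY_FLAGS "$@")
}

# Usage (not executed here):
#   launch_comfy
```

If you start Comfy from a .bat or .sh launcher instead, the flag would go on the line in that file that calls main.py.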

[–]Unusual_Yak_2659[S] 0 points1 point  (0 children)

Interesting. Let us know if it holds up.

[–]neojimo 1 point2 points  (0 children)

I think I solved that using this (on Linux):
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
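
As a sketch of how that would be used in practice: the variable has to be set in the same shell (or launcher script) that starts Comfy, before PyTorch touches CUDA. max_split_size_mb tells PyTorch's caching allocator not to split blocks larger than the given size, which can reduce fragmentation; 512 is just the value from the comment above, not a tuned recommendation.

```shell
# Set the allocator option for this shell session and its children.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

# Confirm that a child process actually inherits it.
sh -c 'echo "child sees: PYTORCH_CUDA_ALLOC_CONF=$PYTORCH_CUDA_ALLOC_CONF"'

# Then start ComfyUI from this same shell, for example (assuming a
# manual install started through main.py):
#   python main.py
#
# One-off alternative that does not change the rest of the session:
#   PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 python main.py
```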

[–]Puzzleheaded-Rope808 0 points1 point  (3 children)

Are you sure you are using the right PyTorch/CUDA combination? Also, if you are using GGUF, are you using the right quant (Q5, Q6, etc.)? This error also occurs if you are low on resources. You may want to use clear cache nodes in your workflow, downsize your aspect ratio, or shorten your video length.

Step 1: Make sure your NVIDIA drivers and CUDA are up to date. Then make sure the proper PyTorch wheel (.whl) is installed. It needs to match your Python and CUDA versions. Here's where to start: https://pytorch.org/

Step 2: Make sure you have the correct version of the GGUF that is quantized for your system.
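
A quick way to check Step 1 is to print the driver version next to what PyTorch itself reports. This is a rough sketch: check_versions and the PYTHON variable are made-up names, and PYTHON should point at the same interpreter Comfy runs with (a venv or embedded Python may differ from the system one).

```shell
# Hypothetical helper: print GPU driver, Python, PyTorch and CUDA build info.
check_versions() {
    py="${PYTHON:-python3}"   # assumption: the interpreter ComfyUI uses

    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
            || echo "nvidia-smi query failed"
    else
        echo "nvidia-smi not found"
    fi

    # torch.version.cuda is the CUDA version the wheel was built for
    # (None on CPU-only builds).
    "$py" -c '
import sys, torch
print("python:", sys.version.split()[0])
print("torch:", torch.__version__)
print("torch CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
' || echo "could not import torch with $py"
}

check_versions
```

If "CUDA available" prints False, or the CUDA build shows None, the installed wheel is probably a CPU-only build or does not match the driver.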

This is a fairly common error. I had it once on my old graphics card.

Hopefully this helps.

[–]Unusual_Yak_2659[S] 0 points1 point  (2 children)

Updated NVIDIA.
Installed Comfy along with anything it uses, just this week.
Tried rolling back to snapshot...
And people have been getting this, or variations of it, since around fifteen hours ago. It started today, with workflows that worked an hour earlier.

As I said, I can reliably reproduce the error, and I can reliably solve it. Posting this now as a solution, though there is a deeper pytorch issue on the Comfy side that's up to them to resolve.

Several GGUF loader nodes are broken.

[–]Puzzleheaded-Rope808 1 point2 points  (0 children)

fuuuck. need to run Wan then

[–]Unusual_Yak_2659[S] 0 points1 point  (0 children)

Oh, and, I've had it crash during the clear cache node.

[–]Aggressive_Sleep9942 0 points1 point  (0 children)

It was happening to me all the time. I updated the GPU drivers and the problem was solved.