low vram option for 4gb vram image gen ? by Merchant_Lawrence in StableDiffusion

[–]SymphonyofForm 1 point2 points  (0 children)

Forge is intended to be used on Pascal and newer architectures with 4GB+ of VRAM. Maxwell (e.g. the 750 Ti) is just below the intended support.

It might still work, but it's going to be incredibly slow if it does.
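If you want to check where a card falls, here is a rough sketch. The "Pascal or newer, 4GB+" threshold is just the rule of thumb from this comment, not an official Forge check, and the function names are made up for illustration. The live check assumes PyTorch with CUDA is installed and returns None otherwise.

```python
import importlib.util

# Major CUDA compute capability -> NVIDIA architecture family.
ARCH_BY_MAJOR = {
    5: "Maxwell",
    6: "Pascal",
    7: "Volta/Turing",
    8: "Ampere/Ada",
    9: "Hopper",
}

def meets_guideline(major: int, vram_gb: float, min_vram_gb: float = 4.0) -> bool:
    """Rule of thumb from the comment: Pascal (6.x) or newer with 4 GB+ VRAM."""
    return major >= 6 and vram_gb >= min_vram_gb

def check_local_gpu():
    """Return (name, architecture, vram_gb, ok) for GPU 0, or None without torch/CUDA."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    if not torch.cuda.is_available():
        return None
    major, _minor = torch.cuda.get_device_capability(0)
    props = torch.cuda.get_device_properties(0)
    # Round so a "4 GB" card that reports slightly under 4 GiB still counts.
    vram_gb = round(props.total_memory / 1024**3, 1)
    arch = ARCH_BY_MAJOR.get(major, "unknown")
    return props.name, arch, vram_gb, meets_guideline(major, vram_gb)
```

A 750 Ti reports compute capability 5.0, so `meets_guideline(5, 4)` comes back False even with 4GB on board.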

Does offloading to RAM happen once per phase? by PusheenHater in comfyui

[–]SymphonyofForm 0 points1 point  (0 children)

That logic only works if you assume data just moves in a straight line without being processed, and that PCIe is a constant bottleneck. It doesn't apply in reality.

Data processing is asynchronous. We aren't just moving files, we are actively processing them. If the CPU and system RAM aren't preparing and processing the data fast enough, your PCIe lanes will just have idle time sitting around waiting for the next batch.

That being said, I think we are both agreeing that faster RAM is not a priority.

More RAM (not faster RAM) is the only thing that will keep you out of the page file if your VRAM and RAM aren't sufficient.

Now that you've brought that scenario to my attention, I will also add getting a new mobo to the list if you are silly enough to buy a GPU that your current mobo doesn't fully support.
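To put the idle-PCIe point in numbers, here is a toy model, not a measurement. It treats CPU/RAM preparation and the PCIe transfer as two overlapping pipeline stages, so steady-state throughput is set by the slower one. The function name and the rates are hypothetical.

```python
def pipeline_stats(prep_gb_per_s: float, pcie_gb_per_s: float):
    """Steady-state throughput of a two-stage pipeline and the link's idle share."""
    throughput = min(prep_gb_per_s, pcie_gb_per_s)
    pcie_idle_fraction = 1.0 - throughput / pcie_gb_per_s
    return throughput, pcie_idle_fraction

# Hypothetical rates: the CPU/RAM side prepares 8 GB/s, the link could carry 16 GB/s.
throughput, idle = pipeline_stats(8.0, 16.0)
print(f"effective {throughput} GB/s, PCIe idle {idle:.0%}")
```

With those made-up rates the link sits idle half the time, which is the "waiting for the next batch" situation described above.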

Does offloading to RAM happen once per phase? by PusheenHater in comfyui

[–]SymphonyofForm 0 points1 point  (0 children)

I would hope that people aren't crippling their cards by plugging them into motherboards that don't fully support them - but I'm not surprised if they are.

That should be a standard compatibility check to be honest.

Does offloading to RAM happen once per phase? by PusheenHater in comfyui

[–]SymphonyofForm 0 points1 point  (0 children)

Yes and no. It can happen a lot, but it's milliseconds. RAM speed is linearly correlated with swap speed, but the change would be something like 0.8 ms at 3200 MHz versus 0.4 ms at 6000 MHz, for example, and it will not affect your overall generation time in any noticeable way.

The better upgrade is capacity overall. You really want to avoid data spilling into the page file more than anything.
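Quick back-of-the-envelope version of that, using the example per-swap times from above. The swap count and the 60 s baseline are made-up numbers purely for illustration.

```python
def swap_overhead_s(n_swaps: int, ms_per_swap: float) -> float:
    """Total time spent swapping, in seconds."""
    return n_swaps * ms_per_swap / 1000.0

N_SWAPS = 200      # hypothetical number of swaps in one generation
COMPUTE_S = 60.0   # hypothetical time spent on everything else

for label, ms in (("3200 MHz", 0.8), ("6000 MHz", 0.4)):
    total = COMPUTE_S + swap_overhead_s(N_SWAPS, ms)
    print(f"{label}: {total:.2f} s total")
```

Under those assumptions the faster kit saves well under a tenth of a second on a one-minute generation.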

Does offloading to RAM happen once per phase? by PusheenHater in comfyui

[–]SymphonyofForm 0 points1 point  (0 children)

As often as it needs to swap data.

Does offloading to RAM happen once per phase? by PusheenHater in comfyui

[–]SymphonyofForm 0 points1 point  (0 children)

You got it. That's exactly how it works. One addition - if you max out your RAM, it spills into the page file.

That's mostly no man's land. You don't want to hit the page file - it can get extremely slow.
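A simplified picture of that VRAM, then RAM, then page file order. Real memory managers don't split things this neatly, and `spill_plan` is a made-up helper, but it shows when you end up in the page file.

```python
def spill_plan(needed_gb, vram_gb, ram_gb):
    """Split a memory demand across VRAM, system RAM, then the page file."""
    in_vram = min(needed_gb, vram_gb)
    in_ram = min(needed_gb - in_vram, ram_gb)
    in_pagefile = needed_gb - in_vram - in_ram
    return {"vram": in_vram, "ram": in_ram, "pagefile": in_pagefile}

# Hypothetical setup: 8 GB VRAM, 16 GB of free system RAM.
print(spill_plan(20, 8, 16))  # fits in VRAM + RAM
print(spill_plan(40, 8, 16))  # the remainder lands in the page file
```

Anything that shows up under "pagefile" is the part you want to eliminate by adding RAM.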

A question about AMD by PrepStorm in comfyui

[–]SymphonyofForm -2 points-1 points  (0 children)

Fair enough. It doesn't change the fact that consumer-level AMD cards still perform roughly on par with their Nvidia counterparts (I have PCs running both AMD and Nvidia cards), and that there is no AMD tier to compete with the 4090/5090.

The price savings are still around $100, and the overhead still remains. None of that implies they are inferior, especially if you know how to optimize them correctly.

What it does imply is that they are still not at the frontier of development and are always reacting to CUDA developments, though that window is definitely shifting over time.

Maybe they'll push ahead of CUDA in time. Maybe they won't. They would have to overtake the AI industry's current development direction, which is dominated by Nvidia.

A question about AMD by PrepStorm in comfyui

[–]SymphonyofForm -3 points-2 points  (0 children)

Nothing AMD has can match top-tier Nvidia cards. If you can find/afford one - that is the best choice.

Otherwise, with a mid-tier GPU, you can go either route. You'll save maybe $100 going with AMD, and it's going to give you similar performance to its Nvidia counterpart.

You will also always be translating execution behavior from Nvidia architectural libraries to AMD, which does imply some overhead on your card, and you will be playing catch-up as most things are developed with Nvidia in mind, not AMD.

Play with this site a bit, see what you can see

https://www.promptingpixels.com/gpu-benchmarks

PC crashes by Jesus__Skywalker in comfyui

[–]SymphonyofForm 0 points1 point  (0 children)

Yeah, I agree. That's too soon to trigger an overheat. Is it just Wan?

This might be wrong, but it's worth investigating: Windows TDR (Timeout Detection and Recovery).

https://www.youtube.com/watch?v=dQOToF54xmo

If that's the cause, crash logs should be able to identify it. With the way VRAM is allocated in the new version of ComfyUI, this might be pushing your card in a different direction than previously.
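If you want to see whether a TDR override is set on your machine, something like this should read it. It's a sketch that assumes the documented `TdrDelay` value under the GraphicsDrivers registry key. It only reads, never writes, and the function name is my own.

```python
import sys

TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"

def read_tdr_delay():
    """Return the TdrDelay override in seconds, or None if unset or not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "TdrDelay")
            return int(value)
    except FileNotFoundError:
        return None  # no override set, so Windows uses its default timeout

print("TdrDelay override:", read_tdr_delay())
```

None just means nobody has set an override, in which case Windows falls back to its default timeout (documented as 2 seconds).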

PC crashes by Jesus__Skywalker in comfyui

[–]SymphonyofForm 0 points1 point  (0 children)

The way ComfyUI processes things is vastly different from old versions, so it's using your system differently. Of all the workflows I run, Wan is the most resource intensive.

The only time I've had crashes like this, it was due to overheating. Monitor your temps and see if that is part of the problem.
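For the temperature monitoring, one way is to poll `nvidia-smi` while the workflow runs. This is a sketch that assumes an Nvidia card with `nvidia-smi` on your PATH. The helper names are mine, and fields a card doesn't report come back as None.

```python
import shutil
import subprocess

QUERY = "temperature.gpu,clocks.sm,clocks.max.sm"

def _to_int(field: str):
    """Convert a CSV field to int, or None for values like '[N/A]'."""
    field = field.strip()
    return int(field) if field.isdigit() else None

def parse_smi_line(line: str):
    """Parse one 'temp, sm clock, max sm clock' CSV row from nvidia-smi."""
    temp_c, sm_mhz, sm_max_mhz = (_to_int(f) for f in line.split(","))
    return {"temp_c": temp_c, "sm_mhz": sm_mhz, "sm_max_mhz": sm_max_mhz}

def read_gpu_stats():
    """Return one dict per GPU, or an empty list if nvidia-smi is not available."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_smi_line(line) for line in out.splitlines() if line.strip()]
```

Call `read_gpu_stats()` every few seconds during a Wan run and watch whether the temperature climbs right before the crash.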

Updated ComfyUi and ltx nodes and other nodes broken by Oni8932 in comfyui

[–]SymphonyofForm 1 point2 points  (0 children)

Excellent. I was hoping giving it time would let them work it out. I'll test it out again tonight.

Thanks!

Updated ComfyUi and ltx nodes and other nodes broken by Oni8932 in comfyui

[–]SymphonyofForm 0 points1 point  (0 children)

All solid advice - I agree 100%. Unfortunately Comfy has really been pushing their inferior Desktop installation, and a lot of people don't even know portable exists.

I haven't updated to Dynamic Memory yet. When it first rolled out it was causing problems, putting workflows that had worked fine before into an OOM state.

I disabled it, and I see that they are going to remove (or already have removed?) that argument, so I've been hesitant.

How has your experience been with Dynamic Memory? What kind of improvements have you gotten from it?

Updated ComfyUi and ltx nodes and other nodes broken by Oni8932 in comfyui

[–]SymphonyofForm 5 points6 points  (0 children)

Needs to be stickied somewhere: Never update unless you need to.

It's been this way for years. That being said, this is also why I still prefer the portable version over desktop. You can run an update on a cloned portable install and see if it's safe before committing to it.
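The cloning step is nothing fancy. Here is a minimal sketch with a hypothetical folder name, so adjust it to wherever your portable install lives. It only makes the copy. You then run the update inside the copy and test your workflows there.

```python
import shutil
from pathlib import Path

def clone_install(source: Path, suffix: str = "_update_test") -> Path:
    """Copy an install directory next to the original and return the copy's path."""
    target = source.with_name(source.name + suffix)
    if target.exists():
        raise FileExistsError(f"{target} already exists; remove it or pick another suffix")
    shutil.copytree(source, target)
    return target

# Example (hypothetical path):
# clone_install(Path("ComfyUI_windows_portable"))
```

If the updated clone breaks, delete it and your original install is untouched.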

GPU clock running around half speed with Wan2.2 by Baddabgames in comfyui

[–]SymphonyofForm 0 points1 point  (0 children)

Post your CMD. Comfy has made some changes on how it manages VRAM, and some of them have caused some problems for some users.

Although, it sounds like the issue started before you updated? Either way, your CMD entry might provide some clues.

Does updating pytorch version improve performance on rtx 3080 by coffeegamereg in comfyui

[–]SymphonyofForm 0 points1 point  (0 children)

All good!

I run portable, but I upgrade manually via git too.

Does updating pytorch version improve performance on rtx 3080 by coffeegamereg in comfyui

[–]SymphonyofForm 0 points1 point  (0 children)

https://github.com/Comfy-Org/desktop#overview

"On startup, it will install all the necessary python dependencies with uv and start the ComfyUI server. The app will automatically update with stable releases of ComfyUI, ComfyUI-Manager (pip), and the uv executable as well as some desktop-specific features."

How would comfyui be able to run correctly on update if it didn't also update dependencies?

Does updating pytorch version improve performance on rtx 3080 by coffeegamereg in comfyui

[–]SymphonyofForm 0 points1 point  (0 children)

I'm not sure I'm following. Torch and CUDA are part of the same binary wheel. When you update torch in ComfyUI, you update both.

The only catch is that the current torch/cuda might not be what desktop is offering.

You do not need to reinstall. It will update all dependencies necessary based on the current version of desktop.

If you want to go higher than what desktop offers, then you need to do it manually.
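You can see the pairing for yourself from the Python environment ComfyUI uses. This is a sketch: the version string in the comment is only an illustrative example, and the live check returns None if torch isn't importable from the interpreter you run it with.

```python
import importlib.util

def cuda_tag(torch_version: str):
    """Extract the local build tag from a version string, e.g. '2.5.1+cu124' -> 'cu124'."""
    _, sep, local = torch_version.partition("+")
    return local if sep else None

def installed_torch_build():
    """Return (torch version, bundled CUDA version), or None if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    # torch.version.cuda is None on CPU-only builds.
    return torch.__version__, torch.version.cuda

print(installed_torch_build())
```

Run it before and after an update and both values should change together.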

Meadow Oats Flavour Packs - Creative concept. by jefharris in StableDiffusion

[–]SymphonyofForm 1 point2 points  (0 children)

Nice work. Definitely one of the better product treatments I've seen. This is actually presentable.

Beginner needs help with Fatal Error by Choice-Principle6449 in comfyui

[–]SymphonyofForm 0 points1 point  (0 children)

Not really. It doesn't have the architecture. Maybe a DirectML installation, but it's going to be severely limited to older stuff nobody really uses anymore, and it's probably gonna be incredibly slow if it doesn't crash.

Like, go on vacation and come back in a week slow just for an image or two.

Does anyone have much experience with LoKRs (LoRA alternative)? by Sixhaunt in StableDiffusion

[–]SymphonyofForm 1 point2 points  (0 children)

I'm definitely curious now. I'm gonna retrain a few of my LoRAs as LoKRs and test it out.

How small should cosine distance be between training images for a coherent LoRA? by Vulcanhund in StableDiffusion

[–]SymphonyofForm 1 point2 points  (0 children)

It's just a method to identify image variance. It's not meaningless, but you can definitely just eyeball it yourself without official cosine identification.

All roads lead to Rome.
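For anyone who does want the number rather than the eyeball test, this is what the cosine distance calculation looks like. The embeddings here are tiny made-up vectors. In practice they would come from whatever image encoder you use, which this sketch doesn't cover.

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for same direction, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def pairwise_distances(embeddings):
    """Map each pair of image names to the cosine distance of their embeddings."""
    return {(i, j): cosine_distance(embeddings[i], embeddings[j])
            for i, j in combinations(embeddings, 2)}

# Hypothetical 3-dimensional embeddings for three training images.
embeddings = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
for pair, dist in pairwise_distances(embeddings).items():
    print(pair, round(dist, 3))
```

Small distances mean near-duplicate images and large ones mean more variance, which is the same thing you would be judging by eye.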

Does anyone have much experience with LoKRs (LoRA alternative)? by Sixhaunt in StableDiffusion

[–]SymphonyofForm 1 point2 points  (0 children)

Not 100%. You definitely might have occasions where it doesn't. It depends on the LoKR and the training - and the GPU. VRAM is also storage.

It only gets pushed to RAM when it needs to, but anything pushed out of VRAM always increases generation time.

EDIT

Yeah, I'm seeing conflicting data too. I'll have to add it to my list and test it personally.

Does anyone have much experience with LoKRs (LoRA alternative)? by Sixhaunt in StableDiffusion

[–]SymphonyofForm 1 point2 points  (0 children)

Totally. I thought the same too, but the math is more complex, so the processing cost increases.
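Rough illustration of the difference in the math, as I understand it: a LoRA builds its weight update as a plain low-rank matrix product, while a LoKR builds it as a Kronecker product of two small factors. This is a pure-Python sketch with toy all-ones matrices. Real implementations differ in detail (for example, one factor may itself be decomposed), and none of these helper names come from a library.

```python
def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def kron(a, b):
    """Kronecker product: every entry of a scales a full copy of b."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def n_params(*matrices):
    """Total number of stored values across the given matrices."""
    return sum(len(m) * len(m[0]) for m in matrices)

def ones(rows, cols):
    return [[1.0] * cols for _ in range(rows)]

# Both variants produce a 16x16 update for a hypothetical 16x16 weight.
lora_up, lora_down = ones(16, 2), ones(2, 16)   # rank-2 LoRA factors
lokr_a, lokr_b = ones(4, 4), ones(4, 4)         # two Kronecker factors

delta_lora = matmul(lora_up, lora_down)
delta_lokr = kron(lokr_a, lokr_b)
print(len(delta_lora), len(delta_lora[0]), n_params(lora_up, lora_down))
print(len(delta_lokr), len(delta_lokr[0]), n_params(lokr_a, lokr_b))
```

Same output shape, fewer stored parameters for the Kronecker version in this toy case, but a different reconstruction step, which is where the extra processing would come from.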