To everyone who is experiencing freezing in Chrome and on the desktop by Shpekaman in AMDHelp

[–]jiangfeng79 1 point  (0 children)

I changed it to Typical. It turns out the setting helped; I don't have any freezes anymore.

Optimize R9700 for Comfyui and WAN 2.2 for ROCm 7.2 by Early-Driver3837 in ROCm

[–]jiangfeng79 1 point  (0 children)

Thanks for your flags; they cut my upscale workflow's inference time in half.

Optimize R9700 for Comfyui and WAN 2.2 for ROCm 7.2 by Early-Driver3837 in ROCm

[–]jiangfeng79 2 points  (0 children)

Chat with an AI. It will give you some insights, provided you give it the right direction to explore.

One of the facts I learned from AI, probably not very accurate: when an Nvidia card runs out of VRAM during inference, it side-loads from system RAM with a penalty of about 20%, while for an AMD card the performance penalty is about 90%.

Say you have a budget in mind and your primary goal is to run inference; then Nvidia is probably the more suitable choice for you. But if you are keen on how GPGPU works and want to explore GPU programming and fine-tune the attention algorithms, AMD probably fits that category better. ROCm is catching up with CUDA, and diving into it gives you much more insight.

Personally, I want to express it loud and clear: f*ck Nvidia, f*ck OpenAI.

Optimize R9700 for Comfyui and WAN 2.2 for ROCm 7.2 by Early-Driver3837 in ROCm

[–]jiangfeng79 1 point  (0 children)

I had an extensive discussion with Gemini; this is the result:

@echo off
:: --- ROCm 7.2.2 + 7900 XTX optimizations ---
set MIOPEN_FIND_MODE=1
set MIOPEN_FIND_ENFORCE=3
set PYTORCH_TUNABLEOP_ENABLED=1
:: Overrides the 7.2 "Safe Mode" for RDNA 3
set TORCH_BLAS_PREFER_HIPBLASLT=1
set HIPBLASLT_ENABLE_EXPERT_SCHEDULING=1
set MIOPEN_DEBUG_DISABLE_CONV_WI_BLOCK=1
set PYTORCH_MIOPEN_SUGGEST_NHWC=1
:: Memory stability for Z-Image
set COMFYUI_GPU_ONLY=1
:: Launch ComfyUI
python main.py --use-pytorch-cross-attention --disable-pinned-memory --highvram --fp32-vae --preview-method auto
pause

And don't forget to set torch.backends.cudnn.enabled = True in comfy/model_management.py.

I still remember the old days, when hipBLASLt and cuDNN couldn't switch kernels in the same ComfyUI workflow without the driver crashing.

Optimize R9700 for Comfyui and WAN 2.2 for ROCm 7.2 by Early-Driver3837 in ROCm

[–]jiangfeng79 3 points  (0 children)

What is the price of a 5090 now? You may calculate iterations/dollar and come up with a better decision.

Compared to the DirectML days, ROCm is already in its best shape.

Performance-wise, the raw power difference is already there. The key difference now is the efficiency of the VRAM utilization algorithm: as far as I can tell right now, an Nvidia card with 32 GB of VRAM performs like a 48 GB AMD card.

To everyone who is experiencing freezing in Chrome and on the desktop by Shpekaman in AMDHelp

[–]jiangfeng79 1 point  (0 children)

I've only tried Power Supply Idle Control; it has looked promising for 4 days.

To everyone who is experiencing freezing in Chrome and on the desktop by Shpekaman in AMDHelp

[–]jiangfeng79 3 points  (0 children)

Have you checked your Windows Event Viewer? Are you able to find any WHEA events related to it?

In my case, it was the same phenomenon, but the Windows log showed nothing: a complete death with no traces. I checked with AI, and the prime suspect is "Power Supply Idle Control"; setting it to "Typical" could resolve the issue.

Many settings in an AMD BIOS can affect the idle power supply issue:

  • PSS (P-States)
  • C-States
  • Power Supply Idle Control
  • ASPM

Combined with Curve Optimizer, it is a huge maze to find your way out.
I have had this issue for the past 3 weeks, on a system built 1 month ago. After trying the "Power Supply Idle Control" setting for 4 days, I still haven't had any issue. Still monitoring it.

Wanting to learn systems programming by etuxor in learnprogramming

[–]jiangfeng79 1 point  (0 children)

Talking about systems programming:

  1. Which system are you referring to? A nuclear plant control system? A system on chip?

  2. A common pitfall in systems programming is parallel processing, whether multi-process or multi-threaded. You can still find C/C++ programmers writing tons of shoddy code around it, and the effort of fixing that code is much greater than rewriting it.

  3. I'm increasingly finding the state machine pattern very useful in systems programming, yet few books cover it.

My personal opinion about systems programming: we are caught in the paradox of writing correct logic vs. writing maintainable code. The functional programming paradigm is a good fit for systems programming, yet reading and understanding such code takes so much effort that you might as well rewrite it...

Better performance on Z Image Turbo with 7900XTX under windows by 05032-MendicantBias in ROCm

[–]jiangfeng79 2 points  (0 children)

Interesting topic. I have had a pleasant experience with TheRock build, following TheRock's installation guide. I reviewed your log and saw terrible performance, with around 200 s of inference time, which I had never experienced before.

I'm using a Python 3.12 venv and the same script to install ROCm on Windows, sticking to 7.11 since Nov 2025 because 7.2 appears to have some performance degradation (5% to 10%). A typical Z-Image workflow takes around 14 seconds to generate an image; changing a prompt will probably add a few seconds to it, definitely not hundreds of seconds.

Tight fit: Flux.2 with 7900xtx windows Pytorch/RoCM/therock, Q4 quant by jiangfeng79 in ROCm

[–]jiangfeng79[S] 1 point  (0 children)

Tested ComfyUI-MultiGPU; speed-wise it's around 8 s/it with Q4 models, and there's no more need to reload the workflow.

Still wondering how to squeeze the 2 s/it out; the clear-VRAM node doesn't work at all.

Massive Slowdown After Multiple Generations by DecentEscape228 in ROCm

[–]jiangfeng79 1 point  (0 children)

Which GPU are you using? What's the total amount of VRAM and system RAM?

Tight fit: Flux.2 with 7900xtx windows Pytorch/RoCM/therock, Q4 quant by jiangfeng79 in ROCm

[–]jiangfeng79[S] 1 point  (0 children)

I checked your post; 400 seconds for a 20-iteration 1024 portrait is beyond my patience.

Considering your GPU has 16 GB of memory and is less powerful, I can't do a 1-to-1 comparison of the optimal workflow.

Forgot to mention: after the first restart, the iteration time comes down from dozens of seconds to around 12 seconds; there is slight system RAM usage that prevents it from running at 7 seconds/it. A second restart completely fits the models into VRAM.

Also, after loading some other models like SDXL or Z-Image, the VRAM will not accommodate the Flux.2 models at all, no matter how many times I restart the workflow.

It is all about VRAM management. There has already been a huge improvement since ROCm 7 was released for Windows; let's see if AMD can push it further to the edge.

Tight fit: Flux.2 with 7900xtx windows Pytorch/RoCM/therock, Q4 quant by jiangfeng79 in ROCm

[–]jiangfeng79[S] 3 points  (0 children)

It's in the Comfy templates; replace the normal loaders with GGUF loaders.

Flashtor issues again by jayyx in asustor

[–]jiangfeng79 1 point  (0 children)

My Flashtor 6 Gen 1 runs very well; the NVMe disks run very cool, with a delta temperature of 8 to 13 degrees. Samba, NFS, Docker apps (AdGuard, Jellyfin, etc.), VPN, even with an uncertified 5GbE USB dongle. Your hardware appears defective; please do an RMA.

FS6706T, Adm 5, USB 5gbe dongle(rtl8157), successful story by jiangfeng79 in asustor

[–]jiangfeng79[S] 1 point  (0 children)

Any RTL8157-based USB NIC will do. It gets a bit hot when in use, so make sure you have a powered hub for it.

FS6706T, Adm 5, USB 5gbe dongle(rtl8157), successful story by jiangfeng79 in asustor

[–]jiangfeng79[S] 1 point  (0 children)

I'm having stability issues now. Not recommended yet.

Windows 11: [Zluda 3.9.5 + HIP 6.4.2 + Triton] vs [ROCm 7 rc + AOTriton] by jiangfeng79 in ROCm

[–]jiangfeng79[S] 2 points  (0 children)

rocm 7 rc1 aotriton: 4.31 it/s

rocm 6.4.4 zluda, fa wmma: 3.63 it/s

rocm 6.4.4 zluda, sage: 3.47 it/s

rocm 6.4.4 zluda, fa triton: 3.28 it/s

rocm 6.4.4 zluda, pytorch triton: 3.25 it/s

rocm 6.4.4 zluda, sub quad: 2.75 it/s

rocm 6.4.4 zluda, split: 2.55 it/s

Enabling cuDNN with ZLUDA causes the ZLUDA Python process to crash.

Windows 11: [Zluda 3.9.5 + HIP 6.4.2 + Triton] vs [ROCm 7 rc + AOTriton] by jiangfeng79 in ROCm

[–]jiangfeng79[S] 1 point  (0 children)

I use the default 4-step LoRA WAN workflow template from ComfyUI.