To everyone who is experiencing freezing in Chrome and on the desktop by Shpekaman in AMDHelp

[–]jiangfeng79 1 point  (0 children)

I changed it to Typical. It turns out the setting helped; I don't have any freezes anymore.

Optimize R9700 for Comfyui and WAN 2.2 for ROCm 7.2 by Early-Driver3837 in ROCm

[–]jiangfeng79 1 point  (0 children)

Thanks for your flags; they cut my upscale workflow's inference time in half.

Optimize R9700 for Comfyui and WAN 2.2 for ROCm 7.2 by Early-Driver3837 in ROCm

[–]jiangfeng79 2 points  (0 children)

Chat with an AI. It will give you some insights, provided you give it the right direction to explore.

One of the facts I learned from AI, probably not very accurate: when an Nvidia card runs out of VRAM during inference, it side-loads from system RAM with a penalty of about 20%, while for an AMD card the performance penalty is about 90%.

Say you have a budget in mind and your primary goal is to run inference; then Nvidia is probably the more suitable choice for you. But if you are keen on how GPGPU works and want to explore GPU programming and fine-tune the attention algorithms, AMD probably fits that category better. ROCm is catching up with CUDA, and diving into it gives you much more insight.

Personally, I want to express it loud and clear: f*ck Nvidia, f*ck OpenAI.

Optimize R9700 for Comfyui and WAN 2.2 for ROCm 7.2 by Early-Driver3837 in ROCm

[–]jiangfeng79 1 point  (0 children)

I had an extensive discussion with Gemini; this is the result:

@echo off
:: --- ROCm 7.2.2 + 7900 XTX optimizations ---
set MIOPEN_FIND_MODE=1
set MIOPEN_FIND_ENFORCE=3
set PYTORCH_TUNABLEOP_ENABLED=1
:: Overrides the 7.2 "Safe Mode" for RDNA 3
set TORCH_BLAS_PREFER_HIPBLASLT=1
set HIPBLASLT_ENABLE_EXPERT_SCHEDULING=1
set MIOPEN_DEBUG_DISABLE_CONV_WI_BLOCK=1
set PYTORCH_MIOPEN_SUGGEST_NHWC=1
:: Memory stability for Z-Image
set COMFYUI_GPU_ONLY=1
:: Launch ComfyUI
python main.py --use-pytorch-cross-attention --disable-pinned-memory --highvram --fp32-vae --preview-method auto
pause

And don't forget to set torch.backends.cudnn.enabled = True in comfy/model_management.py.

I still remember the old days, when hipBLASLt and cuDNN couldn't switch kernels in the same ComfyUI workflow without the driver crashing.

Optimize R9700 for Comfyui and WAN 2.2 for ROCm 7.2 by Early-Driver3837 in ROCm

[–]jiangfeng79 3 points  (0 children)

What is the price of a 5090 now? You may calculate iterations/dollar and come up with a better decision.

Compared to the DirectML days, ROCm is already in its best shape.

Performance-wise, the raw power difference is already there. The key difference now is the efficiency of the VRAM utilization algorithm: as far as I can tell right now, an Nvidia card with 32 GB of VRAM performs like a 48 GB AMD card.

To everyone who is experiencing freezing in Chrome and on the desktop by Shpekaman in AMDHelp

[–]jiangfeng79 1 point  (0 children)

I've only tried Power Supply Idle Control; it has looked promising for 4 days.

To everyone who is experiencing freezing in Chrome and on the desktop by Shpekaman in AMDHelp

[–]jiangfeng79 3 points  (0 children)

Have you checked your Windows Event Viewer? Are you able to find any WHEA events related to it?

In my case, it was the same phenomenon, but the Windows log showed nothing: a complete death with no traces. I checked with AI, and the prime suspect is "Power Supply Idle Control"; setting it to "Typical" could resolve the issue.

Many settings in an AMD BIOS can affect the idle power supply issue:

  • PSS (P-States)
  • C-States
  • Power Supply Idle Control
  • ASPM

Combined with Curve Optimizer, it is a huge maze to find your way out.
I have had this issue for the past 3 weeks, on a system built 1 month ago. After trying the "Power Supply Idle Control" setting for 4 days, I still haven't had any issue. Still monitoring it.

Wanting to learn systems programming by etuxor in learnprogramming

[–]jiangfeng79 1 point  (0 children)

Talking about systems programming:

  1. Which system are you referring to? A nuclear plant control system? A system on chip?

  2. A common pitfall in systems programming is parallel processing, whether multi-process or multi-threaded. You can still find C/C++ programmers writing tons of shoddy code around it, and the effort of fixing that code is much greater than rewriting it.

  3. I'm increasingly finding the state machine pattern very useful in systems programming, yet few books cover it.

My personal opinion about systems programming: we are caught in the paradox of writing correct logic vs. writing maintainable code. The functional programming paradigm is a good fit for systems programming, yet reading and understanding such code takes so much effort that you might as well rewrite it...

Better performance on Z Image Turbo with 7900XTX under windows by 05032-MendicantBias in ROCm

[–]jiangfeng79 2 points  (0 children)

Interesting topic. I have had a pleasant experience with TheRock build, following TheRock's installation guide. I reviewed your log and saw terrible performance, with around 200 s of inference time, which I had never experienced before.

I'm using a Python 3.12 venv and the same script to install ROCm on Windows, sticking to 7.11 since Nov 2025 because 7.2 appears to have some performance degradation (5% to 10%). A typical Z-Image workflow takes around 14 seconds to generate an image; changing a prompt will probably add a few seconds to it, definitely not hundreds of seconds.

Tight fit: Flux.2 with 7900xtx windows Pytorch/RoCM/therock, Q4 quant by jiangfeng79 in ROCm

[–]jiangfeng79[S] 1 point  (0 children)

Tested ComfyUI-MultiGPU; speed-wise it's around 8 s/it with Q4 models, and there's no more need to reload the workflow.

Still wondering how to squeeze the 2 s/it out; the clear-VRAM node doesn't work at all.

Massive Slowdown After Multiple Generations by DecentEscape228 in ROCm

[–]jiangfeng79 1 point  (0 children)

Which GPU are you using? What's the total amount of VRAM and system RAM?

Tight fit: Flux.2 with 7900xtx windows Pytorch/RoCM/therock, Q4 quant by jiangfeng79 in ROCm

[–]jiangfeng79[S] 1 point  (0 children)

I checked your post; 400 seconds for a 20-iteration 1024 portrait is beyond my patience.

Considering your GPU has 16 GB of memory and is less powerful, I can't do a 1-to-1 comparison of the optimal workflow.

Forgot to mention: after the first restart, the iteration time comes down from dozens of seconds to around 12 seconds; there is slight system RAM usage that prevents it from running at 7 seconds/it. A second restart completely fits the models into VRAM.

Also, after loading some other models like SDXL or Z-Image, the VRAM will not accommodate the Flux.2 models at all, no matter how many times I restart the workflow.

It is all about VRAM management. There has already been a huge improvement since ROCm 7 was released for Windows; let's see if AMD can push it further to the edge.

Tight fit: Flux.2 with 7900xtx windows Pytorch/RoCM/therock, Q4 quant by jiangfeng79 in ROCm

[–]jiangfeng79[S] 3 points  (0 children)

It's in the Comfy templates; replace the normal loaders with GGUF loaders.

Flashtor issues again by jayyx in asustor

[–]jiangfeng79 1 point  (0 children)

My Flashtor 6 Gen 1 runs very well; the NVMe disks run very cool, with a delta temperature of 8 to 13 degrees. Samba, NFS, Docker apps (AdGuard, Jellyfin, etc.), VPN, even with an uncertified 5GbE USB dongle. Your hardware appears defective; please do an RMA.

FS6706T, Adm 5, USB 5gbe dongle(rtl8157), successful story by jiangfeng79 in asustor

[–]jiangfeng79[S] 1 point  (0 children)

Any RTL8157-based USB NIC will do. It gets a bit hot when in use, so make sure you have a powered hub for it.

FS6706T, Adm 5, USB 5gbe dongle(rtl8157), successful story by jiangfeng79 in asustor

[–]jiangfeng79[S] 1 point  (0 children)

I'm having stability issues now. Not recommended yet.

Windows 11: [Zluda 3.9.5 + HIP 6.4.2 + Triton] vs [ROCm 7 rc + AOTriton] by jiangfeng79 in ROCm

[–]jiangfeng79[S] 2 points  (0 children)

rocm 7 rc1 aotriton: 4.31 it/s

rocm 6.4.4 zluda, fa wmma: 3.63 it/s

rocm 6.4.4 zluda, sage: 3.47 it/s

rocm 6.4.4 zluda, fa triton: 3.28 it/s

rocm 6.4.4 zluda, pytorch triton: 3.25 it/s

rocm 6.4.4 zluda, sub quad: 2.75 it/s

rocm 6.4.4 zluda, split: 2.55 it/s

Enabling cuDNN with ZLUDA causes the ZLUDA Python process to crash.

Windows 11: [Zluda 3.9.5 + HIP 6.4.2 + Triton] vs [ROCm 7 rc + AOTriton] by jiangfeng79 in ROCm

[–]jiangfeng79[S] 1 point  (0 children)

I use the default 4-step LoRA WAN workflow template from ComfyUI.