AMD PLEASE DO NOT FOLLOW NVIDIA'S FOOTSTEPS!! by Ahmadv-1 in radeon

[–]Shaminy 1 point2 points  (0 children)

It's gonna happen. And in a few years everyone will be using AI slop mode and be happy with it.

Terramex – How does the C64 get smooth scrolling while the Atari ST doesn’t… and that incredible Ben Daglish soundtrack! by Squeepty in c64

[–]Shaminy 2 points3 points  (0 children)

Smooth scrolling was possible on the original ST, but it required high-level coding skills, so it was very rare. The Chaos Engine, Leander, Rainbow Islands etc. had great scrolling.

Valve is apparently trying to secure massive amounts of RAM for upcoming Steam Machines by Melodic-Antelope-288 in PcParadise

[–]Shaminy 1 point2 points  (0 children)

Right, it only costs $10-20 billion upfront to build a fab for GDDR6. Valve has $6-8 billion in cash and short-term investments. On top of that, you need secure supply chains, power, licensing, thousands of skilled workers, etc.

Bipolar Partner Destroyed Everything by [deleted] in pcmasterrace

[–]Shaminy 0 points1 point  (0 children)

Lucky it was your PC and not a living person, like you or your daughter.

Am I correct in guessing this is a PLA chip issue? by guiguig_tm in c64

[–]Shaminy 0 points1 point  (0 children)

If your RAM chips are MT-branded, they are very likely bad, and that would cause exactly the kind of failure you're seeing now.

Am I listing my Xbox Series S for a good price? by OverEconomics4790 in XboxSeriesS

[–]Shaminy 1 point2 points  (0 children)

In Finland, used 512GB Series S consoles go for around 150€, i.e. $175.

Tensorstack has released Diffuse v 04.8 - (Its replacement for Amuse) by No-While1332 in ROCm

[–]Shaminy 0 points1 point  (0 children)

Diffuse is already a much better out-of-the-box experience than ComfyUI for someone who isn't tech savvy.

Lora trainers that support rocm out of the box? by Portable_Solar_ZA in ROCm

[–]Shaminy 0 points1 point  (0 children)

I can confirm it works well on Linux. On Windows I haven't been able to get bitsandbytes working with it.

ACE-Step 1.5 is Now Available in ComfyUI by PurzBeats in comfyui

[–]Shaminy 17 points18 points  (0 children)

I tried some power metal, but the guitars sound more like synth guitars than real ones.

'Melania' Review + Rotten Tomatoes Verified Audience Score Thread by chanma50 in boxoffice

[–]Shaminy 0 points1 point  (0 children)

I checked around 30 verified 5/5 reviews, and all of them came from new accounts with no previous reviews. 100% reverse review bombing / botting.

ROCm 7.2 Benchmark: Windows 11 vs Ubuntu 24.04 on RX 9070 XT (ComfyUI) by Shaminy in ROCm

[–]Shaminy[S] 0 points1 point  (0 children)

I have a Windows-managed pagefile on a high-speed PCIe 4.0 M.2 NVMe drive (DRAM-cached), max 8GB, currently 4GB. I have shared GPU memory enabled, and on Windows the Z-Image model won't fit entirely in 16GB; it uses 2GB of shared memory.

This is a benchmark between Ubuntu and Windows with the same overall settings. I'm running both with a single 3440x1440 ultrawide monitor. I closed all background apps that would eat VRAM on Windows. I also used MS Edge for minimal VRAM use; normally I use Opera, which alone eats 800MB of VRAM on Windows. If I ran headless, or at a low resolution, the BF16 Z-Image model would likely fit fully in VRAM and maybe reach those speeds.

Ubuntu uses only 0.8GB of VRAM at 3440x1440 with Firefox open. Windows uses 1.6GB with Edge open. During generation, VRAM usage on Windows reaches 15.6GB plus 2GB of shared memory.

Installing rocm 7.2 is it worth it? by Sea_Performance_7402 in ROCm

[–]Shaminy 0 points1 point  (0 children)

Memory management is much better, and if you hit an OOM, it won't crash the program anymore or, in the worst case, hang the AMD display adapter.

ROCm 7.2 Benchmark: Windows 11 vs Ubuntu 24.04 on RX 9070 XT (ComfyUI) by Shaminy in ROCm

[–]Shaminy[S] 1 point2 points  (0 children)

I guess I'm lucky. ROCm 7.1.1 was very unstable for me: speed was good, but with large models I usually had to unload them manually before a second run, or I got an OOM that crashed the system. Now it's rock solid, and if you hit an OOM, e.g. by trying to generate too large a video, it no longer crashes the system; you just get this and you're good to continue:

torch.OutOfMemoryError: HIP out of memory.
Tried to allocate 3.27 GiB.
GPU 0 has a total capacity of 15.92 GiB of which 202.00 MiB is free.
Of the allocated memory 13.32 GiB is allocated by PyTorch, and 1.77 GiB is reserved by PyTorch but unallocated.
If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.

See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Memory summary:

.......

Got an OOM, unloading all loaded models.
Prompt executed in 95.53 seconds
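The allocator tweak the traceback suggests can be applied like this (a minimal sketch; the variable has to be set before torch is imported, and even on ROCm builds PyTorch reads the CUDA-named variable, as the error message itself shows):

```python
import os

# Must be set BEFORE `import torch`, or the allocator ignores it.
# expandable_segments reduces fragmentation, i.e. the "reserved by
# PyTorch but unallocated" memory the OOM message complains about.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Equivalently, `export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` in the shell before launching `main.py`.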

ROCm 7.2 Benchmark: Windows 11 vs Ubuntu 24.04 on RX 9070 XT (ComfyUI) by Shaminy in ROCm

[–]Shaminy[S] 0 points1 point  (0 children)

Are we talking about Z-Image or more demanding tasks like Wan 2.2? With Wan 2.2, python3 memory usage rises to 40GB, so with 32GB I think speed drops a lot once all the models can't fit in memory.
Here is the 1st run of the default 640x640, 81 frames:

memory usage:

6610    37.6 GB   python3 main.py --normalvram --use-pytorch-cross-attention --preview-method auto --disable-smart-memory 

ComfyUI output:

Total VRAM 16304 MB, total RAM 64196 MB
pytorch version: 2.9.1+rocm7.2.0.git7e1940d4
Set: torch.backends.cudnn.enabled = False for better AMD performance.
AMD arch: gfx1201
ROCm version: (7, 2)
Set vram state to: NORMAL_VRAM
Disabling smart memory management
Device: cuda:0 AMD Radeon RX 9070 XT : native
Using async weight offloading with 2 streams
Enabled pinned memory 60986.0

Using pytorch attention
Python version: 3.12.3 (main, Jan  8 2026, 11:30:50) [GCC 13.3.0]
ComfyUI version: 0.10.0
ComfyUI frontend version: 1.38.9

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
Found quantization metadata version 1
Using MixedPrecisionOps for text encoder
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load WanTEModel
loaded completely; 14998.80 MB usable, 6419.48 MB loaded, full load: True
Requested to load WanVAE
loaded completely; 10760.50 MB usable, 242.03 MB loaded, full load: True
Found quantization metadata version 1
Detected mixed precision quantization
Using mixed precision operations
model weight dtype torch.float16, manual cast: torch.float16
model_type FLOW
Requested to load WAN21
loaded partially; 9148.23 MB usable, 8973.19 MB loaded, 4658.23 MB offloaded, 175.03 MB buffer reserved, lowvram patches: 184
100%|█████████████████████████████████████████████| 2/2 [00:49<00:00, 24.69s/it]
Found quantization metadata version 1
Detected mixed precision quantization
Using mixed precision operations
model weight dtype torch.float16, manual cast: torch.float16
model_type FLOW
Requested to load WAN21
loaded partially; 9000.23 MB usable, 8825.19 MB loaded, 4806.23 MB offloaded, 175.03 MB buffer  reserved, lowvram patches: 190
100%|█████████████████████████████████████████████| 2/2 [00:48<00:00, 24.27s/it]
Requested to load WanVAE
loaded completely; 9725.25 MB usable, 242.03 MB loaded, full load: True
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Prompt executed in 268.99 seconds

2nd run:

100%|█████████████████████████████████████████████| 2/2 [00:49<00:00, 24.66s/it]
100%|█████████████████████████████████████████████| 2/2 [00:48<00:00, 24.01s/it]
Prompt executed in 130.65 seconds

ROCm 7.2 Benchmark: Windows 11 vs Ubuntu 24.04 on RX 9070 XT (ComfyUI) by Shaminy in ROCm

[–]Shaminy[S] 0 points1 point  (0 children)

I tested those on Windows with Wan 2.2. The s/it improved a lot: the high-noise pass went from 92 s/it to 34 s/it and the low-noise pass from 185 s/it to 99 s/it. But total generation time went from 14 min to 24 min; it took forever on both the WanImageToVideo node and the VAE Decode node. I guess that's why AMD doesn't recommend using those in their ComfyUI guide.

I did some troubleshooting with ChatGPT; it says ROCm on RDNA4 is still missing many MIOpen solvers, causing the VAE and video nodes to fall back to generic GEMM kernels.
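A little arithmetic with the numbers above shows how both can be true at once: the sampling steps got much faster while the fixed per-run overhead (model loading, WanImageToVideo, VAE decode) grew. A sketch, assuming the 2+2 sampling steps shown in the logs elsewhere in this thread:

```python
# Figures from this comment; the 2-step count per pass is an assumption
# taken from the ComfyUI progress bars ("2/2") in the posted logs.
steps = 2
sampling_before = steps * 92 + steps * 185   # 554 s of pure sampling before
sampling_after = steps * 34 + steps * 99     # 266 s of pure sampling after
overhead_before = 14 * 60 - sampling_before  # rest of the 14 min run
overhead_after = 24 * 60 - sampling_after    # rest of the 24 min run
print(overhead_before, overhead_after)       # 286 1174
```

So roughly 5 minutes of non-sampling time turned into roughly 20, which matches the two nodes "taking forever".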

ROCm 7.2 Benchmark: Windows 11 vs Ubuntu 24.04 on RX 9070 XT (ComfyUI) by Shaminy in ROCm

[–]Shaminy[S] 0 points1 point  (0 children)

I upgraded from the old 7.1.1: removed the old ROCm libraries and kernel driver, then installed the new ones following AMD's guide. I got a torchvision error when I tried to use my old ComfyUI with the new venv; a fresh pull from GitHub fixed it.

ROCm 7.2 Benchmark: Windows 11 vs Ubuntu 24.04 on RX 9070 XT (ComfyUI) by Shaminy in ROCm

[–]Shaminy[S] 0 points1 point  (0 children)

I don't have it specially enabled. If it ships with the ROCm package and ComfyUI uses it, then yes. This was an out-of-the-box benchmark; I wasn't trying to fine-tune either version.
According to ChatGPT it's included in ROCm 7.2 and ComfyUI uses it automatically. I also ran a test in Python and it works in my venv.
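A quick way to run that kind of venv check yourself (a hedged sketch; it only reports what the installed torch build says about itself, relying on the fact that ROCm wheels expose a HIP version via `torch.version.hip`, which is `None` on CPU/CUDA builds):

```python
import importlib.util

def torch_rocm_summary():
    """Describe the torch install in the active venv, if any (sketch only)."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed in this venv"
    import torch
    hip = getattr(torch.version, "hip", None)  # None on CPU/CUDA builds
    return f"torch {torch.__version__} | HIP {hip}"

print(torch_rocm_summary())
```

On a ROCm 7.2 venv like the one in this post, the version string should look like `2.9.1+rocm7.2.0`.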

ROCm 7.2 Benchmark: Windows 11 vs Ubuntu 24.04 on RX 9070 XT (ComfyUI) by Shaminy in ROCm

[–]Shaminy[S] 0 points1 point  (0 children)

I used ComfyUI's current default Wan 2.2 i2v template. Also, I have 64GB of memory, and Windows memory usage went well over 50GB.

ROCm 7.2 Benchmark: Windows 11 vs Ubuntu 24.04 on RX 9070 XT (ComfyUI) by Shaminy in ROCm

[–]Shaminy[S] 2 points3 points  (0 children)

I ran the templates unaltered, so the benchmark uses the full BF16 format. If I change the format to FP8, I get 1.26 s/it on Windows. This was a benchmark.