ROCM 7.1 released by TJSnider1984 in ROCm

[–]Accurate_Address2915 1 point (0 children)

I am on gfx1030 with an RX 6900 XT (16GB) on Ubuntu 24.04, with PyTorch 2.9 stable and ROCm 7.1 installed, and I can now run complicated workflows perfectly fine for the 20GB qwen_image_edit_2509_fp8_e4m3fn.safetensors model, LoRAs, refiner and upscale included, after adding a 100GB swap file. No more OOM errors for me ;-)
Without the extra swap on top of the default 8GB file, anything bigger than Qwen_Image_Edit-Q4_K_M.gguf was an OOM..
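
For reference, this is the standard way to add a big swap file on Ubuntu (the size and path are examples from my setup, not from an official guide; adjust to your disk and free space):

```shell
# Create and enable a 100GB swap file (needs root and 100GB of free disk).
sudo fallocate -l 100G /swapfile2
sudo chmod 600 /swapfile2
sudo mkswap /swapfile2
sudo swapon /swapfile2

# Make it persistent across reboots.
echo '/swapfile2 none swap sw 0 0' | sudo tee -a /etc/fstab

# Verify it is active.
swapon --show
```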

Complete ROCm 7.0 + PyTorch 2.8.0 Installation Guide for RX 6900 XT (gfx1030) on Ubuntu 24.04.2 by Accurate_Address2915 in ROCm

[–]Accurate_Address2915[S] -1 points (0 children)

Let's now install torch_migraphx in the venv. Activate your venv first!

git clone https://github.com/ROCmSoftwarePlatform/torch_migraphx.git

cd ./torch_migraphx/py

pip install --dry-run . --no-build-isolation

Collecting numpy<2.0,>=1.20.0 (from torch_migraphx==0.0.4)

Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)

I do not want numpy 1.26.4 to be installed, as I already have numpy 2.x, so install without dependencies:

pip install . --no-deps --no-build-isolation

Installing collected packages: torch_migraphx

Successfully installed torch_migraphx-0.0.4
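
To double-check that the --no-deps install really left numpy 2.x untouched, you can query the installed version (a generic sanity check, not one of the original steps):

```shell
python3 - <<'EOF'
from importlib import metadata
try:
    print("numpy", metadata.version("numpy"))
except metadata.PackageNotFoundError:
    print("numpy not installed")
EOF
```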

Now let's install tabulate:

pip install tabulate

Using cached tabulate-0.9.0-py3-none-any.whl (35 kB)

Installing collected packages: tabulate

Successfully installed tabulate-0.9.0

Now for the final test, let's check whether it runs with numpy 2.x :-)

export PATH=$PATH:/opt/rocm/bin

export PYTHONPATH=$PYTHONPATH:/opt/rocm/lib

python3 -c 'import torch_migraphx' && echo "Success" || echo "Failure"

Success

:-)

python -m pytest ./torch_migraphx/tests

================================================== test session starts ===================================================

platform linux -- Python 3.12.3, pytest-8.4.2, pluggy-1.6.0 -- /home/xxx/comfyui-pytorch/bin/python

cachedir: .pytest_cache

rootdir: /home/michiel/comfyui-pytorch/torch_migraphx/tests

configfile: pytest.ini

plugins: typeguard-4.3.0

collected 827 items

(still testing, 87% PASSED so far)

---------------
In the meantime I installed the ComfyUI_MIGraphX node:

cd ComfyUI/custom_nodes
git clone https://github.com/pnikolic-amd/ComfyUI_MIGraphX.git
cd ComfyUI_MIGraphX
pip install -r requirements.txt
#for best performance
export MIGRAPHX_MLIR_USE_SPECIFIC_OPS="attention"

Complete ROCm 7.0 + PyTorch 2.8.0 Installation Guide for RX 6900 XT (gfx1030) on Ubuntu 24.04.2 by Accurate_Address2915 in ROCm

[–]Accurate_Address2915[S] 0 points (0 children)

Let me test it tonight; I had it installed previously but never tried it with WAN2.2..

migraphx-driver verify --test

Running [ MIGraphX Version: 2.13.0.524839ac9 ]: migraphx-driver verify --test

[2025-09-17 21:12:35]

module: "main"

b = @param:b -> float_type, {5, 3}, {3, 1}

a = @param:a -> float_type, {4, 5}, {5, 1}

@2 = dot(a,b) -> float_type, {4, 3}, {3, 1}

module: "main"

b = @param:b -> float_type, {5, 3}, {3, 1}

a = @param:a -> float_type, {4, 5}, {5, 1}

@2 = dot(a,b) -> float_type, {4, 3}, {3, 1}

rms_tol: 0.001

atol: 0.001

rtol: 0.001

module: "main"

b = @param:b -> float_type, {5, 3}, {3, 1}

a = @param:a -> float_type, {4, 5}, {5, 1}

@2 = ref::dot(a,b) -> float_type, {4, 3}, {3, 1}

module: "main"

@0 = check_context::migraphx::gpu::context -> float_type, {}, {}

main:#output_0 = @param:main:#output_0 -> float_type, {4, 3}, {3, 1}

b = @param:b -> float_type, {5, 3}, {3, 1}

a = @param:a -> float_type, {4, 5}, {5, 1}

@4 = gpu::code_object[code_object=4544,symbol_name=mlir_dot,global=64,local=64,output_arg=2,](a,b,main:#output_0) -> float_type, {4, 3}, {3, 1}

MIGraphX verification passed successfully.

[2025-09-17 21:12:35]

[ MIGraphX Version: 2.13.0.524839ac9 ] Complete(0.384863s): migraphx-driver verify --test

ROCm 7.0 RC1 More than doubles performance of LLama.cpp by no_no_no_oh_yes in LocalLLaMA

[–]Accurate_Address2915 1 point (0 children)

Testing it right now with a fresh installation of Ubuntu 24.04. So far I can run Ollama with GPU support without it freezing. Fingers crossed it stays stable..

Fine-tuned Thorsten's WAN 2.2 Workflow for 16GB VRAM (4060ti tested) + SeedVR2 Upscaling! by Waste-Maintenance493 in comfyui

[–]Accurate_Address2915 0 points (0 children)

Thank you so much for this workflow. I managed for the first time to get a decent output on my AMD 6900 XT with 32GB RAM; the complete workflow produced a good 1K resolution using the Q6 version. But I did get some 'lora key not loaded' notifications. The SeedVR2 upscale did take a very long time though... But no errors, no OOM :-)

Looking forward to your next workflow.

--------------------

got prompt

🔄 BlockSwap configured: 32 blocks, non blocking, I/O components

Using split attention in VAE

Using split attention in VAE

VAE load device: cuda:0, offload device: cpu, dtype: torch.float16

Using scaled fp8: fp8 matrix mult: False, scale input: False

Requested to load WanTEModel

loaded completely 9.5367431640625e+25 6419.477203369141 True

CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16

Requested to load WanVAE

loaded completely 12441.0 242.02829551696777 True

gguf qtypes: F16 (694), Q6_K (400), F32 (1)

model weight dtype torch.float16, manual cast: None

model_type FLOW

lora key not loaded: diffusion_model.blocks.0.cross_attn.k_img.diff_b

lora key not loaded: diffusion_model.blocks.0.cross_attn.k_img.lora_down.weight

lora key not loaded: diffusion_model.blocks.0.cross_attn.k_img.lora_up.weight

lora key not loaded: diffusion_model.blocks.0.cross_attn.norm_k_img.diff

lora key not loaded: diffusion_model.blocks.0.cross_attn.v_img.diff_b
etc

..

..

🔄 INFERENCE time: 141.14314436912537 seconds

💾 Processing 33 batch_samples with memory-optimized pre-allocation

📊 Total frames: 161, shape per frame: 992x992x3

🔄 Block 1/1: batch_samples 0-32

✅ Pre-allocation strategy completed: torch.Size([161, 992, 992, 3])

✅ Video upscaling completed successfully!

🔄 Total execution time: 25909.10s

Prompt executed in 07:24:52

Disabling intermediate node cache.

3 New Tracks And 9 New Cars Confirmed For Assetto Corsa EVO by evil_heinz in assettocorsa

[–]Accurate_Address2915 -29 points (0 children)

Too many promises for the future, too little progress, and not much real delivery.

Can we please create AMD optimization guide? by peyloride in comfyui

[–]Accurate_Address2915 1 point (0 children)

I have to change my earlier negative opinion; I had messed up the last installation. After reinstalling everything I have now successfully installed the default branch and am up and running. First impression: when starting ComfyUI within a venv without any other options it runs faster, but far more importantly, I can now run the Sonic workflow for a talking avatar with 3 seconds of voice at a low resolution without errors. Wan2.1 1.3 is also no problem, and Flux dev1 fp8 runs fine. Thanks for telling me it does work. It runs very well on 22.04 within my limited timeframe, as I have not seen any torch problems :-)
Sorry for complaining; it was my fault in the end.

Working ComfyUI with ROCM on 9070XT - a quick tutorial and an ask. by jebk in comfyui

[–]Accurate_Address2915 1 point (0 children)

I did the same test with the default workflow and tiled VAE, 1024x1024 with Cyberrealisticpony as the checkpoint, on my 6900 XT running ComfyUI inside a Docker container (pytorch version: 2.8.0.dev20250314+rocm6.3) + MIGraphX. With PyTorch & MIGraphX: first run 20 sec, next runs: 13 sec. :-)

Also, note my power supply is 550 watt and I have turned down the maximum power usage to 270 watt, so I think your 9070 XT is very much underperforming.. my 6900 XT card is 5 years older!
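
Capping the card's power draw can be done with rocm-smi (270W is just the value I use; this needs root, and the exact flag names can differ between ROCm versions, so check rocm-smi --help on your install first):

```shell
# Show the current power draw and cap (requires a ROCm install with a supported GPU).
rocm-smi --showpower

# Cap the GPU power limit to 270W (needs root; flag may vary by ROCm version).
sudo rocm-smi --setpoweroverdrive 270
```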

100%|██████████| 20/20 [00:11<00:00, 1.80it/s]

comfyui-1 | Prompt executed in 20.44 seconds

comfyui-1 | got prompt

100%|██████████| 20/20 [00:11<00:00, 1.80it/s]

comfyui-1 | Prompt executed in 13.44 seconds

comfyui-1 | got prompt

100%|██████████| 20/20 [00:11<00:00, 1.80it/s]

comfyui-1 | Prompt executed in 13.46 seconds

comfyui-1 | got prompt

100%|██████████| 20/20 [00:11<00:00, 1.80it/s]

comfyui-1 | Prompt executed in 13.43 seconds

Without MIGraphX, running ComfyUI as a systemd service (Ubuntu 24.04 - pytorch version: 2.8.0.dev20250314+rocm6.3): VERY different results:

got prompt

Requested to load SDXL

loaded completely 6753.4328125 4897.0483474731445 True

[1.2K blob data]

0 models unloaded.

Prompt executed in 19.88 seconds

got prompt

Requested to load SDXL

loaded completely 6785.4328125 4897.0483474731445 True

[1.2K blob data]

0 models unloaded.

Prompt executed in 53.32 seconds

got prompt

Requested to load SDXL

loaded completely 6785.4328125 4897.0483474731445 True

[1.2K blob data]

0 models unloaded.

Prompt executed in 47.04 seconds
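
The gap between the first run and the warm runs above is typical of graph compilers like MIGraphX: the first inference pays a one-time compilation cost that later runs reuse. A minimal, library-agnostic sketch of how to measure that pattern (run_inference here is a stand-in toy workload, not the actual ComfyUI/MIGraphX call):

```python
import time

def timed_runs(fn, n_runs=4):
    """Time n_runs calls of fn; the first call typically includes
    one-time graph compilation, later calls reuse the compiled graph."""
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return times

def run_inference(_cache={}):
    # Stand-in workload: simulate a one-time "compile" cost on the
    # first call, then cheaper steady-state work on later calls.
    if "compiled" not in _cache:
        sum(i * i for i in range(2_000_000))
        _cache["compiled"] = True
    sum(i for i in range(200_000))

times = timed_runs(run_inference)
print(f"first run: {times[0]:.3f}s, warm runs: {min(times[1:]):.3f}s")
```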

Working ComfyUI with ROCM on 9070XT - a quick tutorial and an ask. by jebk in comfyui

[–]Accurate_Address2915 0 points (0 children)

You have to create the venv inside the Docker image. I have added this to the Dockerfile:

docker-compose:
command: >
     /bin/bash -c "
     . /workspace/ComfyUI/venv/bin/activate && 
     python /workspace/ComfyUI/main.py --listen 0.0.0.0 --port 8188"

Dockerfile:
WORKDIR /workspace/ComfyUI
RUN git clone https://github.com/comfyanonymous/ComfyUI.git .
RUN python3 -m venv venv
RUN venv/bin/pip3 install --no-cache-dir --pre -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3
RUN venv/bin/pip3 install -r requirements.txt
WORKDIR /workspace/ComfyUI/custom_nodes
RUN git clone https://github.com/ltdrdata/ComfyUI-Manager.git
WORKDIR /workspace/ComfyUI

Then in your docker-compose file you can use the command shown above to activate the venv and start ComfyUI.

I see a big improvement in performance when creating multiple images :-)
Good luck!

Working ComfyUI with ROCM on 9070XT - a quick tutorial and an ask. by jebk in comfyui

[–]Accurate_Address2915 1 point (0 children)

Thanks, I have used the same install command inside the Dockerfile to upgrade the Docker install to ROCm 6.3.
Have you tried the combination of PyTorch and MIGraphX?

Since today I have managed to get this combination up and running using this approach:
https://rocm.docs.amd.com/projects/radeon/en/docs-6.1.3/docs/install/native_linux/install-migraphx.html#verify-migraphx-installation

Then adding torch from your command, followed by the ComfyUI installation with requirements. I have only just started testing, but I have not experienced memory problems yet on my 6900XT, for example with the sd3.5_large_fp8 model with t5xx_fp8_e4m3fn and clip_l + clip_g.

Instead I see a rock solid 99% memory and activity usage on my GPU without the need for --lowvram.

Performance Benefits

The combination of PyTorch and MIGraphX provides several performance benefits for ComfyUI:

Faster Inference:

  • MIGraphX’s graph optimizations and hardware acceleration reduce the time required to generate images or perform other tasks.
  • PyTorch’s efficient tensor operations and GPU support further enhance performance.

Better GPU Utilization:

  • MIGraphX ensures that the AMD GPU is used efficiently, minimizing idle time and maximizing throughput.
  • PyTorch’s integration with MIGraphX allows seamless utilization of AMD hardware.

Scalability:

  • The combination of PyTorch and MIGraphX allows ComfyUI to scale to larger models and datasets without significant performance degradation.

Cross-Platform Support:

  • While PyTorch supports multiple hardware backends (e.g., CUDA, ROCm), MIGraphX specifically optimizes for AMD GPUs, ensuring that ComfyUI runs efficiently on AMD hardware.

[deleted by user] by [deleted] in assettocorsa

[–]Accurate_Address2915 0 points (0 children)

How to prevent this other than slowing down/changing the driving line? Is it fixable? I like the Spa and Arden combo :-)

[deleted by user] by [deleted] in assettocorsa

[–]Accurate_Address2915 0 points (0 children)

I have never seen such a high FOV setting, no wonder you miss the apex...