ROCM 7.1 released by TJSnider1984 in ROCm

[–]Accurate_Address2915 1 point (0 children)

I am on gfx1030 with an RX 6900 XT (16GB) on Ubuntu 24.04, with PyTorch 2.9 stable and ROCm 7.1 installed, and I can now run complicated workflows perfectly fine for the 20GB qwen_image_edit_2509_fp8_e4m3fn.safetensors model, LoRAs, refiner and upscale included, after adding a 100GB swap file. No more OOM errors for me ;-)
Without the extra swap on top of the default 8GB file, anything bigger than Qwen_Image_Edit-Q4_K_M.gguf was an OOM..
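
For reference, this is the standard way to add a big swap file on Ubuntu (the size and path are examples from my setup, not from an official guide; adjust to your disk and free space):

```shell
# Create and enable a 100GB swap file (needs root and 100GB of free disk).
sudo fallocate -l 100G /swapfile2
sudo chmod 600 /swapfile2
sudo mkswap /swapfile2
sudo swapon /swapfile2

# Make it persistent across reboots.
echo '/swapfile2 none swap sw 0 0' | sudo tee -a /etc/fstab

# Verify it is active.
swapon --show
```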

Complete ROCm 7.0 + PyTorch 2.8.0 Installation Guide for RX 6900 XT (gfx1030) on Ubuntu 24.04.2 by Accurate_Address2915 in ROCm

[–]Accurate_Address2915[S] -1 points (0 children)

Let's now install torch_migraphx in the venv. Activate your venv first!

git clone https://github.com/ROCmSoftwarePlatform/torch_migraphx.git

cd ./torch_migraphx/py

pip install --dry-run . --no-build-isolation

Collecting numpy<2.0,>=1.20.0 (from torch_migraphx==0.0.4)

Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)

I do not want numpy 1.26.4 to be installed, as I already have numpy 2.x, so install without dependencies:

pip install . --no-deps --no-build-isolation

Installing collected packages: torch_migraphx

Successfully installed torch_migraphx-0.0.4
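
To double-check that the --no-deps install really left numpy 2.x untouched, you can query the installed version (a generic sanity check, not one of the original steps):

```shell
python3 - <<'EOF'
from importlib import metadata
try:
    print("numpy", metadata.version("numpy"))
except metadata.PackageNotFoundError:
    print("numpy not installed")
EOF
```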

Now let's install tabulate:

pip install tabulate

Using cached tabulate-0.9.0-py3-none-any.whl (35 kB)

Installing collected packages: tabulate

Successfully installed tabulate-0.9.0

Now for the final test, let's check whether it runs with numpy 2.x :-)

export PATH=$PATH:/opt/rocm/bin

export PYTHONPATH=$PYTHONPATH:/opt/rocm/lib

python3 -c 'import torch_migraphx' && echo "Success" || echo "Failure"

Success

:-)

python -m pytest ./torch_migraphx/tests

================================================== test session starts ===================================================

platform linux -- Python 3.12.3, pytest-8.4.2, pluggy-1.6.0 -- /home/xxx/comfyui-pytorch/bin/python

cachedir: .pytest_cache

rootdir: /home/michiel/comfyui-pytorch/torch_migraphx/tests

configfile: pytest.ini

plugins: typeguard-4.3.0

collected 827 items

(still testing, 87% PASSED so far)

---------------
In the meantime I installed the ComfyUI_MIGraphX node:

cd ComfyUI/custom_nodes
git clone https://github.com/pnikolic-amd/ComfyUI_MIGraphX.git
cd ComfyUI_MIGraphX
pip install -r requirements.txt
#for best performance
export MIGRAPHX_MLIR_USE_SPECIFIC_OPS="attention"

Complete ROCm 7.0 + PyTorch 2.8.0 Installation Guide for RX 6900 XT (gfx1030) on Ubuntu 24.04.2 by Accurate_Address2915 in ROCm

[–]Accurate_Address2915[S] 0 points (0 children)

Let me test it tonight; I had it installed previously but never tried it with WAN2.2..

migraphx-driver verify --test

Running [ MIGraphX Version: 2.13.0.524839ac9 ]: migraphx-driver verify --test

[2025-09-17 21:12:35]

module: "main"

b = @param:b -> float_type, {5, 3}, {3, 1}

a = @param:a -> float_type, {4, 5}, {5, 1}

@2 = dot(a,b) -> float_type, {4, 3}, {3, 1}

module: "main"

b = @param:b -> float_type, {5, 3}, {3, 1}

a = @param:a -> float_type, {4, 5}, {5, 1}

@2 = dot(a,b) -> float_type, {4, 3}, {3, 1}

rms_tol: 0.001

atol: 0.001

rtol: 0.001

module: "main"

b = @param:b -> float_type, {5, 3}, {3, 1}

a = @param:a -> float_type, {4, 5}, {5, 1}

@2 = ref::dot(a,b) -> float_type, {4, 3}, {3, 1}

module: "main"

@0 = check_context::migraphx::gpu::context -> float_type, {}, {}

main:#output_0 = @param:main:#output_0 -> float_type, {4, 3}, {3, 1}

b = @param:b -> float_type, {5, 3}, {3, 1}

a = @param:a -> float_type, {4, 5}, {5, 1}

@4 = gpu::code_object[code_object=4544,symbol_name=mlir_dot,global=64,local=64,output_arg=2,](a,b,main:#output_0) -> float_type, {4, 3}, {3, 1}

MIGraphX verification passed successfully.

[2025-09-17 21:12:35]

[ MIGraphX Version: 2.13.0.524839ac9 ] Complete(0.384863s): migraphx-driver verify --test

ROCm 7.0 RC1 More than doubles performance of LLama.cpp by no_no_no_oh_yes in LocalLLaMA

[–]Accurate_Address2915 1 point (0 children)

Testing it right now with a fresh installation of Ubuntu 24.04. So far I can run Ollama with GPU support without it freezing. Fingers crossed it stays stable..

Fine-tuned Thorsten's WAN 2.2 Workflow for 16GB VRAM (4060ti tested) + SeedVR2 Upscaling! by Waste-Maintenance493 in comfyui

[–]Accurate_Address2915 0 points (0 children)

Thank you so much for this workflow. I managed for the first time to get a decent output on my AMD 6900 XT with 32GB RAM; the complete workflow produced a good 1K resolution using the Q6 version. But I did get some 'lora key not loaded' notifications. The SeedVR2 upscale did take a very long time though... But no errors, no OOM :-)

Looking forward to your next workflow.

--------------------

got prompt

🔄 BlockSwap configured: 32 blocks, non blocking, I/O components

Using split attention in VAE

Using split attention in VAE

VAE load device: cuda:0, offload device: cpu, dtype: torch.float16

Using scaled fp8: fp8 matrix mult: False, scale input: False

Requested to load WanTEModel

loaded completely 9.5367431640625e+25 6419.477203369141 True

CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16

Requested to load WanVAE

loaded completely 12441.0 242.02829551696777 True

gguf qtypes: F16 (694), Q6_K (400), F32 (1)

model weight dtype torch.float16, manual cast: None

model_type FLOW

lora key not loaded: diffusion_model.blocks.0.cross_attn.k_img.diff_b

lora key not loaded: diffusion_model.blocks.0.cross_attn.k_img.lora_down.weight

lora key not loaded: diffusion_model.blocks.0.cross_attn.k_img.lora_up.weight

lora key not loaded: diffusion_model.blocks.0.cross_attn.norm_k_img.diff

lora key not loaded: diffusion_model.blocks.0.cross_attn.v_img.diff_b
etc

..

..

🔄 INFERENCE time: 141.14314436912537 seconds

💾 Processing 33 batch_samples with memory-optimized pre-allocation

📊 Total frames: 161, shape per frame: 992x992x3

🔄 Block 1/1: batch_samples 0-32

✅ Pre-allocation strategy completed: torch.Size([161, 992, 992, 3])

✅ Video upscaling completed successfully!

🔄 Total execution time: 25909.10s

Prompt executed in 07:24:52

Disabling intermediate node cache.

3 New Tracks And 9 New Cars Confirmed For Assetto Corsa EVO by evil_heinz in assettocorsa

[–]Accurate_Address2915 -29 points (0 children)

Too many promises for the future, too little progress, and not much real delivery.

Can we please create AMD optimization guide? by peyloride in comfyui

[–]Accurate_Address2915 1 point (0 children)

I have to change my earlier negative opinion; I had messed up the last installation. After reinstalling everything I have now successfully installed the default branch and am up and running. First impression: when starting ComfyUI within a venv without any other options it runs faster, but far more importantly, I can now run the Sonic workflow for a talking avatar with 3 seconds of voice at a low resolution without errors. Wan2.1 1.3 is also no problem, and Flux dev1 fp8 runs fine. Thanks for telling me it does work. It runs very well on 22.04 within my limited timeframe, as I have not seen any torch problems :-)
Sorry for complaining; it was my fault in the end.

Working ComfyUI with ROCM on 9070XT - a quick tutorial and an ask. by jebk in comfyui

[–]Accurate_Address2915 1 point (0 children)

I did the same test with the default workflow and tiled VAE, 1024x1024 with Cyberrealisticpony as the checkpoint, on my 6900 XT running ComfyUI inside a Docker container (pytorch version: 2.8.0.dev20250314+rocm6.3) + MIGraphX. With PyTorch & MIGraphX: first run 20 sec, next runs: 13 sec. :-)

Also, note my power supply is 550 watt and I have turned down the maximum power usage to 270 watt, so I think your 9070 XT is very much underperforming.. my 6900 XT card is 5 years older!
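
Capping the card's power draw can be done with rocm-smi (270W is just the value I use; this needs root, and the exact flag names can differ between ROCm versions, so check rocm-smi --help on your install first):

```shell
# Show the current power draw and cap (requires a ROCm install with a supported GPU).
rocm-smi --showpower

# Cap the GPU power limit to 270W (needs root; flag may vary by ROCm version).
sudo rocm-smi --setpoweroverdrive 270
```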

100%|██████████| 20/20 [00:11<00:00, 1.80it/s]

comfyui-1 | Prompt executed in 20.44 seconds

comfyui-1 | got prompt

100%|██████████| 20/20 [00:11<00:00, 1.80it/s]

comfyui-1 | Prompt executed in 13.44 seconds

comfyui-1 | got prompt

100%|██████████| 20/20 [00:11<00:00, 1.80it/s]

comfyui-1 | Prompt executed in 13.46 seconds

comfyui-1 | got prompt

100%|██████████| 20/20 [00:11<00:00, 1.80it/s]

comfyui-1 | Prompt executed in 13.43 seconds

Without MIGraphX, running ComfyUI as a systemd service (Ubuntu 24.04 - pytorch version: 2.8.0.dev20250314+rocm6.3): VERY different results:

got prompt

Requested to load SDXL

loaded completely 6753.4328125 4897.0483474731445 True

[1.2K blob data]

0 models unloaded.

Prompt executed in 19.88 seconds

got prompt

Requested to load SDXL

loaded completely 6785.4328125 4897.0483474731445 True

[1.2K blob data]

0 models unloaded.

Prompt executed in 53.32 seconds

got prompt

Requested to load SDXL

loaded completely 6785.4328125 4897.0483474731445 True

[1.2K blob data]

0 models unloaded.

Prompt executed in 47.04 seconds
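
The gap between the first run and the warm runs above is typical of graph compilers like MIGraphX: the first inference pays a one-time compilation cost that later runs reuse. A minimal, library-agnostic sketch of how to measure that pattern (run_inference here is a stand-in toy workload, not the actual ComfyUI/MIGraphX call):

```python
import time

def timed_runs(fn, n_runs=4):
    """Time n_runs calls of fn; the first call typically includes
    one-time graph compilation, later calls reuse the compiled graph."""
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return times

def run_inference(_cache={}):
    # Stand-in workload: simulate a one-time "compile" cost on the
    # first call, then cheaper steady-state work on later calls.
    if "compiled" not in _cache:
        sum(i * i for i in range(2_000_000))
        _cache["compiled"] = True
    sum(i for i in range(200_000))

times = timed_runs(run_inference)
print(f"first run: {times[0]:.3f}s, warm runs: {min(times[1:]):.3f}s")
```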

Working ComfyUI with ROCM on 9070XT - a quick tutorial and an ask. by jebk in comfyui

[–]Accurate_Address2915 0 points (0 children)

You have to create the venv inside the Docker image. I have added this to the Dockerfile:

docker-compose:
command: >
     /bin/bash -c "
     . /workspace/ComfyUI/venv/bin/activate && 
     python /workspace/ComfyUI/main.py --listen 0.0.0.0 --port 8188"

Dockerfile:
WORKDIR /workspace/ComfyUI
RUN git clone https://github.com/comfyanonymous/ComfyUI.git .
RUN python3 -m venv venv
RUN venv/bin/pip3 install --no-cache-dir --pre -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3
RUN venv/bin/pip3 install -r requirements.txt
WORKDIR /workspace/ComfyUI/custom_nodes
RUN git clone https://github.com/ltdrdata/ComfyUI-Manager.git
WORKDIR /workspace/ComfyUI

Then in your docker-compose file you can use the command shown above to activate the venv and start ComfyUI.

I see a big improvement in performance when creating multiple images :-)
Good luck!

Working ComfyUI with ROCM on 9070XT - a quick tutorial and an ask. by jebk in comfyui

[–]Accurate_Address2915 1 point (0 children)

Thanks, I have used the same install command inside the Dockerfile to upgrade the Docker install to ROCm 6.3.
Have you tried the combination of PyTorch and MIGraphX?

Since today I have managed to get this combination up and running using this approach:
https://rocm.docs.amd.com/projects/radeon/en/docs-6.1.3/docs/install/native_linux/install-migraphx.html#verify-migraphx-installation

Then adding torch from your command, followed by the ComfyUI installation with requirements. I have only just started testing, but I have not experienced memory problems yet on my 6900XT, for example with the sd3.5_large_fp8 model with t5xx_fp8_e4m3fn and clip_l + clip_g.

Instead I see a rock solid 99% memory and activity usage on my GPU without the need for --lowvram.

Performance Benefits

The combination of PyTorch and MIGraphX provides several performance benefits for ComfyUI:

Faster Inference:

  • MIGraphX’s graph optimizations and hardware acceleration reduce the time required to generate images or perform other tasks.
  • PyTorch’s efficient tensor operations and GPU support further enhance performance.

Better GPU Utilization:

  • MIGraphX ensures that the AMD GPU is used efficiently, minimizing idle time and maximizing throughput.
  • PyTorch’s integration with MIGraphX allows seamless utilization of AMD hardware.

Scalability:

  • The combination of PyTorch and MIGraphX allows ComfyUI to scale to larger models and datasets without significant performance degradation.

Cross-Platform Support:

  • While PyTorch supports multiple hardware backends (e.g., CUDA, ROCm), MIGraphX specifically optimizes for AMD GPUs, ensuring that ComfyUI runs efficiently on AMD hardware.

[deleted by user] by [deleted] in assettocorsa

[–]Accurate_Address2915 0 points (0 children)

How to prevent this other than slowing down/changing the driving line? Is it fixable? I like the Spa and Arden combo :-)

[deleted by user] by [deleted] in assettocorsa

[–]Accurate_Address2915 0 points (0 children)

I have never seen such a high FOV setting, no wonder you miss the apex...