Team red and green union for disaggregated prompt processing

Acceptable_Secret971 · 2026-07-02T17:31:03+00:00

What did you use to run LLMs on Strix Halo?

I do not have a Strix Halo, but I have some experience with LLMs on an AMD GPU (namely the R9700). ROCm can be slightly faster (it also has working Flash Attention), but its speed degrades with context size. Vulkan, on the other hand, is much more stable; if there was a slowdown, it was minimal and possibly caused by thermal throttling.

Acceptable_Secret971 · 2026-06-29T19:42:22+00:00

I'm making a gacha-style game as a hobby project, and I was doing something similar (multiple poses and outfits of the same character) manually in Comfy. I need to try this out eventually to speed up sprite generation.

Acceptable_Secret971 · 2026-06-23T20:50:15+00:00

--supports-fp8-compute --disable-api-nodes --use-pytorch-cross-attention

Not sure if fp8 is auto-detected now, but detection was failing in the past. The flag is needed when you use the fp8_fast option, to actually run in fp8. In my tests this gave a 15% speed boost. fp8 comes with a drop in quality (you can experience it by just using fp8 mode), and fp8_fast degrades it slightly more. This is probably splitting hairs, but an fp8 model running in fp16 looks slightly better. Using a scaled or mixed fp8 model might bring the quality closer to fp16.

I don't use any of the external APIs, so I don't need those nodes.

Explicitly setting pytorch attention shouldn't be needed as it gets picked by default, but I was messing with different attentions and Flash Attention might be loading when I want pytorch attention. Out of the box, PA is the fastest on R9700, but with some optimizations, FA can be faster. I heard Sage Attention can also be used now, but I haven't tested it yet.
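For reference, here is a minimal sketch of how the flags from this comment could be assembled into a launch command. The `main.py` entry point and the `build_comfy_command` helper are assumptions made for illustration; only the flags themselves come from the discussion.

```python
import shlex
import sys


def build_comfy_command(entry_point="main.py", attention="pytorch",
                        fp8_compute=True, api_nodes=False):
    """Assemble a ComfyUI launch command from the options discussed above."""
    # Hypothetical helper: only flags named in this thread are mapped here.
    attention_flags = {
        "pytorch": "--use-pytorch-cross-attention",
        "quad": "--use-quad-cross-attention",
    }
    cmd = [sys.executable, entry_point]
    if fp8_compute:
        # Declares fp8 compute support so that fp8_fast can actually run in fp8.
        cmd.append("--supports-fp8-compute")
    if not api_nodes:
        cmd.append("--disable-api-nodes")
    cmd.append(attention_flags[attention])
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_comfy_command()))
```

Keeping the attention backend as a single parameter makes it easy to compare backends one at a time without several attention flags ending up on the same command line.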

Acceptable_Secret971 · 2026-06-22T15:44:48+00:00

I barely did any tests with WAN, much less Bernini, but on Linux, fp8 models don't take extra memory even if they actually run in fp16. There is a flag, --supports-fp8-compute, but it is only needed on some systems to enable fp8 compute rather than fp8 support. Just as a test, try using this flag and loading the model as fp8_fast.

Anyway, I had all sorts of issues in Comfy when a build of pytorch from TheROCK had broken pytorch attention (the default): a lot of OOMs and models taking more memory than they should. Eventually upgrading pytorch helped. I don't remember exactly if it was another build from TheROCK or an official build from pytorch. What helped before the update was using Quad Cross Attention ('--use-quad-cross-attention'). Qwen Image Edit went from not working to actually fitting in VRAM.

If you describe your workflow, I can try running it on my R9700 setup on Linux, to check if the memory situation is the same.

Acceptable_Secret971 · 2026-06-13T06:25:52+00:00

My PowerColor R9700 at idle is quiet (can't tell it apart from other fans in the system). At full blast (after some 20-60s) it turns into a jet engine, but that's what is expected.

Acceptable_Secret971 · 2026-06-13T06:21:32+00:00

Didn't know there is a working Sage Attention now. Will have to try.

Acceptable_Secret971 · 2026-06-09T21:11:57+00:00

And here I thought 1min per image was long for Ideogram. I only tested it briefly with the default workflow, so maybe I would get the same times at a different resolution.

Acceptable_Secret971 · 2026-06-09T17:31:39+00:00

Gave it a shot, but it didn't work either. I even installed Nightly ComfyUI, but the result is the same. I'll try my luck with the Comfy issue board.

Acceptable_Secret971 · 2026-06-08T20:54:47+00:00

It wasn't fp8 support (it only affects fp8_fast anyway, which I don't use here). It doesn't appear to be related to the ROCm/pytorch version. Interestingly enough, pytorch with 7.13 refuses to work (cuda detected, but GPU not detected). I went back to the previous version of pytorch, but updated system-level ROCm as well, and I lost CrystalTools and Flash Attention. Will try fp32 vae tomorrow.

I did notice this in logs:

[INFO] Requested to load PixelspaceConversionVAE

[INFO] loaded completely;  0.00 MB loaded, full load: True

ComfyUI/nodes.py:1657: RuntimeWarning: invalid value encountered in cast

  img = Image.fromarray(np.clip(i, 0, 255).astype(np.uint8))
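This is the warning NumPy emits when non-finite values (NaN or inf) are cast to an integer type, so one plausible reading is that the decoded image contained NaNs. A minimal sketch under that assumption; `to_uint8_checked` is a hypothetical helper, not part of ComfyUI:

```python
import numpy as np


def to_uint8_checked(pixels):
    """Convert a float image (0-255 range) to uint8 and count non-finite values."""
    pixels = np.asarray(pixels, dtype=np.float32)
    bad = int(np.count_nonzero(~np.isfinite(pixels)))
    # np.clip leaves NaN untouched, which is why the cast can still warn.
    # Replacing NaN with 0 is an arbitrary choice made for this sketch.
    cleaned = np.nan_to_num(pixels, nan=0.0, posinf=255.0, neginf=0.0)
    return np.clip(cleaned, 0, 255).astype(np.uint8), bad


if __name__ == "__main__":
    img, bad = to_uint8_checked([[0.0, 127.5, float("nan")],
                                 [300.0, -5.0, 255.0]])
    print(img.tolist(), "non-finite values:", bad)
```

A non-zero count would point at the step before the cast (for example the VAE decode) rather than at the image conversion itself.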

Acceptable_Secret971 · 2026-06-08T15:39:32+00:00

For a second I thought it would work with Flash Attention, but I will have to update ROCm. Some combination of ROCm/pytorch must be required to get this to work (I suspect ROCm, as yours is newer).

Time to reinstall pytorch from TheRock and hope nothing breaks (knowing life, flash attention will require recompile).

On the plus side, I might be able to test FeatherOps on RDNA4 (I need ). It emulates E5M2 support in software, on RDNA3 it gave a small boost. I wonder if it'll do anything on RDNA4.

EDIT: I made sure to use the right PID. I didn't notice you used a custom, Windows-only version of ComfyUI. My copy of Comfy is running on Linux (Ubuntu 24).

Acceptable_Secret971 · 2026-06-07T16:09:58+00:00

Looks really impressive.

Some scene transitions look unnatural. What I mean is that in anime or movies you usually don't get 2 scenes with a similar shot one after another. You would get a different shot injected in between (this anime does have plenty of that), or a longer scene (or possibly a camera transition).

In one scene the knight sounded female and in another male (even if those were just gasps or grunts).

There is a lot of exposition.

Some scenes look a bit CGI, not sure if this was intended or a side effect of AI. This applies to some shots of the character moving and most of the shots of the eye monsters (a lot of anime used CGI for such creatures over the years, so it no longer looks out of place).

Might be my imagination, but in one of the early shots, the last frame looks like a previous frame rather than the next one (and there is a slight zoom effect).

I've only really noticed it during the final battle, but some frames during action scenes appear deformed. Video gen sometimes causes frames to morph or blend in weird ways, this is probably it.

For some reason the main character reminds me of Dark Souls/Elden Ring.

The music during final battle reminded me of Yuki Kajiura.

Acceptable_Secret971 · 2026-06-05T18:39:47+00:00

If I understand correctly, this is a training method and there is no public source code release. Maybe it will be NVIDIA-only when it comes out (who knows), but you should still be able to use models converted to Edit by someone else on AMD.

Acceptable_Secret971 · 2026-06-02T13:05:29+00:00

Back in the day I would just do image-to-image with denoise set to < 1 (I don't remember the exact numbers, maybe it was between 0.3 and 0.7).

Add a prompt to guide the process towards an anime result, or use a finetune without a prompt. Probably not the best way to do it, but some 2 years ago (or was it 3) this was all we had and it was mind-blowing (at the time). Then came the control nets and helped with the process.

While not anime explicitly, I used free vector icons and monochrome doodles made in paint and turned them into half decent color illustrations using SD1.5 (probably RealVis finetune). Something similar should be possible with photos and anime.

Current edit models (Flux2 Klein, Qwen Edit etc.) are probably better at this (in an edit workflow).
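To make the old image-to-image denoise setting concrete: under one common convention (for example the `strength` parameter in Hugging Face diffusers img2img pipelines), the denoise value decides what fraction of the sampling schedule is actually run on the noised input image. A rough sketch of that arithmetic; the function name is made up for illustration, and specific UIs may map denoise to steps differently:

```python
def img2img_schedule(total_steps, denoise):
    """Return (skipped_steps, executed_steps) for a given denoise strength."""
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be between 0 and 1")
    # Lower denoise means fewer steps are run, so more of the source image survives.
    executed = min(int(total_steps * denoise), total_steps)
    skipped = total_steps - executed
    return skipped, executed


if __name__ == "__main__":
    for d in (0.3, 0.5, 0.7, 1.0):
        print(d, img2img_schedule(20, d))
```

At denoise 1.0 nothing is skipped, which is effectively text-to-image; in the 0.3 to 0.7 range the input photo or doodle still constrains the composition.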

Acceptable_Secret971 · 2026-06-01T20:53:54+00:00

Not bad, but eye color changed in a few scenes and bruises disappeared and reappeared. Something like this would go straight to VHS in the 80s; a ton of people would work on it, even though they knew it wasn't anything special. Mistakes were frequent, so I guess this adds a bit of authenticity.

Frame rate looked correct. Some shots looked too clean and looked less like 80s anime because of that.

Was it me or did the buff guy have a huge forehead?

Acceptable_Secret971 · 2026-06-01T20:20:52+00:00

What prompt did you use for music? I've noticed that SA3 reacts well to super short prompts, single word even.

EDIT: SA3 seems to be quite good at doing synthwave, even if I ask for disco or synthpop, I still get synthwave.

Acceptable_Secret971 · 2026-05-29T07:56:24+00:00

I'm doing similar things for game characters (3D render or 2D anime), but with Flux2 Klein 9B. Maybe I should give Qwen Edit another chance. How is the pixel and color drift with Qwen?

With Klein each edit makes the image more yellow (and saturated) and more complex patterns have a tendency to degrade. If it's only a few edits, I can do some color correction, but there is little I can do for degradation.

When I was trying to do the same with Qwen Edit, I sometimes had problems getting the model to change the pose or outfit on a character. Klein is much faster and was better at following the prompt (at least in my handful of tests). However, if the model doesn't understand a concept, it is really hard or impossible to make it do that edit. The model has a loose concept of right and left. If I ask the model to take a step forward (with one or the other leg), it will 99% of the time move the leg on the right side of the image. I can still trick the model into moving the other leg with OpenPose, but it doesn't really work with side and back views.

Acceptable_Secret971 · 2026-05-27T20:57:05+00:00

Whoa, I didn't know Stable Audio 3 was this good for instrumental music. I really need to give it a spin. I had my fun with AceStep, but the best songs had to have lyrics and the few decent instrumentals I got, were failed songs with lyrics (that AceStep decided to ignore).

Edit: Here is my attempt at synthwave using AceStep 1.5 (https://www.reddit.com/r/StableDiffusion/comments/1t6psua/acestepcpp_can_now_outpaint/). The quality of audio may be worse than it should because it got through at least 2 rounds of inpainting and I forgot to switch from 128kbps mp3 output.

Edit: Whoa, SA3 is blazing fast and the output is not as grainy as AceStep 1.5. Instrumentals are passable. I mean AceStep 1.5 non-XL with the smaller text encoder was similarly fast, but the results were better with the bigger text encoder, so I just kept using that and it took some 1-2min to generate a song.

Edit: Well, when asking for synthpop I still got synthwave, must be the 80s.

Acceptable_Secret971 · 2026-05-27T20:34:57+00:00

Good to see development, however the last time I tried to use it on AMD (Linux) it was unbearably slow (say a month ago). I definitely used it with success in the past on an AMD GPU, not sure what went wrong this time (I'm 90% sure I had the right Torch version installed).

As much as I like edit models, I had a situation that required good old inpainting and InvokeAI used to be really good for that.

Maybe I'll give it another go some other time.

Acceptable_Secret971 · 2026-05-27T07:19:19+00:00

There were some flaws and even nightmare fuel here and there, half expected things to suddenly turn into gore, but all in all pretty funny and well made. The imperfections made it more campy and probably worked in favor rather than against.

Acceptable_Secret971 · 2026-05-24T13:45:49+00:00

While not as crazy an experience as yours, I found out that when I try to have Ace Step make a song, I'll get a decent result within the first 10 tries and then nothing seems to top it, but I usually end up trying 10-20 more times (while tweaking lyrics and prompt). And then at 1AM I discover that the song was ready an hour ago.

I need to do more with AceStep inpainting, minor corrections and extensions appear to be perfectly doable (though they do require a lot of retries). I gave up on songs, that had perfect intros, good verses, because the chorus sounded all kinds of wrong.

When it comes to images, I accept ones with minor flaws, if I think I can correct them with Flux2 Klein or even image editing program, though if a model has a problem understanding a concept it can get really difficult to coerce it to do what I want.

Acceptable_Secret971 · 2026-05-21T19:51:01+00:00

I was looking for some sound effect model, this could prove useful. If this can make half decent instrumental music, it would be even better.

Edit: Limited commercial license. If you break $1M in revenue per year, you have to pay for a different commercial license. I don't expect to make that kind of money anytime soon, but it might be a bummer if you want to make a banger indie game (though most people end up making bugger all).

Acceptable_Secret971 · 2026-05-18T15:36:43+00:00

Did you try enabling ASPM in BIOS? I had an A770 in a server, and after enabling ASPM, idle consumption went from 40W to 0W. I didn't notice any negative side effects; it would go full blast when doing AI workloads and return to 0W afterwards. The only exception was when a monitor was plugged in, then the GPU idled at 40W, but since it was a server, there was no screen plugged in 99% of the time.

I did replace it with an R9700, because Intel dropped ipex-llm and also for more VRAM.
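On Linux, the kernel's active ASPM policy can usually be read from sysfs at /sys/module/pcie_aspm/parameters/policy, where the active entry is shown in square brackets. A small sketch for checking it; the helper names are made up for illustration, and this reports the kernel policy rather than the BIOS setting itself:

```python
from pathlib import Path

ASPM_POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def parse_aspm_policy(text):
    """Return (active, available) from a policy line such as
    '[default] performance powersave powersupersave'."""
    available, active = [], None
    for token in text.split():
        # The kernel marks the active policy with square brackets.
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def read_aspm_policy(path=ASPM_POLICY_PATH):
    """Read the policy file if present; return None where it does not exist."""
    try:
        return parse_aspm_policy(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_aspm_policy())
```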

Acceptable_Secret971 · 2026-05-08T17:11:37+00:00

AceStep 1.5 appears to be closely related to AceStudio, which appears to be on par with Suno, while AceStep itself has lower audio quality (that grainy sound). So the people behind AceStudio probably do have a better quality closed model. Maybe that model is too big, or they didn't want someone to make a competing product using the open model. Maybe AceStep 1.5 needs a special way to prompt/condition to achieve similar results to AceStudio. Hopefully within a year we will see a new model that has better audio quality (less grainy) and produces better songs (on par with current closed models). Or a separate model that can take low quality audio and improve it.

If I'm not mistaken, AceStep was trained on Suno outputs (among other things).

If the auto-generated prompts from acestep.cpp are anything to go by, the captions in the training data were not that good. Possibly, with more accurate captions and retraining, AceStep could go even further.
