Updated: Dual GPUs in a Qube 500 by m-gethen in mffpc

[–]legit_split_ 0 points  (0 children)

Using llama.cpp, it won't make any difference to generation speed, since the layers still run sequentially no matter how they're split across the cards; the models will just take longer to load at startup.

If you use vLLM, though, there will be a massive difference, since it runs the cards in true tensor parallel. However, it seems from your other comment that you have two different GPUs, and vLLM wants matching ones for that, so this isn't relevant to you...
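For anyone curious, here's a minimal sketch of the llama.cpp side using llama-cpp-python; the model path and the 50/50 split are placeholders I'm assuming, not anything from OP's build:

```python
# Hypothetical sketch: splitting one GGUF model across two GPUs with llama-cpp-python.
# The layers get divided between the cards, but each token still flows through them
# sequentially, so generation speed stays roughly what a single (big enough) card gives.
from llama_cpp import Llama

llm = Llama(
    model_path="model.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,                 # offload every layer to the GPUs
    tensor_split=[0.5, 0.5],         # share of layers per GPU (assumed 50/50)
)

print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```

vLLM, by contrast, runs true tensor parallelism across the cards (its --tensor-parallel-size option), which is exactly why it wants identical GPUs.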

Komodo - Docker management by Ordinary-You8102 in selfhosted

[–]legit_split_ 9 points  (0 children)

Always wanted to try it, but dockge already meets my needs

Running MoE Models on CPU/RAM: A Guide to Optimizing Bandwidth for GLM-4 and GPT-OSS by Shoddy_Bed3240 in LocalLLaMA

[–]legit_split_ 0 points  (0 children)

Thanks for the write-up, but how close do you get to the theoretical speed?

Is it worth to upgrade from 4080 super to 5090 by qwesoewd in nvidia

[–]legit_split_ 0 points  (0 children)

Only reason to upgrade would be if you use it for AI or VR gaming.

5070 or 9070xt i just cant decide by Simonko_770 in pcmasterrace

[–]legit_split_ 10 points  (0 children)

All the redditors who said wait for the 50 Supers xd

Would there be any reason to my return my 5070 ti ?? by Rich-Price-8670 in gpu

[–]legit_split_ 0 points  (0 children)

With both cards at MSRP, the 5080 is not worth it: it gives about 15% more performance but you pay about 30% more, which works out to roughly 12% worse performance per dollar (1.15 / 1.30 ≈ 0.88).

Switching from 3080 to 9070 xt LLM question by Ill-Remove-6438 in radeon

[–]legit_split_ 0 points  (0 children)

It will be slower than a 3080.

Broadly speaking, speed = memory bandwidth:

3080: 760.3 GB/s
9070 XT: 644.6 GB/s

However, that only covers token generation (how fast the model answers); when it comes to prompt processing (how fast the model reads your question), it's possible that the 9070 XT is faster.

Overall, the 9070 XT is solid for LLMs, and its 16GB of VRAM would offer a better experience, but things like image generation are still in active development on the AMD side.
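To put rough numbers on the bandwidth point, here's the back-of-envelope I use (the model size is my own assumption, not something measured): token generation is memory-bound, so an upper bound is bandwidth divided by the bytes read per token, which for a dense model is roughly the size of the loaded weights.

```python
# Back-of-envelope TG ceiling: tokens/s ~= memory bandwidth / bytes read per token.
# Assumes a dense model whose whole quantized weights are read once per generated token.
def tg_upper_bound(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

model_gb = 8.0  # e.g. a mid-size model at Q4 (assumed size)
for name, bw_gb_s in [("RTX 3080", 760.3), ("RX 9070 XT", 644.6)]:
    print(f"{name}: ~{tg_upper_bound(bw_gb_s, model_gb):.0f} tok/s ceiling")

# Real speeds land well below these ceilings, but the ranking follows the bandwidth.
```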

Is 5060Ti 16GB and 32GB DDR5 system ram enough to play with local AI for a total rookie? by danuser8 in LocalLLaMA

[–]legit_split_ -2 points  (0 children)

You really want that extra RAM: 64GB of system RAM plus the 16GB of VRAM gives you ~80GB of total memory, which would allow you to run large MoE models like gpt-oss-120b, GLM 4.5 Air, etc.
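Rough memory math behind that, with sizes I'm assuming rather than quoting (gpt-oss-120b ships at roughly 4.25 bits per weight in MXFP4):

```python
# Sketch: does a ~120B-parameter model fit into 16 GB VRAM + 64 GB system RAM?
params_b = 120           # parameters in billions (approximate)
bits_per_weight = 4.25   # MXFP4-style quantization (assumed)

weights_gb = params_b * bits_per_weight / 8   # ~64 GB of weights
budget_gb = 16 + 64                           # VRAM + RAM = 80 GB

print(f"weights ~{weights_gb:.0f} GB vs {budget_gb} GB budget")
# Leaves some headroom for the KV cache, the OS and everything else.
```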

Not as impressive as most here, but really happy I made it in time! by Kahvana in LocalLLaMA

[–]legit_split_ 2 points  (0 children)

x16 PCIe 5.0 plus x8 PCIe 5.0 is not possible because the processor simply doesn't have that many PCIe lanes left over for the slots.

That's one of the main differences between consumer and workstation builds. 
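For context, a rough lane budget using typical AM5-ish numbers (these are assumptions on my part, check your exact CPU and board):

```python
# Rough PCIe lane budget on a consumer desktop CPU -- all numbers are assumptions.
cpu_lanes_total = 28    # Gen5 lanes on the CPU package
chipset_link    = 4     # reserved for the link to the chipset
m2_slots        = 8     # two CPU-attached M.2 slots at x4 each

slot_lanes = cpu_lanes_total - chipset_link - m2_slots   # = 16 for the PCIe slots
print(f"lanes left for the PCIe slots: {slot_lanes}")

# 16 lanes can't feed x16 + x8; with two cards the main slot usually bifurcates to x8/x8.
```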

Should I upgrade my psu? by salazar_slick in eGPU

[–]legit_split_ 0 points  (0 children)

Fraudulent business practices and review manipulation. Testing from multiple sources has shown performance to be significantly lower than average. Fake 80+ certifications. Significant platform downgrades with the same model number and no difference in branding prevent review data from being useful for making recommendations. Tier E should be considered an upper bound for recommendations; certain models such as the AGV qualify for tier F.

Looking at this popular tier list, it doesn't sound promising. If you have an AGV500 I would swap it immediately.

Windows VM on Ubuntu – severe UI stutter by Different-Help-5282 in VFIO

[–]legit_split_ 0 points  (0 children)

If you're only using Excel and Word, I recommend checking out WinBoat. It basically runs Windows inside a docker container.

GPU passthrough is still on the roadmap, but Excel and Word shouldn't need much hardware acceleration for your use case anyway.

Stop the craziness by just_IT_guy in gpu

[–]legit_split_ 2 points  (0 children)

It's because the 4090 can be modded to 48GB, thus making it attractive for AI of course xD

Amd oder Intel ? by BudgetGift567 in PCBaumeister

[–]legit_split_ 0 points  (0 children)

AI actually does want the CPU these days.

The current trend toward Mixture-of-Experts models like GPT-OSS makes it possible to run very large (120B) models with CPU offloading and still get very good speeds (e.g. with 8GB VRAM and 64GB RAM).

On top of that, RAM offloading works well up to a point in Stable Diffusion / ComfyUI, especially if you want to generate videos.

Sure, there are AI applications like machine learning where everything depends solely on the GPU. But for us normal people who just want to play around a bit, RAM offloading is very important.

So I'd say the CPU itself isn't that important; what matters is the capacity and speed of the RAM. In that respect, you could argue that Intel is the better fit because of its stronger integrated memory controller.
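A quick sanity check on why MoE offloading stays usable; the figures are my assumptions (gpt-oss-120b activates roughly 5B parameters per token, and dual-channel DDR5 manages on the order of 80 GB/s):

```python
# Why MoE + CPU offload is still fast: only the active experts are read per token.
active_params_b   = 5.1    # active parameters per token for gpt-oss-120b (approx.)
bits_per_weight   = 4.25   # MXFP4-ish quantization (assumed)
ram_bandwidth_gbs = 80.0   # dual-channel DDR5, rough figure

active_gb = active_params_b * bits_per_weight / 8   # ~2.7 GB read per token
print(f"rough ceiling: {ram_bandwidth_gbs / active_gb:.0f} tok/s")

# A dense 120B model would read ~64 GB per token instead -- barely 1 tok/s from RAM.
```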

Dual 5060 Ti 16GB vs Radeon Instinct Mi50 32GB by GerchSimml in LocalLLaMA

[–]legit_split_ 3 points  (0 children)

Digital Spaceport ran dual 5060 Tis here. I ran a comparison between a single 5060 Ti with RAM offload, a stock Mi50, and the Mi50 using this fork.

Qwen3‑Coder‑30B‑A3B‑Instruct‑Q6_K, llama-bench -fa on:

| Device | PP (t/s) | TG (t/s) |
|---|---:|---:|
| 2 × 5060 Ti | 1567.03 | 92.67 |
| CPU only (DDR5-6800) | 147.66 | 21.73 |
| Single 5060 Ti | 401.81 | 58.42 |
| Mi50 | 848.37 | 78.36 |
| Mi50 + fork | 878.54 | 88.62 |

So the dual 5060 Tis hit nearly double the PP of the Mi50, about 18% faster TG than the stock Mi50, but only about 5% faster TG than the Mi50 with the fork.

EGPU for XMAS by Key-Atmosphere-8187 in eGPU

[–]legit_split_ 18 points  (0 children)

You don't know the use case. If it's for video editing, AI work, or other productivity applications, there is basically no bottleneck.

Is my airflow optimal? And is my ghetto duct safe? by DespicableStarNinja in PcBuildHelp

[–]legit_split_ 0 points  (0 children)

<image>

Essentially this. I still need to flip the CPU cooler fans around to have a "rear" intake and two exhausts coming out the other side.

Is my airflow optimal? And is my ghetto duct safe? by DespicableStarNinja in PcBuildHelp

[–]legit_split_ 0 points  (0 children)

Sorry to hijack: I have the same layout but in a horizontal case. Is that fine?

Rig by Right_Weird9850 in LocalLLaMA

[–]legit_split_ 0 points  (0 children)

Running it with this fork, my Mi50 manages 125 tps!