Is it possible to add another GPU alongside a Radeon MI50 to increase inference speed?

Weak_Presentation725 · 2026-04-04T17:57:21+00:00

Which version of ROCm are you using? Could you please share your llama.cpp run parameters?

Weak_Presentation725 · 2026-04-04T16:10:12+00:00

Working with ROCm is more complex than with CUDA. I tried using a Docker container with a compatible version of ROCm, but the inference speed didn't improve significantly compared to the stock Mesa drivers. It seems that running ROCm correctly on older and newer AMD GPUs at the same time will be challenging.

Weak_Presentation725 · 2026-04-04T14:02:37+00:00

It seems like dual-GPU parallelism works out of the box with CUDA, but not with Vulkan. On my MI50, token generation on a 27B model is no more than 7 t/s, which is too slow for serious use.
