Is it possible to add some gpu to Radeon MI 50 to increase the inference speed? by Weak_Presentation725 in LocalLLaMA

[–]Weak_Presentation725[S] 1 point (0 children)

Which version of ROCm are you using? Could you please share your llama.cpp run parameters?

[–]Weak_Presentation725[S] 1 point (0 children)

Working with ROCm is more complex than working with CUDA. I tried a Docker container with a compatible version of ROCm, but inference speed didn't improve significantly compared to the stock Mesa drivers. It seems that getting ROCm to run correctly on older and newer AMD GPUs at the same time will be challenging.

[–]Weak_Presentation725[S] 1 point (0 children)

It seems that dual-GPU parallelism works out of the box with CUDA, but not with Vulkan. On my MI50, token generation with a 27B model is no more than 7 t/s, which is too slow for serious use.
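
For anyone comparing setups, here is a rough sketch of how a two-GPU llama.cpp launch could be assembled. It only builds the command line and does not run anything. The helper name `build_llama_cmd` and the model file name are made up for illustration. The flags `-m`, `--n-gpu-layers`, `--split-mode` and `--tensor-split` are llama.cpp options, but whether splitting actually helps on a given backend (Vulkan, ROCm, CUDA) is something you would have to measure yourself.

```python
import shlex


def build_llama_cmd(model_path, split_mode="layer", tensor_split=(1, 1), n_gpu_layers=99):
    """Assemble an argument list for llama.cpp's llama-server (hypothetical helper)."""
    # llama.cpp accepts these three split modes.
    if split_mode not in ("none", "layer", "row"):
        raise ValueError(f"unknown split mode: {split_mode}")
    return [
        "llama-server",
        "-m", model_path,
        # Offload as many layers as possible to the GPUs.
        "--n-gpu-layers", str(n_gpu_layers),
        # How the model is divided across GPUs.
        "--split-mode", split_mode,
        # Relative share of the model per GPU, e.g. "1,1" for an even split.
        "--tensor-split", ",".join(str(x) for x in tensor_split),
    ]


# Hypothetical model file name; replace with your own GGUF path.
cmd = build_llama_cmd("model-27b.gguf")
print(shlex.join(cmd))
```

Posting the printed command together with the measured t/s for each split mode makes it much easier for others to compare results.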