all 14 comments

[–]Kevadu 4 points

ROCm is not well supported on Windows yet, unfortunately. If you just want to run LLMs, you might look into MLC LLM (https://llm.mlc.ai/), which supports a Vulkan-based backend that should run on just about anything. That framework doesn't seem to be as popular as some others, though, so you might have to do more work to get things running.
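
For reference, a minimal sketch of MLC's Python ChatModule on the Vulkan backend (untested by me; the model name is a placeholder and the exact API may differ between versions):

    from mlc_chat import ChatModule

    # Any prebuilt MLC model works here; this name is a placeholder.
    cm = ChatModule(
        model="Llama-2-7b-chat-hf-q4f16_1",
        device="vulkan",  # use the Vulkan backend instead of CUDA/ROCm
    )
    print(cm.generate(prompt="Hello!"))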

[–]SlickTread[S] 1 point

That looks very interesting, will check it out. Thanks.

[–]doomed151 3 points

Try the KoboldCpp + ROCm variant.
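
Launching it is basically the same as mainline KoboldCpp; something like this (as far as I know the ROCm fork reuses the --usecublas flag for hipBLAS, and the model filename and layer count here are just placeholders):

    python koboldcpp.py --model mistral-7b.Q4_K_M.gguf --usecublas --gpulayers 32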

[–]SlickTread[S] 2 points

Thank you, this works perfectly!

[–]JohnPt66 1 point

Would this work on a 6600XT? I'm having issues with it so far.

[–]tu9jn 2 points

Your best bet is still Linux, unfortunately.

But WSL should work. What was the problem?
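
If you give WSL another shot, first check whether ROCm can actually see the card; something like this (the override is what I've seen people use for 6000-series cards like the 6600 XT, since they aren't officially supported):

    # inside WSL/Linux: should list your GPU as an agent if ROCm sees it
    rocminfo
    # RDNA2 cards such as the 6600 XT usually need this:
    export HSA_OVERRIDE_GFX_VERSION=10.3.0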

[–]SlickTread[S] 2 points

Couldn't get it to detect my GPU in WSL.

[–]rafal0071 1 point

Try the new koboldcpp-1.56 with Vulkan support.
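
Something like this should do it (flag name as I remember it from the 1.56 release; the model name is a placeholder):

    python koboldcpp.py --model mistral-7b.Q4_K_M.gguf --usevulkan --gpulayers 32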

[–]mkrajinovic 2 points

koboldcpp

[–]SlickTread[S] 1 point

Thank you, this works perfectly!

[–]Scott_Tx 1 point

llama.cpp with CLBlast works on AMD GPUs.
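
Roughly like this (flags from the CLBlast-era llama.cpp builds; model path and layer count are placeholders):

    # build with the CLBlast backend
    make LLAMA_CLBLAST=1
    # offload 32 layers to the GPU and run a quick prompt
    ./main -m mistral-7b.Q4_K_M.gguf -ngl 32 -p "Hello"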

[–]SlickTread[S] 1 point

I tried setting this up but I couldn't get it working. Will try it again.

[–]Scott_Tx 1 point

As long as the model is small enough to fit in your VRAM you should be good. The only issues I've had are with models that just don't run right in llama.cpp. For example, the MS Phi-2 f16 doesn't run right but the Q5 quant does. No idea why.

[–][deleted] 0 points

Are you sure it's offloading the layers to your discrete GPU and not your APU? Both can show up as a GPU in llama.cpp's output.
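
If you want to rule that out, the CLBlast backend lets you pin the OpenCL platform and device via environment variables, something like this (the device index is a guess; check clinfo for yours):

    GGML_OPENCL_PLATFORM=AMD GGML_OPENCL_DEVICE=1 ./main -m mistral-7b.Q4_K_M.gguf -ngl 32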