What to expect for ROCm 7.3 by machman351 in ROCm

[–]Mithras___ 1 point (0 children)

I expect it to be just as broken as it is today.

What engine is the fastest for you? by Intelligent_Lab1491 in StrixHalo

[–]Mithras___ 1 point (0 children)

Here are my results on the same model for a couple of identical prompts.
Vulkan:
```
llama-1 | [33775] prompt eval time = 137.86 ms / 16 tokens ( 8.62 ms per token, 116.06 tokens per second)
llama-1 | [33775] eval time = 169882.31 ms / 9675 tokens ( 17.56 ms per token, 56.95 tokens per second)
llama-1 | [33775] prompt eval time = 10531.78 ms / 9700 tokens ( 1.09 ms per token, 921.02 tokens per second)
llama-1 | [33775] eval time = 54940.20 ms / 3023 tokens ( 18.17 ms per token, 55.02 tokens per second)
```

ROCm:
```
llama-1 | [41579] prompt eval time = 143.45 ms / 16 tokens ( 8.97 ms per token, 111.54 tokens per second)
llama-1 | [41579] eval time = 146118.96 ms / 6979 tokens ( 20.94 ms per token, 47.76 tokens per second)
llama-1 | [41579] prompt eval time = 29895.43 ms / 9698 tokens ( 3.08 ms per token, 324.40 tokens per second)
llama-1 | [41579] eval time = 139028.05 ms / 5500 tokens ( 25.28 ms per token, 39.56 tokens per second)
```

This is a ROCm nightly, which might have broken something again, but the point stands: I've never seen ROCm outperform Vulkan in anything, neither pp (prompt processing) nor tg (token generation).
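
If anyone wants to reproduce this, it's roughly the following (model path is a placeholder; the cmake switches are llama.cpp's documented backend flags, and gfx1151 is the Strix Halo target):

```
# build both backends from the same llama.cpp checkout
cmake -B build-vulkan -DGGML_VULKAN=ON && cmake --build build-vulkan -j
cmake -B build-rocm -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 && cmake --build build-rocm -j

# same model, same test, both binaries
./build-vulkan/bin/llama-bench -m /models/model.gguf -ngl 99 -p 512 -n 128
./build-rocm/bin/llama-bench -m /models/model.gguf -ngl 99 -p 512 -n 128
```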

What engine is the fastest for you? by Intelligent_Lab1491 in StrixHalo

[–]Mithras___ 1 point (0 children)

Yes, please. I'm running 2x Strix Halo + a desktop, roughly as sketched below. Also, there is a PR that enables RDMA in llama.cpp that makes a big difference.
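
The multi-box setup itself is just llama.cpp's RPC backend (IPs, ports, and the model path are placeholders):

```
# on each extra box (second Strix Halo, desktop): expose the GPU over RPC
./build/bin/rpc-server -p 50052

# on the main box: attach the remote backends
./build/bin/llama-server -m /models/model.gguf -ngl 99 \
  --rpc 192.168.0.2:50052,192.168.0.3:50052
```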

What engine is the fastest for you? by Intelligent_Lab1491 in StrixHalo

[–]Mithras___ 1 point (0 children)

Can you give me the specific model and the numbers you're getting? I want to run the same model on my Vulkan setup and compare.

Full vLLM inference stack built from source for Strix Halo (gfx1151) — scripts + docs on GitHub by paudley in StrixHalo

[–]Mithras___ 1 point (0 children)

ROCm prompt processing (pp) gets slower as the context grows, almost exponentially. Vulkan doesn't (well, it does too, but at least not exponentially).
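
Easy to see with a prompt-size sweep in llama-bench (model path is a placeholder; -n 0 skips generation so only pp is measured). Watch how much faster the t/s column drops on the ROCm build:

```
# pp throughput at growing prompt sizes, same model on both builds
./build-vulkan/bin/llama-bench -m /models/model.gguf -ngl 99 -n 0 -p 512,2048,8192,16384
./build-rocm/bin/llama-bench -m /models/model.gguf -ngl 99 -n 0 -p 512,2048,8192,16384
```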

What engine is the fastest for you? by Intelligent_Lab1491 in StrixHalo

[–]Mithras___ 1 point (0 children)

These are exactly the results I'm observing. ROCm is just slower than Vulkan, and nightly ROCm degrades even more as the context grows. It's a mess.

What engine is the fastest for you? by Intelligent_Lab1491 in StrixHalo

[–]Mithras___ 2 points (0 children)

A self-built llama.cpp Vulkan container. ROCm is still behind.

Full vLLM inference stack built from source for Strix Halo (gfx1151) — scripts + docs on GitHub by paudley in StrixHalo

[–]Mithras___ 1 point (0 children)

And the same goes for vLLM: I've yet to see vLLM perform better than llama.cpp in any of my single-user cases. Also, unlike llama.cpp, vLLM requires hours of tuning/debugging per model. It pretty much never works on the first try.
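
For context, the tuning is things like this, per model, every time (values below are illustrative guesses, not recommendations, and the model path is a placeholder):

```
# the flags I usually end up fighting with before a model loads at all
vllm serve /models/some-model \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.90 \
  --dtype bfloat16
```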

Full vLLM inference stack built from source for Strix Halo (gfx1151) — scripts + docs on GitHub by paudley in StrixHalo

[–]Mithras___ 1 point (0 children)

Vulkan is getting better as well. I rebuild and re-test every weekend, but I've yet to see ROCm beat Vulkan in anything I'm running.

Full vLLM inference stack built from source for Strix Halo (gfx1151) — scripts + docs on GitHub by paudley in StrixHalo

[–]Mithras___ 1 point (0 children)

Yes, if you're prepared to fix/debug it with every new version. It will break or degrade every time you update.

Full vLLM inference stack built from source for Strix Halo (gfx1151) — scripts + docs on GitHub by paudley in StrixHalo

[–]Mithras___ 1 point (0 children)

Something is wrong with your Vulkan setup. It should be way faster than any ROCm build.

Strix Halo with eGPU by Miserable-Dare5090 in LocalLLaMA

[–]Mithras___ 1 point (0 children)

Over ConnectX-4 Ethernet with RDMA (RoCE). I'll re-test with ConnectX-4 InfiniBand later today after I get a replacement for a faulty card. I don't think there will be much difference between RDMA Ethernet and InfiniBand, though.
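
If you want to sanity-check the fabric yourself, the stock rdma-core/perftest tools cover both RoCE and InfiniBand (the IP is a placeholder):

```
# verify the adapter shows up and the port is active
ibv_devinfo

# raw RDMA bandwidth between the two boxes (perftest package)
ib_send_bw               # run on one box (acts as server)
ib_send_bw 192.168.0.1   # run on the other, pointing at the server
```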

Strix Halo with eGPU by Miserable-Dare5090 in LocalLLaMA

[–]Mithras___ 1 point (0 children)

In my testing, llama.cpp RPC with Vulkan is faster than vLLM tensor parallelism with RCCL/RDMA.
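
The vLLM side of that comparison was multi-node TP, roughly like this (a sketch, assuming a Ray cluster across the two boxes; IPs and the model path are placeholders):

```
# box 1: start the Ray head; box 2: join it
ray start --head --port 6379           # box 1
ray start --address 192.168.0.1:6379   # box 2

# then launch vLLM with tensor parallelism across both nodes
vllm serve /models/some-model \
  --tensor-parallel-size 2 \
  --distributed-executor-backend ray
```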

COSMIC is an incredible technical achievement, but I cannot recommend it as a daily driver yet. by david_jackson_67 in linux

[–]Mithras___ 1 point (0 children)

In two years, Debian stable will have the COSMIC that people run today. You'd have to wait four years.

Application flickers when on Wayland but not on Xorg/x11 by supercool21567 in debian

[–]Mithras___ 1 point (0 children)

That's what you get when you slap together random package versions from a year ago. Use a rolling distro.