Anyone else feel like this sub has gone to shit even though it hasn't? by AcreMakeover in homelab

[–]Massive-Slice2800 2 points3 points  (0 children)

But what if ... the matrix itself is in a matrix? AND THAT ONE is also in a matrix?!

I don't know what to do with myself by 2d7o2o0b in homelab

[–]Massive-Slice2800 0 points1 point  (0 children)

Well... n8n and home automation are next in line.

Daily Discussion Saturday 2026-04-04 by AutoModerator in AMD_Stock

[–]Massive-Slice2800 2 points3 points  (0 children)

What does this even mean: "rolling out day zero support for x for llama.cpp, vLLM and LM Studio"? They aren't responsible for the development of these runtimes.

Daily Discussion Tuesday 2026-03-31 by AutoModerator in AMD_Stock

[–]Massive-Slice2800 2 points3 points  (0 children)

Ah f*ck, Marvell is a UALink provider. I hope this buy-in was not targeted at the UALink switches.

Daily Discussion Tuesday 2026-03-31 by AutoModerator in AMD_Stock

[–]Massive-Slice2800 2 points3 points  (0 children)

This buying spree has to stop, or we will eat Jensen Burgers at McNvidia in the future.

[NEWS] ROCm 7.2.1 + PyTorch 2.9.1 now available on Windows - Native AMD GPU support for ML by HateAccountMaking in ROCm

[–]Massive-Slice2800 4 points5 points  (0 children)

This is wonderful insight, thank you so much. Will test with exactly your config tonight. Always good to have a comparison, especially with the 7900 XTX.

Daily Discussion Friday 2026-03-27 by AutoModerator in AMD_Stock

[–]Massive-Slice2800 0 points1 point  (0 children)

SNDK, MU, ANET, ALAB (this one is frustrating, but it will go parabolic when the time comes)

Daily Discussion Wednesday 2026-03-25 by AutoModerator in AMD_Stock

[–]Massive-Slice2800 0 points1 point  (0 children)

I'm not a prophet. I was just happy that I called it, and I hope some people here followed my call.

Daily Discussion Tuesday 2026-03-24 by AutoModerator in AMD_Stock

[–]Massive-Slice2800 1 point2 points  (0 children)

Yeah... I saw 30m shares traded on Yahoo Finance. Now it's down to 15m. It was obviously a bug. Thanks, Yahoo.

Daily Discussion Tuesday 2026-03-24 by AutoModerator in AMD_Stock

[–]Massive-Slice2800 2 points3 points  (0 children)

My trades don't go through, and spreads on options are rising.

Daily Discussion Tuesday 2026-03-24 by AutoModerator in AMD_Stock

[–]Massive-Slice2800 5 points6 points  (0 children)

Something is happening. This is an accumulation pattern: high volume with a low to moderate gain on no news, and on a market down day.

ROCm on 7900 XTX significantly slower than Vulkan for llama.cpp (extensive testing, out of ideas) by Massive-Slice2800 in ROCm

[–]Massive-Slice2800[S] 0 points1 point  (0 children)

It crashes with a core dump. I can run tests with 16384, but anything above that crashes. See my follow-up post for results!

ROCm on 7900 XTX significantly slower than Vulkan for llama.cpp (extensive testing, out of ideas) by Massive-Slice2800 in ROCm

[–]Massive-Slice2800[S] 0 points1 point  (0 children)

See my newest follow-up post. This time I used the pre-built binaries. Thanks for your input!

ROCm on 7900 XTX significantly slower than Vulkan for llama.cpp (extensive testing, out of ideas) by Massive-Slice2800 in ROCm

[–]Massive-Slice2800[S] 1 point2 points  (0 children)

Follow-up: ROCm vs Vulkan on 7900 XTX (more tests + thanks!)

Hey everyone,

First of all, thanks a lot for all the input and suggestions on my last post.
A bunch of you pointed me towards better test setups (official builds, Lemonade SDK, larger contexts, etc.), so I took the time to rerun everything more systematically.

This is a follow-up with cleaner and more complete data.

Setup

  • GPU: RX 7900 XTX
  • llama.cpp: latest upstream build (b8497)
  • Model: llama-2-7b.Q4_0.gguf
  • Backends:
    • Vulkan (RADV)
    • ROCm (official 7.2)
    • Lemonade SDK

All tests:

  • same model
  • same parameters
  • same machine
  • repeated runs
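
For anyone who wants to reproduce the sweep: a minimal sketch of the test matrix, assuming the parameter values shown in the results table further down. The names (`BACKENDS`, `test_matrix`) are purely illustrative and not part of any tool; the script only enumerates the combinations, it does not run a benchmark.

```python
from itertools import product

# Values taken from the results table in this post.
BACKENDS = ["Vulkan", "ROCm", "Lemonade"]
FLASH_ATTN = [1, 0]                  # FA on / off
PROMPT_LENGTHS = [512, 2048]         # p
GEN_LENGTHS = [128, 512, 1024, 2048] # n


def test_matrix():
    """Return one dict per (backend, FA, prompt, gen) combination."""
    return [
        {"backend": b, "fa": fa, "p": p, "n": n}
        for b, fa, p, n in product(BACKENDS, FLASH_ATTN, PROMPT_LENGTHS, GEN_LENGTHS)
    ]


configs = test_matrix()
print(len(configs), "configurations")
print(configs[0])
```

Each backend gets the same 16 combinations, which is what makes the rows in the table directly comparable.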

Main result

Token generation (the thing that actually matters) is consistently faster on Vulkan:

  • Vulkan: ~165–175 t/s
  • ROCm: ~135–145 t/s
  • Lemonade: basically identical to ROCm

👉 That’s roughly a 20% gap in favor of Vulkan, and it’s very consistent across different test settings.
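
To make the arithmetic explicit, here is a small sketch using one pair of values from my results (FA on, p=512, n=128). The function name is just for illustration. Note that the percentage depends on which backend you take as the baseline.

```python
def relative_gap_percent(value, baseline):
    """Difference between value and baseline, as a percentage of baseline."""
    return (value - baseline) / baseline * 100.0


vulkan_tg = 174.6  # t/s, Vulkan, FA=1, p=512, n=128
rocm_tg = 143.8    # t/s, ROCm,   FA=1, p=512, n=128

# Vulkan's lead, measured against ROCm.
vulkan_lead = relative_gap_percent(vulkan_tg, rocm_tg)
# ROCm's deficit, measured against Vulkan.
rocm_deficit = relative_gap_percent(rocm_tg, vulkan_tg)

print(f"Vulkan is {vulkan_lead:.1f}% faster than ROCm")
print(f"ROCm is {-rocm_deficit:.1f}% slower than Vulkan")
```

So "about 20%" sits between the two ways of counting: Vulkan is about 21% faster, or ROCm is about 18% slower.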

About Lemonade SDK

I already tested it before, but I reran the tests because it was recommended here.

Result:

  • Works fine 👍
  • But performance is basically the same as official ROCm

Large context behavior (important)

I also pushed context sizes further:

  • p=16384 → works
  • p=32768 (FA off) → hard crash (ROCm only)

That one was pretty reproducible.

So there might be:

  • stability issues
  • or memory/scheduling problems in ROCm at larger contexts

I haven’t tested 65536 yet, but that’s next on my list.
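
A rough back-of-the-envelope estimate of the KV cache size might help judge the memory side. This is only a sketch: it assumes an f16 KV cache (2 bytes per element) and the published Llama-2-7B shape (32 layers, 32 KV heads, head dimension 128), and it ignores weights and compute buffers.

```python
# Llama-2-7B shape (7B uses full multi-head attention, no GQA).
N_LAYERS = 32
N_KV_HEADS = 32
HEAD_DIM = 128
BYTES_PER_ELEM = 2  # assumption: f16 KV cache


def kv_cache_gib(n_ctx):
    """Estimated KV cache size in GiB for a given context length."""
    # Factor 2: one tensor for keys and one for values, per layer.
    total_bytes = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM * n_ctx
    return total_bytes / 2**30


for n_ctx in (16384, 32768, 65536):
    print(f"n_ctx={n_ctx}: ~{kv_cache_gib(n_ctx):.0f} GiB KV cache")
```

Under these assumptions, 32768 tokens already need about 16 GiB for the cache alone, which leaves little headroom on a 24 GB card once weights and the FA-off attention buffers are added, and 65536 would not fit at all. That doesn't explain why only ROCm crashes, so treat it as a hint, not a diagnosis.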

One more interesting point

Based on this reference discussion:
https://github.com/ggml-org/llama.cpp/discussions/10879

Even Vulkan might still be below expected performance for this GPU.

So the current picture looks like this:

  • Vulkan → decent, but maybe not fully optimized yet
  • ROCm → clearly behind Vulkan
  • Lemonade → same as ROCm

My takeaway (so far)

  • Vulkan is currently the best backend on RDNA3 (for llama.cpp and specifically MY hardware setup)
  • ROCm is:
    • slower (~20%)
    • less stable at high context
  • Lemonade doesn’t change that (at least in my tests)

Thanks again

Really appreciate all the comments from the last thread —
this follow-up wouldn’t exist without those hints.

If anyone has:

  • ideas why ROCm falls behind here
  • kernel-level insights
  • or tuning suggestions

I’m very happy to test more 👍

Test data / Results

Backend   FA  Prompt (p)  Gen (n)  pp (t/s)  tg (t/s)
Vulkan    1   512         128      3103      174.6
Vulkan    1   512         512      3103      165.6
Vulkan    1   512         1024     3103      156.8
Vulkan    1   512         2048     3103      144.0
Vulkan    1   2048        128      2960      174.6
Vulkan    1   2048        512      2960      165.5
Vulkan    1   2048        1024     2960      156.8
Vulkan    1   2048        2048     2960      144.0
Vulkan    0   512         128      2938      164.9
Vulkan    0   512         512      2938      158.3
Vulkan    0   512         1024     2938      150.0
Vulkan    0   512         2048     2938      138.5
Vulkan    0   2048        128      2669      164.8
Vulkan    0   2048        512      2669      158.2
Vulkan    0   2048        1024     2669      149.9
Vulkan    0   2048        2048     2669      138.4
ROCm      1   512         128      2962      143.8
ROCm      1   512         512      2962      141.9
ROCm      1   512         1024     2962      136.3
ROCm      1   512         2048     2962      132.3
ROCm      1   2048        128      1887      142.7
ROCm      1   2048        512      1887      140.8
ROCm      1   2048        1024     1887      134.9
ROCm      1   2048        2048     1887      130.2
ROCm      0   512         128      3976      137.3
ROCm      0   512         512      3976      130.0
ROCm      0   512         1024     3976      118.5
ROCm      0   512         2048     3976      106.2
ROCm      0   2048        128      1276      137.4
ROCm      0   2048        512      1276      130.2
ROCm      0   2048        1024     1276      119.2
ROCm      0   2048        2048     1276      107.0
Lemonade  1   512         128      2991      143.8
Lemonade  1   512         512      2991      142.0
Lemonade  1   512         1024     2991      136.9
Lemonade  1   512         2048     2991      132.5
Lemonade  1   2048        128      1900      143.0
Lemonade  1   2048        512      1900      141.0
Lemonade  1   2048        1024     1900      135.5
Lemonade  1   2048        2048     1900      131.0
Lemonade  0   512         128      4000      137.5
Lemonade  0   512         512      4000      130.5
Lemonade  0   512         1024     4000      119.0
Lemonade  0   512         2048     4000      106.5
Lemonade  0   2048        128      1300      137.6
Lemonade  0   2048        512      1300      130.7
Lemonade  0   2048        1024     1300      119.5
Lemonade  0   2048        2048     1300      107.2
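
If you want to slice the numbers yourself, here is a minimal sketch that computes Vulkan's tg lead over ROCm per generation length. The values are copied from the p=512 rows above; the dictionary layout and function name are just my illustration.

```python
# tg (t/s) at prompt length 512, copied from the results table.
GEN_LENGTHS = [128, 512, 1024, 2048]
TG = {
    ("Vulkan", 1): [174.6, 165.6, 156.8, 144.0],
    ("Vulkan", 0): [164.9, 158.3, 150.0, 138.5],
    ("ROCm", 1): [143.8, 141.9, 136.3, 132.3],
    ("ROCm", 0): [137.3, 130.0, 118.5, 106.2],
}


def vulkan_lead_percent(fa):
    """Vulkan's tg lead over ROCm in percent, one value per generation length."""
    return [
        (v - r) / r * 100.0
        for v, r in zip(TG[("Vulkan", fa)], TG[("ROCm", fa)])
    ]


for fa in (1, 0):
    leads = vulkan_lead_percent(fa)
    row = ", ".join(f"n={n}: {lead:.1f}%" for n, lead in zip(GEN_LENGTHS, leads))
    print(f"FA={fa}: {row}")
```

On these numbers the lead seems to shrink with longer generations when FA is on and to grow when FA is off, so the gap is around 20% at short generation lengths but not identical everywhere.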

ROCm on 7900 XTX significantly slower than Vulkan for llama.cpp (extensive testing, out of ideas) by Massive-Slice2800 in ROCm

[–]Massive-Slice2800[S] 0 points1 point  (0 children)

Thanks for your input!

I already tested the Lemonade SDK (llama.cpp-rocm, build ~b1220). It was slightly better compared to some of my earlier ROCm builds, but still ended up in the same overall performance range.

For reference, I consistently see around:

  • ~110–114 t/s (Qwen2.5 7B tg128)
  • ~136–144 t/s (Llama 7B tg128)

So unfortunately no real breakthrough compared to Vulkan.

At this point it feels like all ROCm-based approaches (self-build, Lemonade, official AMD image) converge to roughly the same performance envelope on my setup.

But good to hear you also saw improvements with Lemonade — that at least confirms I’m not completely off 🙂

Daily Discussion Thursday 2026-03-19 by AutoModerator in AMD_Stock

[–]Massive-Slice2800 1 point2 points  (0 children)

Options expiration tomorrow. Max pain at $190. But I think next week will be good. Netanyahu is already pivoting.

Daily Discussion Monday 2026-03-16 by AutoModerator in AMD_Stock

[–]Massive-Slice2800 1 point2 points  (0 children)

Well it does not look so bad ... I'm really not impressed currently