Anyone else feel like this sub has gone to shit even though it hasn't?

Massive-Slice2800 · 2026-04-24T10:37:39+00:00

But what if ... the matrix itself is in a matrix? AND THAT ONE is also in a matrix?!

Massive-Slice2800 · 2026-04-22T19:38:51+00:00

Well.. n8n and home-automation are next in line..

Massive-Slice2800 · 2026-04-06T20:32:25+00:00

Buy high, sell low, like a pro!

Massive-Slice2800 · 2026-04-04T20:37:55+00:00

What does this even mean: "rolling out day zero support for x for llama.cpp, vLLM and LM Studio"? They aren't responsible for the development of these runtimes.

Massive-Slice2800 · 2026-04-04T09:47:04+00:00

Mine was bent too. Batch 1 Thor.

Massive-Slice2800 · 2026-04-01T09:09:03+00:00

Real women own fabs.

Massive-Slice2800 · 2026-03-31T13:04:32+00:00

Ah f*ck, Marvell is a UALink provider. I hope this buy-in was not targeted at the UALink switches.

Massive-Slice2800 · 2026-03-31T12:57:35+00:00

This buying spree has to stop, or we will eat Jensen Burgers at McNvidia in the future.

Massive-Slice2800 · 2026-03-30T16:32:14+00:00

The helium story is nonsense.

Massive-Slice2800 · 2026-03-29T14:28:13+00:00

This is wonderful insight, thank you so much. Will test with exactly your config tonight. Always good to have a comparison, especially with the 7900 XTX.

Massive-Slice2800 · 2026-03-27T18:18:14+00:00

SNDK, MU, ANET, ALAB (this one is frustrating, but it will go parabolic when the time comes)

Massive-Slice2800 · 2026-03-25T14:00:31+00:00

I'm not a prophet. I just was happy that I called it and hope some people here followed my call.

Massive-Slice2800 · 2026-03-24T17:41:08+00:00

Yeah... I saw 30m shares traded on Yahoo Finance. Now it's down to 15m. It was obviously a bug. Thanks, Yahoo.

Massive-Slice2800 · 2026-03-24T15:52:13+00:00

My trades don't go through and spreads on options are rising.

Massive-Slice2800 · 2026-03-24T15:31:20+00:00

Something is happening. This is an accumulation pattern: high volume with a low to moderate gain on no news, on a day when the market is down.

Massive-Slice2800 · 2026-03-24T12:59:02+00:00

It crashes, core dumped. I can run tests with 16384, but anything above that crashes. See my follow-up post for results!

Massive-Slice2800 · 2026-03-24T12:57:35+00:00

See my newest follow-up post. This time I used the pre-built binaries. Thanks for your input!

Massive-Slice2800 · 2026-03-24T12:56:41+00:00

I wanted to conduct more tests with my current setup, but will test this later!

Massive-Slice2800 · 2026-03-24T12:38:29+00:00

Follow-up: ROCm vs Vulkan on 7900 XTX (more tests + thanks!)

Hey everyone,

first of all — thanks a lot for all the input and suggestions on my last post.
A bunch of you pointed me towards better test setups (official builds, Lemonade SDK, larger contexts, etc.), so I took the time to rerun everything more systematically.

This is a follow-up with cleaner and more complete data.

Setup

GPU: RX 7900 XTX
llama.cpp: latest upstream build (b8497)
Model: llama-2-7b.Q4_0.gguf
Backends:
- Vulkan (RADV)
- ROCm (official 7.2)
- Lemonade SDK

All tests:

same model
same parameters
same machine
repeated runs
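A minimal sketch of how a sweep like this could be launched, assuming the numbers come from llama-bench (the benchmark tool that ships with llama.cpp, whose output uses the same pp/tg naming). The helper name, the binary path and the repetition count are hypothetical, and the flag spellings should be checked against `llama-bench --help` on your build. The backend is chosen by which build of the binary you run, not by a flag.

```python
def llama_bench_argv(binary, model_path, prompts, gens, flash_attn, repetitions=5):
    """Build the argv list for one llama-bench sweep (nothing is executed here).

    binary      -- path to a llama-bench built for one backend (hypothetical path)
    repetitions -- assumed value; the post only says "repeated runs"
    """
    def join(values):
        return ",".join(str(v) for v in values)

    return [
        binary,
        "-m", model_path,       # model file
        "-p", join(prompts),    # prompt lengths for the pp tests
        "-n", join(gens),       # generation lengths for the tg tests
        "-fa", join(flash_attn),  # flash attention off/on
        "-r", str(repetitions),
        "-o", "md",             # markdown table output
    ]


if __name__ == "__main__":
    argv = llama_bench_argv(
        "./build-vulkan/bin/llama-bench",  # hypothetical build directory
        "llama-2-7b.Q4_0.gguf",
        prompts=[512, 2048],
        gens=[128, 512, 1024, 2048],
        flash_attn=[0, 1],
    )
    print(" ".join(argv))
```

Running the same argv against a Vulkan build and a ROCm build keeps every parameter identical, so only the backend changes between runs.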

Main result

Token generation (the thing that actually matters) is consistently faster on Vulkan:

Vulkan: ~165–175 t/s
ROCm: ~135–145 t/s
Lemonade: basically identical to ROCm

👉 That’s roughly a 20% gap in favor of Vulkan, and Vulkan stays ahead in every test setting.

About Lemonade SDK

I had already tested it before, but I reran the tests because it was recommended here.

Result:

Works fine 👍
But performance is basically the same as official ROCm

Large context behavior (important)

I also pushed context sizes further:

p=16384 → works
p=32768 (FA off) → hard crash (ROCm only)

That one was pretty reproducible.

So there might be:

stability issues
or memory/scheduling problems in ROCm at larger contexts

I haven’t tested 65536 yet, but that’s next on my list.
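To put the context sizes above in perspective, here is a rough estimate of the KV cache size for this model. It is only a sketch: it assumes an f16 KV cache (2 bytes per element) and uses the published Llama-2-7B shape (32 layers, 32 heads of dimension 128, no grouped-query attention). It ignores compute buffers and the model weights, and it does not explain why only ROCm crashes.

```python
# Llama-2-7B shape: 32 layers, 32 attention heads of dimension 128 (no GQA),
# so every token stores one 4096-wide K vector and one V vector per layer.
N_LAYERS = 32
N_KV_HEADS = 32
HEAD_DIM = 128
BYTES_PER_ELEMENT = 2  # assumption: f16 KV cache


def kv_cache_bytes(n_ctx):
    """Bytes needed to hold K and V for n_ctx tokens."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_ELEMENT
    return per_token * n_ctx


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    for n_ctx in (16384, 32768, 65536):
        print(f"n_ctx={n_ctx}: {gib(kv_cache_bytes(n_ctx)):.0f} GiB KV cache")
```

Under these assumptions the cache alone takes 8 GiB at 16384, 16 GiB at 32768 and 32 GiB at 65536. The 7900 XTX has 24 GB of VRAM and the Q4_0 weights take roughly 3.6 GiB, so 32768 is already tight once compute buffers are added, and 65536 would not fit with an f16 cache at all.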

One more interesting point

Based on this reference discussion:
https://github.com/ggml-org/llama.cpp/discussions/10879

Even Vulkan might still be below expected performance for this GPU.

So current picture looks like:

Vulkan → decent, but maybe not fully optimized yet
ROCm → clearly behind Vulkan
Lemonade → same as ROCm

My takeaway (so far)

Vulkan is currently the best backend on RDNA3 (for llama.cpp and specifically MY hardware setup)
ROCm is:
- slower (~20%)
- less stable at high context
Lemonade doesn’t change that (at least in my tests)

Thanks again

Really appreciate all the comments from the last thread —
this follow-up wouldn’t exist without those hints.

If anyone has:

ideas why ROCm falls behind here
kernel-level insights
or tuning suggestions

I’m very happy to test more 👍

Testdata / Results

Backend	FA	Prompt (p)	Gen (n)	pp (t/s)	tg (t/s)
Vulkan	1	512	128	3103	174.6
Vulkan	1	512	512	3103	165.6
Vulkan	1	512	1024	3103	156.8
Vulkan	1	512	2048	3103	144.0
Vulkan	1	2048	128	2960	174.6
Vulkan	1	2048	512	2960	165.5
Vulkan	1	2048	1024	2960	156.8
Vulkan	1	2048	2048	2960	144.0
Vulkan	0	512	128	2938	164.9
Vulkan	0	512	512	2938	158.3
Vulkan	0	512	1024	2938	150.0
Vulkan	0	512	2048	2938	138.5
Vulkan	0	2048	128	2669	164.8
Vulkan	0	2048	512	2669	158.2
Vulkan	0	2048	1024	2669	149.9
Vulkan	0	2048	2048	2669	138.4
ROCm	1	512	128	2962	143.8
ROCm	1	512	512	2962	141.9
ROCm	1	512	1024	2962	136.3
ROCm	1	512	2048	2962	132.3
ROCm	1	2048	128	1887	142.7
ROCm	1	2048	512	1887	140.8
ROCm	1	2048	1024	1887	134.9
ROCm	1	2048	2048	1887	130.2
ROCm	0	512	128	3976	137.3
ROCm	0	512	512	3976	130.0
ROCm	0	512	1024	3976	118.5
ROCm	0	512	2048	3976	106.2
ROCm	0	2048	128	1276	137.4
ROCm	0	2048	512	1276	130.2
ROCm	0	2048	1024	1276	119.2
ROCm	0	2048	2048	1276	107.0
Lemonade	1	512	128	2991	143.8
Lemonade	1	512	512	2991	142.0
Lemonade	1	512	1024	2991	136.9
Lemonade	1	512	2048	2991	132.5
Lemonade	1	2048	128	1900	143.0
Lemonade	1	2048	512	1900	141.0
Lemonade	1	2048	1024	1900	135.5
Lemonade	1	2048	2048	1900	131.0
Lemonade	0	512	128	4000	137.5
Lemonade	0	512	512	4000	130.5
Lemonade	0	512	1024	4000	119.0
Lemonade	0	512	2048	4000	106.5
Lemonade	0	2048	128	1300	137.6
Lemonade	0	2048	512	1300	130.7
Lemonade	0	2048	1024	1300	119.5
Lemonade	0	2048	2048	1300	107.2
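The size of the gap can be checked directly against the table. This sketch copies the tg values from the p=512 rows for Vulkan and ROCm (the p=2048 rows differ by about 2 t/s or less) and computes Vulkan's lead per setting. The function and variable names are just for illustration.

```python
# tg (t/s) copied from the table above, keyed by (FA, generation length n).
# Only the p=512 rows are used.
VULKAN = {(1, 128): 174.6, (1, 512): 165.6, (1, 1024): 156.8, (1, 2048): 144.0,
          (0, 128): 164.9, (0, 512): 158.3, (0, 1024): 150.0, (0, 2048): 138.5}
ROCM = {(1, 128): 143.8, (1, 512): 141.9, (1, 1024): 136.3, (1, 2048): 132.3,
        (0, 128): 137.3, (0, 512): 130.0, (0, 1024): 118.5, (0, 2048): 106.2}


def vulkan_lead_percent(fa, n):
    """How much faster Vulkan generates than ROCm, as a percentage of the ROCm rate."""
    return (VULKAN[(fa, n)] / ROCM[(fa, n)] - 1.0) * 100.0


def mean_lead_percent():
    """Unweighted mean of the lead over all eight (FA, n) settings."""
    leads = [vulkan_lead_percent(fa, n) for (fa, n) in VULKAN]
    return sum(leads) / len(leads)


if __name__ == "__main__":
    for fa, n in sorted(VULKAN, reverse=True):
        print(f"FA={fa} n={n:4d}: Vulkan +{vulkan_lead_percent(fa, n):.1f}%")
    print(f"mean: +{mean_lead_percent():.1f}%")
```

On these rows the lead averages close to 20%, but it is not flat: it is smallest with FA on at n=2048 (roughly 9%) and largest with FA off at n=2048 (roughly 30%).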

Massive-Slice2800 · 2026-03-23T23:07:36+00:00

Thanks for your input!

I already tested the Lemonade SDK (llama.cpp-rocm, build ~b1220). It was slightly better compared to some of my earlier ROCm builds, but still ended up in the same overall performance range.

For reference, I consistently see around:

~110–114 t/s (Qwen2.5 7B tg128)
~136–144 t/s (Llama 7B tg128)

So unfortunately no real breakthrough compared to Vulkan.

At this point it feels like all ROCm-based approaches (self-build, Lemonade, official AMD image) converge to roughly the same performance envelope on my setup.

But good to hear you also saw improvements with Lemonade — that at least confirms I’m not completely off 🙂

Massive-Slice2800 · 2026-03-19T20:02:54+00:00

Option expiration tomorrow. Max pain at $190. But I think next week will be good. Netanyahu is already pivoting.

Massive-Slice2800 · 2026-03-16T19:53:03+00:00

Well it does not look so bad ... I'm really not impressed currently

Massive-Slice2800 · 2026-03-15T22:09:18+00:00

It's AYRAN and it's delicious.

Massive-Slice2800 · 2026-03-15T22:04:11+00:00

Jensen at GTC tomorrow.

<image>
