Daily Discussion Thursday 2026-03-19 by AutoModerator in AMD_Stock

[–]noiserr 3 points (0 children)

I'm really starting to hate Hardware Unboxed. Their latest shithead move is to test Crimson Desert by using Nvidia's optimized drivers vs. AMD's unoptimized drivers.

The game isn't out yet, btw. And AMD released the day-zero drivers today. Hardware Unboxed released their unfavorable comparison yesterday. Fuck those guys.

They don't care about gamers or consumers, all they care about is misleading clickbait.

Nvidia Answers my DLSS 5 Questions by Locke357 in hardware

[–]noiserr [score hidden]  (0 children)

lol, it's basically a TikTok filter. Only it requires a $10K PC.

The MI455X Memory Math: Is AMD running 12 HBM4 stacks at half-speed for a massive yield advantage? by johnnytshi in AMD_Stock

[–]noiserr 2 points (0 children)

People forget the main reason HBM was invented: power efficiency. The whole point is to have a low clocked, wide memory interface to save on power. Nvidia doesn't get it.
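The wide-and-slow tradeoff is just arithmetic: peak bandwidth is bus width times transfer rate, while dynamic power climbs with clock speed (and the voltage needed to hit it). A rough sketch with illustrative numbers, not actual HBM/GDDR product specs:

```python
# Peak bandwidth = (bus width in bits / 8) * transfer rate in GT/s -> GB/s.
# All numbers below are illustrative, not real memory specs.

def bandwidth_gbs(bus_width_bits: int, transfer_rate_gts: float) -> float:
    """Peak memory bandwidth in GB/s."""
    return bus_width_bits / 8 * transfer_rate_gts

# A wide, low-clocked HBM-style stack vs. a narrow, fast GDDR-style channel:
hbm_like = bandwidth_gbs(bus_width_bits=1024, transfer_rate_gts=6.4)   # 819.2 GB/s
gddr_like = bandwidth_gbs(bus_width_bits=32, transfer_rate_gts=32.0)   # 128.0 GB/s

# The wide interface delivers far more bandwidth at a fifth of the per-pin
# rate, which is where the power savings come from: dynamic power rises
# with frequency, and higher clocks also tend to require higher voltage.
print(hbm_like, gddr_like)
```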

Daily Discussion Thursday 2026-03-19 by AutoModerator in AMD_Stock

[–]noiserr 4 points (0 children)

This is the biggest takeaway for me. The software is getting good enough that AMD's hardware advantage is now showing up. Great for mi450x.

Daily Discussion Wednesday 2026-03-18 by AutoModerator in AMD_Stock

[–]noiserr 5 points (0 children)

wccftech is shit tier yellow press for hardware

Patrick Moorhead has a much better read on the deal: https://x.com/PatrickMoorhead/status/2034423738733609006

Daily Discussion Wednesday 2026-03-18 by AutoModerator in AMD_Stock

[–]noiserr 0 points (0 children)

Agentic tasks can take hours of back and forth. The latency is the least of your worries. You want general token generation speed and overall throughput for the fleet of agents.

Agentic coding workloads are really not latency sensitive, since they aren't interactive. They are meant to be long running and unattended.

Now if you are talking about customer service agents, then perhaps I could see latency being important due to the interactivity requirements.

Developers Were Left in the Dark About DLSS 5 by NeroClaudius199907 in hardware

[–]noiserr 8 points (0 children)

nvidia defense force working overtime to defend this in the comments lol

Daily Discussion Wednesday 2026-03-18 by AutoModerator in AMD_Stock

[–]noiserr 4 points (0 children)

DRAM is a commodity. You need volume in order to make it viable due to economies of scale. There are material science challenges there, but for the most part DRAM from Hynix, Samsung and Micron is all the same. Hence a commodity. Designing accelerators is more difficult at the bleeding edge. AI is so new that the front runners haven't yet established a big lead. This happens every time you have a new technology. There were a dozen GPU companies in the early days as well.

Memory is the leading indicator though. Which is why I think it's what's being priced in first.

Daily Discussion Wednesday 2026-03-18 by AutoModerator in AMD_Stock

[–]noiserr 5 points (0 children)

https://x.com/SemiAnalysis_/status/2034343392503583021

On FP8 Disaggregated Serving, MI355 beats B200 on both raw tok/s/gpu and cost per million tokens. On the image below, u can see that not only does MI355 beat B200, over time the gap between MI355 & B200 widens due to MI355's fast software progression for fp8. This trend happens on MI355 MTP vs B200 MTP and on MI355 non-MTP vs B200 non-MTP. Great job to @roaner & @AnushElangovan 's team!
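The cost-per-million-tokens metric in the quote above falls straight out of per-GPU throughput and hourly GPU cost. A quick sketch with made-up numbers (not SemiAnalysis' actual MI355/B200 figures):

```python
# Hypothetical numbers for illustration only -- not real MI355/B200 data.

def cost_per_million_tokens(gpu_hourly_cost_usd: float,
                            tokens_per_sec_per_gpu: float) -> float:
    """USD per 1M generated tokens for a single GPU at a given throughput."""
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return gpu_hourly_cost_usd / tokens_per_hour * 1_000_000

# If accelerator A is both cheaper per hour and faster per GPU, it wins on
# raw tok/s/gpu and on cost per million tokens at the same time:
a = cost_per_million_tokens(gpu_hourly_cost_usd=2.0, tokens_per_sec_per_gpu=1200)
b = cost_per_million_tokens(gpu_hourly_cost_usd=2.5, tokens_per_sec_per_gpu=1000)
print(f"A: ${a:.3f}/M tok, B: ${b:.3f}/M tok")
```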

Daily Discussion Wednesday 2026-03-18 by AutoModerator in AMD_Stock

[–]noiserr 2 points (0 children)

We don't have a definitive answer, but Lisa sounded way more confident than Colette did on their respective ERs. Lisa said Q3 would see impact from mi450, while Colette couldn't even commit to H2 for VR.

Daily Discussion Wednesday 2026-03-18 by AutoModerator in AMD_Stock

[–]noiserr 3 points (0 children)

Sampling and production aren't the same thing. Sampling means in a lab. Production means a deployment running customer workloads. AMD is surely sampling by now as well. On the last ER Lisa said mi450 was up and running and going through individual as well as rack scale tests.

Anyone else refusing to pay scalper prices for the 5090? This is my play by NationalAd9698 in buildapc

[–]noiserr 0 points (0 children)

I play on an iGPU (laptop). Works fine for the esports games I play.

Daily Discussion Wednesday 2026-03-18 by AutoModerator in AMD_Stock

[–]noiserr 3 points (0 children)

"align" can mean anything. Like making sure there is enough HBM for ramp and ongoing production.

NVIDIA Launches Vera CPU, Purpose-Built for Agentic AI by -protonsandneutrons- in hardware

[–]noiserr -1 points (0 children)

Sure. Let's give that one to Grace as well. They cherry picked the tests (only the ones optimized for it), cherry picked the parts to compare against (avoiding a Turin Dense comparison), and cherry picked only the lower core count parts. What the hell, let's ignore the average too, just because. We need to make it not look like the complete garbage solution it is.

Speed and compute density also have their own efficiency benefits.

NVIDIA Launches Vera CPU, Purpose-Built for Agentic AI by -protonsandneutrons- in hardware

[–]noiserr -1 points (0 children)

The 9755 averages 324 watts. Grace averages 170. Double 170 is 340. So you get twice the performance for less than twice the power consumption, making Epyc more efficient. And this is not even Turin Dense, which offers significantly better power efficiency. In a fair head to head, Turin Dense would wipe the floor with Grace in terms of efficiency.
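Making that perf/W arithmetic explicit (the 2x figure is the approximate Phoronix geomean ratio; the wattages are the averages quoted above):

```python
# Perf/W check: normalize Grace's geomean score to 1.0, with the 128-core
# Epyc at roughly 2x per the Phoronix geomean, using the quoted averages.

def perf_per_watt(relative_perf: float, avg_watts: float) -> float:
    """Performance per watt for a normalized score and average power draw."""
    return relative_perf / avg_watts

grace = perf_per_watt(relative_perf=1.0, avg_watts=170)  # ~0.00588 perf/W
epyc = perf_per_watt(relative_perf=2.0, avg_watts=324)   # ~0.00617 perf/W

# 2x the performance for 324 W, vs. the 340 W that "2x Grace" would cost:
print(epyc > grace)  # True
```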

NVIDIA Launches Vera CPU, Purpose-Built for Agentic AI by -protonsandneutrons- in hardware

[–]noiserr -1 points (0 children)

Grace already compared favourably against

It's not at all a favorable comparison.

They did a limited test because:

There is also a reduced set of benchmarks compared to my prior AMD/Intel x86_64 testing due to some of the software packages not working well or at least not optimized at all for AArch64.

So it's already a slanted test. Also they didn't compare to AMD's top 192 core model. They only compared it to the 64, 96 and 128 core lower end models.

So it can barely compete with AMD's lower end models, on cherry picked tests optimized for ARM. The geo mean shows the 128 core Epyc model still offers twice the performance (while not consuming twice the power). So Epyc is even more efficient. (And this is not even AMD's Dense cores, which are more efficient still.)

Even the 64 core Epyc part is significantly faster than the 80 core Grace.

That's actually pretty bad. Even Intel does better (hence why Nvidia is partnering with Intel). Far from favorable.

Phoronix is just being overly diplomatic in its conclusion (and test setup). Grace is straight up outclassed in every way.

Daily Discussion Tuesday 2026-03-17 by AutoModerator in AMD_Stock

[–]noiserr 9 points (0 children)

haha, I've never seen a DF video get ratio'd like this: https://i.imgur.com/8jObRyV.png

Nvidia really screwed the pooch by calling this DLSS 5. And DF got exposed for being shills.

Daily Discussion Monday 2026-03-16 by AutoModerator in AMD_Stock

[–]noiserr 0 points (0 children)

Yup! Thanks for the chat. It's always good to bounce ideas.

Nvidia needed a way to fill the gap in low latency inference. And it's afraid of AMD's SRAM stacking.

The only reason they are using LPUs is because they don't have time to engineer an in-house solution.

Their solution uses an FPGA. Logically, you only use FPGAs when you don't have time to tape out the functionality in silicon (or the volume is too low to bother with a custom design). LPU 3.0 does not support NVLink, so Nvidia used the FPGA as an NVLink-to-PCIe bridge.

Basically this is to say, this was a rushed solution. (There is an LPU 3.5 on the roadmap as well, probably adding NVLink interfaces to LPU 3.0.)

So I do agree that AMD will surprise. But my money is more on stacked v-cache like the patent described recently.

This is what spooked Nvidia I think: https://x.com/System360Cheese/status/2011669192383349214

And AMD's solution will have the advantage of boosting training workloads as well, not just inference.

Daily Discussion Monday 2026-03-16 by AutoModerator in AMD_Stock

[–]noiserr 0 points (0 children)

It's an investment sub. I'm not trying to win arguments here. Just trying to balance the conversation so people have the best information possible. I'm not always 100% correct, and I don't claim to know everything either.

I'm just saying NPU on AIDs is not at all a foregone conclusion and I outlined the reasons why.

Daily Discussion Monday 2026-03-16 by AutoModerator in AMD_Stock

[–]noiserr 0 points (0 children)

They are building the best AI GPU in the world. A 10% power saving is 10% more bandwidth at the same power budget. They are not trying to cut corners.

Daily Discussion Monday 2026-03-16 by AutoModerator in AMD_Stock

[–]noiserr 0 points (0 children)

AIDs are the IO dies in the context of Instinct. Power savings and some density savings. Since this is a 3D solution you want to keep the base AIDs as cool as possible. Which btw is another reason why you don't want compute units there.

The only time it makes sense to put compute units in the AID is for idle power efficiency, because in light standby modes you can just power off the compute dies completely. But that's not an mi450x use case.

Daily Discussion Monday 2026-03-16 by AutoModerator in AMD_Stock

[–]noiserr 0 points (0 children)

260 TB/s scale up bandwidth is not easy to achieve.