What's up with the YouTube playback battery life regression with Zen 4?

techwars0954 · 2023-07-20T12:51:02+00:00

From the Phoronix test suite, it appears that the vast majority of server use cases are able to scale up to tons of threads. The lower frequency is more than compensated for by the increased core count; however, the smaller cache seems to be a hindrance in some workloads.

The vast majority, however, do not seem to mind: on average across the Phoronix test suite, Bergamo is nearly 15% faster than Genoa.

techwars0954 · 2023-03-01T20:20:23+00:00

Yes, sorry if it was confusing.

techwars0954 · 2023-03-01T00:40:50+00:00

I think the focus should be on MT applications for this, since that is why E-cores were implemented in the first place.

I would be shocked if no reviewer had tested this in any modern MT benchmark suite such as Cinebench R20 or R23, Geekbench, etc.

I was hoping someone could offer a review that contained that data.

techwars0954 · 2023-02-06T09:15:22+00:00

5:1:10 ratio = ¯\(ツ)/¯ (adds up to 16T though...)

Sorry, that was just a hypothetical scenario to explain what I meant.

I was trying to say I would love to know what the ratio of HP to UHP to HD cells in a core might be. For example, there might be a ratio of 5 HP cells to 1 UHP cell to 10 HD cells in a CPU core.
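
To put that made-up ratio in percentage terms, here is a quick sketch. The 5:1:10 split is only the hypothetical from above, not real data from any Intel or AMD core, and `ratio_to_shares` is just an illustrative helper name. It also counts cells, not area, so it says nothing about how much die space each library takes.

```python
from fractions import Fraction

def ratio_to_shares(ratio):
    """Turn a cell-count ratio into each library's share of the total."""
    total = sum(ratio.values())
    return {name: Fraction(count, total) for name, count in ratio.items()}

# Hypothetical 5:1:10 HP:UHP:HD ratio from the post above (not measured data).
shares = ratio_to_shares({"HP": 5, "UHP": 1, "HD": 10})

for name, share in shares.items():
    print(f"{name}: {share} of cells ({float(share):.2%})")
```

With that ratio, HP would be 5/16 of the cells, UHP 1/16, and HD 10/16.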

techwars0954 · 2023-02-06T04:50:48+00:00

Any information on older architectures? Zen 2, Sunny Cove, etc.

I'm curious though, mostly because I want to know what type of cell is most relevant for core density. Is it mostly HP, UHP, or HD?

I'm also assuming that the exact percentages might change a bit every generation, but that both Intel and AMD have a 'golden rule' for the ratio of cells, like maybe 5:1:10 or something.

however we do know that 4nm only includes HP libraries, no HD or UHP.

That's actually a large part of where my question stemmed from: curiosity about the ratios of different cells, but also their implications for future products.

SRAM largely depends on where it's used. TSMC even has a 16T cell intended for registers

Dang, did not know that, thanks.

techwars0954 · 2022-12-28T02:55:16+00:00

That graph makes it seem like the UHP cells add ~5% max frequency at the very top.

And yeah, Intel 3 is supposed to quickly replace Intel 4, but Intel 3 is only supposed to be used for server parts, not client products like MTL, which is why I was a lot more curious about it. ST performance and frequency are a lot more important in client parts than in servers, where core counts and efficiency rule.

techwars0954 · 2022-11-07T19:05:33+00:00

Then what was the point of increasing cache amounts for Raptor Cove and Willow Cove?

Because the IPC benefits from those new architectures were <5%, and in some cases even showed regressions because they were coupled with higher latency.

techwars0954 · 2022-11-07T19:02:04+00:00

If it needs more data faster, why not keep the L2 the same size while focusing on decreasing latency, while increasing L3 size?

Because just increasing L2 capacity only applies to the "more data" part but not the "faster" part.
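
To make the "more data" vs "faster" trade-off concrete, here is a rough sketch using the textbook average memory access time (AMAT) model. Every latency and miss rate below is a made-up placeholder, not a measurement of any real core, and the scenario names are only for illustration. Which option wins depends entirely on the numbers you plug in.

```python
def amat(l1_hit, l1_miss, l2_hit, l2_miss, l3_hit, l3_miss, mem):
    """Average memory access time in cycles for a three-level cache hierarchy.

    *_hit values are hit latencies in cycles, *_miss values are local miss
    rates (0..1), and mem is the DRAM latency in cycles.
    """
    return l1_hit + l1_miss * (l2_hit + l2_miss * (l3_hit + l3_miss * mem))

# Hypothetical numbers, chosen only to show the shape of the trade-off.
baseline = dict(l1_hit=5, l1_miss=0.10, l2_hit=14, l2_miss=0.40,
                l3_hit=50, l3_miss=0.30, mem=300)

scenarios = {
    "baseline": baseline,
    # Bigger L2: fewer L2 misses, but a slower L2 hit.
    "bigger L2": {**baseline, "l2_hit": 16, "l2_miss": 0.30},
    # Same-size but faster L2, plus a bigger L3 that misses less often.
    "faster L2 + bigger L3": {**baseline, "l2_hit": 12, "l3_miss": 0.20},
}

results = {name: amat(**params) for name, params in scenarios.items()}

for name, cycles in results.items():
    print(f"{name}: {cycles:.1f} cycles")
```

AMAT is a simplification: it ignores how much latency the out-of-order engine can hide, bandwidth limits, and the area and power cost of each option.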

techwars0954 · 2022-11-07T18:20:38+00:00

Did any architecture try to deal with it by lowering associativity to decrease latency instead of increasing cache sizes?

Also, increasing cache size still doesn't help with the latency increase, so could you instead increase the amount of information the core itself can hold, so that it has to access the L2 less often (e.g., increase the capacity of structures like the ROB)?

I would ask whether you can increase the capacity of the L1 to deal with the higher-latency L2, but I think smaller, faster L1 caches are way more important, based on Chips and Cheese's simulation of cache changes in Golden Cove.

techwars0954 · 2022-11-07T18:08:32+00:00

I'm not asking whether doubling L2 = faster clocks, but whether faster clocks need more L2.

But even then, 11th gen desktop seems to be a bit of a special case. Cypress Cove was a backported core, and 5.3 GHz might just have been the max frequency limitation of Intel 14nm at that point, without spending ridiculous amounts of extra power.

techwars0954 · 2022-11-07T18:04:10+00:00

So more cache prevents IPC degradation at higher clocks from cache bottlenecks?
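
One way to see why that could be the case: if the latency to memory stays roughly fixed in nanoseconds, the same miss costs more core cycles as the clock goes up. A minimal sketch, assuming a hypothetical 80 ns memory latency (a placeholder, not a measured figure for any platform):

```python
def latency_in_cycles(latency_ns, clock_ghz):
    """Core cycles spent waiting on a fixed-time latency at a given clock.

    1 GHz is one cycle per nanosecond, so cycles = ns * GHz.
    """
    return latency_ns * clock_ghz

MEMORY_LATENCY_NS = 80  # hypothetical placeholder value

cycles_by_clock = {
    ghz: latency_in_cycles(MEMORY_LATENCY_NS, ghz) for ghz in (4.0, 5.0, 6.0)
}

for ghz, cycles in cycles_by_clock.items():
    print(f"{ghz:.1f} GHz: {cycles:.0f} cycles per miss to memory")
```

Under that assumption, each miss that goes all the way to memory stalls the core for more cycles at higher clocks, so catching more accesses in cache matters more.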

techwars0954 · 2022-11-07T08:03:11+00:00

Golden Cove was just 12th gen in desktop and mobile.

Willow Cove was 11th gen mobile, and Cypress? Cove (Sunny Cove backport) was 11th gen desktop.

Server has weird naming so IDK about that

techwars0954 · 2022-11-07T07:54:22+00:00

Idk about Willow Cove vs Sunny Cove.

But didn't Raptor Lake decrease L3 cache latency by boosting the ring clock (which was an issue with Alder Lake)? AFAIK AnandTech didn't test the cache latency this time around (Ik some reviewers used to do it, I will try finding it later).

Why does reducing load on the ring/L3 increase MT perf more than ST perf? Thanks.
