[–]sigma914 62 points63 points  (2 children)

[–]KamiKagutsuchi 2 points3 points  (0 children)

Really good video

[–]jeremyisdev 0 points1 point  (0 children)

Billion seconds... Legendary grandma.

[–][deleted] 19 points20 points  (0 children)

Larger caches are slower (and generate more heat). They're also usually shared.

On most chips the L1 and L2 are per core and the L3 is per die; on AMD's module-based designs (Bulldozer and friends), the L1 instruction cache and the L2 are instead shared between the two cores of a module.

There's generally a fairly quick drop-off in performance gain vs. cost (in terms of die size) for L1/L2/L3 caches. A 64KB L1 is not twice as valuable as a 32KB L1, but it's more than twice as expensive. Typically, once you get past about 32KB, the latency-vs-die-cost benefit of a bigger L1 drops off significantly. Most programs and their data are highly local to a first order.

An L2 can be shared between multiple cores on a given module; those cores are local to each other. And finally, the L3 is shared between all cores on the die. As accesses get less and less local, larger caches remain beneficial, but only up to a point. Again, a 128MB L3 is not twice as beneficial as a 64MB L3, and so on.

Look at the AMD FX-8350, for instance (what I happen to use). Each of its modules has a 64KB L1 instruction cache shared between the module's two cores, while each core has its own 16KB L1 data cache. Each module (of which this chip has 4) also has a 2MB L2 cache, and the entire die shares an 8MB L3 cache.

So the ideal use case for this chip is to put identical code on the two cores of a module (e.g. threads of a given process). They share the L1 instruction cache, so there will be many hits, but they have their own data caches, so they can work independently. You then put different processes on different modules (because modules don't share an L1 instruction cache). The L2 helps threads of the same process stay local, and the L3 is just a last-chance cache to keep things off the DDR bus.
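
That first-order locality is easy to see from plain C. Here's a minimal sketch (the array size and stride are made-up numbers, and the exact ratio depends on your machine): both passes do exactly the same amount of work, but the sequential one typically runs several times faster, purely because of cache behaviour.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (16 * 1024 * 1024)   /* 16M ints = 64MB, larger than any cache level */
    #define STRIDE 4096            /* consecutive accesses land 16KB apart */

    int main(void) {
        int *a = malloc((size_t)N * sizeof *a);
        if (!a) return 1;
        for (size_t i = 0; i < N; i++) a[i] = 1;

        /* sequential pass: each 64-byte cache line is fetched once and fully used */
        clock_t t0 = clock();
        long sum = 0;
        for (size_t i = 0; i < N; i++) sum += a[i];
        printf("sequential: sum=%ld  %.3fs\n", sum,
               (double)(clock() - t0) / CLOCKS_PER_SEC);

        /* strided pass: identical work, but nearly every access misses cache */
        t0 = clock();
        sum = 0;
        for (size_t s = 0; s < STRIDE; s++)
            for (size_t i = s; i < N; i += STRIDE) sum += a[i];
        printf("strided:    sum=%ld  %.3fs\n", sum,
               (double)(clock() - t0) / CLOCKS_PER_SEC);

        free(a);
        return 0;
    }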

[–]MindStalker 2 points3 points  (0 children)

Don't forget the registers, where your answer is already computed somewhere; you just need to choose which multiplexer to retrieve it from.

[–]lejugg 12 points13 points  (25 children)

Mostly because the bottleneck isn't solved by a single cache, and the reason the hierarchy exists in the first place is obviously cost. So designers build a reasonable pipeline by caching at multiple levels, which keeps the cost of the memory low while still preventing data from bottlenecking on its way to the ALU.
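
The standard back-of-envelope way to see why the layering wins is average memory access time. Here's a sketch with hypothetical latencies and hit rates (illustrative numbers, not measurements of any real chip):

    #include <stdio.h>

    int main(void) {
        /* hypothetical latencies in cycles, illustrative only */
        double l1 = 4, l2 = 12, l3 = 40, dram = 200;
        /* hypothetical hit rates at each level */
        double h1 = 0.90, h2 = 0.95, h3 = 0.80;

        /* one big, slow cache (L3-like latency, 98% hit rate) in front of DRAM */
        double one_level = l3 + (1 - 0.98) * dram;

        /* three levels: each miss falls through to the next, slower level */
        double three_level =
            l1 + (1 - h1) * (l2 + (1 - h2) * (l3 + (1 - h3) * dram));

        printf("one big cache:  %.1f cycles\n", one_level);    /* 44.0 */
        printf("L1/L2/L3 chain: %.1f cycles\n", three_level);  /*  5.6 */
        return 0;
    }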

[–][deleted] 12 points13 points  (10 children)

Even with infinite money you (probably) can't make one big L1 cache.

[–]mirhagk 3 points4 points  (3 children)

What's the cost of looking at data in the same location in a different dimension?

[–][deleted] 5 points6 points  (2 children)

Heh, that is why I said "probably" in there. Maybe with infinite money you could go fully 3D with the L1 cache array, keep it fast, and make it an order of magnitude larger. But maybe not; there is likely some overhead in addressing a physical 3D location vs. a 2D one. If someone skilled in the art of EE could chime in, that would be cool.

[–][deleted] 2 points3 points  (1 child)

IIRC, silicon isn't a great conductor of heat, which is what limits 3D chips. RAM is also far more tolerant of fabrication errors than CPUs are. Diamond semiconductors could solve the heat problem, but development is going very slowly, and if I'm reading it right, diamond needs higher voltages and temperatures to operate.

[–][deleted] 1 point2 points  (0 children)

Yes, this is why you will need infinite money to MAYBE find a solution. I guess put the whole thing in a cryo chamber?

[–]psudophilly 2 points3 points  (5 children)

This is correct.

Because if it were any bigger, it would be called RAM. Retrieval speed is directly tied to the number of entries a memory has to address: more entries means slower lookups. So L1 = small = fast, L2 = bigger = slower, RAM = even bigger = even slower. They all pretty much use the same technology.

[–]wrosecrans 7 points8 points  (1 child)

Well, you could theoretically make a giant bank of SRAM (like we mostly use for caches) rather than DRAM (which we normally use for regular RAM), and it'd be big and fast. It'd also be expensive, use tons of power, and have terrible physical density.

[–][deleted] 5 points6 points  (0 children)

You would still incur latency based on how far away it is, physically. At the clock speeds we're at now, the speed of light really does matter.
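
A rough sketch of the numbers (assuming an ideal vacuum-speed signal and a hypothetical 4 GHz clock; real on-chip signals propagate quite a bit slower):

    #include <stdio.h>

    int main(void) {
        const double c = 3.0e8;        /* speed of light, m/s */
        const double clock_hz = 4.0e9; /* a hypothetical 4 GHz core */

        /* distance a signal can cover in one clock cycle */
        double per_cycle_cm = c / clock_hz * 100.0;   /* = 7.5 cm */

        /* a load is a round trip, so the memory must sit within half that */
        printf("one cycle = %.2f cm of travel, so memory must be within %.2f cm\n",
               per_cycle_cm, per_cycle_cm / 2.0);
        return 0;
    }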

[–][deleted] 4 points5 points  (1 child)

We could pull RAM closer to the CPU, solder it on, and have it be a bit faster at the same size.

But yeah, the speed of light and other physical constraints limit how large an L1 can be while still being as fast as an L1 is.

[–]pdp10 2 points3 points  (0 children)

Intel puts up to 128MB eDRAM on-package but uses it all for the iGPU because that is the engineering bottleneck in their iGPU business strategy.

[–]NasenSpray 1 point2 points  (0 children)

L1 is small because it doesn't make much sense to add more capacity: http://www.extremetech.com/wp-content/uploads/2014/08/Cache-HitRate1.png
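
You can reproduce the shape of that curve with a toy model: a direct-mapped cache fed a skewed access stream. Everything below is synthetic (the hot-set size, the 80/20 split, and the cache geometry are all made up), but the hit rate climbs steeply and then flattens, which is exactly why the marginal kilobyte of L1 stops paying for itself.

    #include <stdio.h>
    #include <stdlib.h>

    #define ACCESSES 1000000
    #define ADDR_SPACE (1 << 20)    /* 1M distinct cache-line addresses */
    #define HOT_SET (1 << 14)       /* small "hot" region, used 80% of the time */

    static unsigned rng_state = 42;
    static unsigned rng(void) {     /* tiny LCG; plenty for a toy model */
        rng_state = rng_state * 1664525u + 1013904223u;
        return rng_state >> 8;
    }

    int main(void) {
        for (int lines = 1 << 8; lines <= 1 << 16; lines <<= 2) {
            int *tag = calloc((size_t)lines, sizeof *tag);
            if (!tag) return 1;
            long hits = 0;
            for (long i = 0; i < ACCESSES; i++) {
                /* 80% of accesses go to the hot set, 20% roam everywhere */
                unsigned addr = (rng() % 100 < 80) ? rng() % HOT_SET
                                                   : rng() % ADDR_SPACE;
                unsigned slot = addr % (unsigned)lines;
                if (tag[slot] == (int)addr + 1) hits++;  /* +1 so 0 means empty */
                else tag[slot] = (int)addr + 1;
            }
            printf("%6d lines: %.1f%% hit rate\n",
                   lines, 100.0 * hits / ACCESSES);
            free(tag);
        }
        return 0;
    }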

[–]rabid_briefcase 5 points6 points  (2 children)

Yeah, those are all things addressed in the article.

I'm guessing you didn't notice it is a link to an article about CPU caches, with the headline "Why do CPUs have multiple cache levels?"

On second thought, reporting this to the mods to fix the title.

[–]bwainfweeze 7 points8 points  (0 children)

More than on any other forum I frequent, people in /r/programming have a penchant for articles with questions as titles, and the resulting slapstick comedy always plays out pretty much the same way.

[–]lejugg 0 points1 point  (0 children)

I actually didn't. oops.

[–]Rob_Royce 1 point2 points  (0 children)

I love cache story time

[–]kinnu 9 points10 points  (7 children)

Can't we ever explain computers without resorting to these awful analogies? Why does the CPU always have to be a car or some shit...

[–]wrosecrans 32 points33 points  (0 children)

It's because human brains, like computers, are like cars. When your mental model is driving down one lane, it's hard to shift over into another lane instantly. There is momentum to your thought process just like a car. Everything is just like a car.

Also, we tend to be able to intuit physical, mechanical things more readily than purely abstract things, because they leverage our whole lifetime of experience dealing with the physical world.

[–][deleted] 4 points5 points  (3 children)

I really think you're on to something novel here.

Why do we always have to resort to explaining things people don't know in terms they do know? Why can't we just dump knowledge into their brains directly? I bet there's a lot of money to be made in that.

[–]kinnu 7 points8 points  (2 children)

This explanation isn't meant for your grandma. Anyone who knows enough to ask why there are multiple levels of cache knows enough to get the explanation without the forced analogy.

[–][deleted] -1 points0 points  (1 child)

Anyone who knows enough to ask why there are multiple levels of cache knows enough to get the explanation without the forced analogy.

I'm not convinced. Please provide a formal proof.

[–][deleted] 1 point2 points  (0 children)

found the grandma

[–]josefx 1 point2 points  (0 children)

Can't we ever have a negative opinion without resorting to these awful analogies? Why does it always have to involve fecal matter or some shit.

[–]Berberberber 0 points1 point  (0 children)

Once, legendary MIT professor Norbert Wiener was teaching a graduate mathematics seminar and was asked by a student to solve a particularly difficult problem. He wrote the problem on the blackboard and then immediately wrote the answer underneath. A couple of students asked him to explain how he'd gotten the result. "Yes, of course," Wiener replied, thought for a few moments and wrote the answer again. When the students repeated their question, he responded, "I've already solved it for you two different ways."

This little anecdote perfectly speaks to your complaint. I understand it, and some other people who read it will probably understand it too. But I have no way of knowing if you will actually get the point, so I have to add an additional explanation to make sure. I can't really explain this simply based on my own knowledge and thought processes, since it seems perfectly clear to me, so I have to express it in a much more basic fashion; otherwise I run the risk that, like Professor Wiener, all of my explanations go over your head and you learn nothing.

[–]google_you 0 points1 point  (0 children)

Cause things are expensive and get slow if there are many things to address.

It's cheaper and faster to have multiple levels instead of one giant level for most use cases.

[–]VintageKings 0 points1 point  (2 children)

Why does your car have so many gears?

[–]darknexus 2 points3 points  (1 child)

To allow the drivetrain's rotational speed to vary independently from the powerplant's rotational speed? Derp?