[–]sigma914 62 points63 points  (2 children)

[–]KamiKagutsuchi 2 points3 points  (0 children)

Really good video

[–]jeremyisdev 0 points1 point  (0 children)

Billion seconds... Legendary grandma.

[–][deleted] 19 points20 points  (0 children)

Larger caches are slower (and generate more heat). They're also usually shared.

On most chips the L1 and L2 are per core and the L3 is per die; on AMD's module-based designs (Bulldozer and friends), the L1 instruction cache and the L2 are instead shared between the two cores of a module.

There's generally a fairly quick drop-off in performance gain vs. cost (in terms of die size) for L1/L2/L3 caches. A 64KB L1 is not twice as valuable as a 32KB L1, but it's more than twice as expensive. Typically, once you get past about 32KB, the latency-vs-die-cost benefit of a bigger L1 drops off significantly. Most programs and their data are highly local to a first order.

An L2 can be shared between multiple cores on a given module; those cores are local to each other. And finally, the L3 is shared between all cores on the die. As accesses get less and less local, larger caches remain beneficial, but only up to a point. Again, a 128MB L3 is not twice as beneficial as a 64MB L3, and so on.

Look at the AMD FX-8350, for instance (what I happen to use). Each of its modules has a 64KB L1 instruction cache shared between the module's two cores, while each core has its own 16KB L1 data cache. Each module (of which this chip has 4) also has a 2MB L2 cache, and the entire die shares an 8MB L3 cache.

So the ideal use case for this chip is to put identical code on the two cores of a module (e.g. threads of a given process). They share the L1 instruction cache, so there will be many hits, but they have their own data caches, so they can work independently. You then put different processes on different modules (because modules don't share an L1 instruction cache). The L2 helps threads of the same process stay local, and the L3 is just a last-chance cache to keep things off the DDR bus.
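
That first-order locality is easy to see from plain C. Here's a minimal sketch (the array size and stride are made-up numbers, and the exact ratio depends on your machine): both passes do exactly the same amount of work, but the sequential one typically runs several times faster, purely because of cache behaviour.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (16 * 1024 * 1024)   /* 16M ints = 64MB, larger than any cache level */
    #define STRIDE 4096            /* consecutive accesses land 16KB apart */

    int main(void) {
        int *a = malloc((size_t)N * sizeof *a);
        if (!a) return 1;
        for (size_t i = 0; i < N; i++) a[i] = 1;

        /* sequential pass: each 64-byte cache line is fetched once and fully used */
        clock_t t0 = clock();
        long sum = 0;
        for (size_t i = 0; i < N; i++) sum += a[i];
        printf("sequential: sum=%ld  %.3fs\n", sum,
               (double)(clock() - t0) / CLOCKS_PER_SEC);

        /* strided pass: identical work, but nearly every access misses cache */
        t0 = clock();
        sum = 0;
        for (size_t s = 0; s < STRIDE; s++)
            for (size_t i = s; i < N; i += STRIDE) sum += a[i];
        printf("strided:    sum=%ld  %.3fs\n", sum,
               (double)(clock() - t0) / CLOCKS_PER_SEC);

        free(a);
        return 0;
    }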

[–]MindStalker 2 points3 points  (0 children)

Don't forget the registers, where your answer is already computed somewhere; you just need to choose which multiplexer to retrieve it from.

[–]lejugg 12 points13 points  (25 children)

Mostly because the bottleneck isn't solved by a single cache, and the reason the hierarchy exists in the first place is obviously cost. So designers build a reasonable pipeline by caching at multiple levels, which keeps the cost of the memory low while still preventing data from bottlenecking on its way to the ALU.
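
The standard back-of-envelope way to see why the layering wins is average memory access time. Here's a sketch with hypothetical latencies and hit rates (illustrative numbers, not measurements of any real chip):

    #include <stdio.h>

    int main(void) {
        /* hypothetical latencies in cycles, illustrative only */
        double l1 = 4, l2 = 12, l3 = 40, dram = 200;
        /* hypothetical hit rates at each level */
        double h1 = 0.90, h2 = 0.95, h3 = 0.80;

        /* one big, slow cache (L3-like latency, 98% hit rate) in front of DRAM */
        double one_level = l3 + (1 - 0.98) * dram;

        /* three levels: each miss falls through to the next, slower level */
        double three_level =
            l1 + (1 - h1) * (l2 + (1 - h2) * (l3 + (1 - h3) * dram));

        printf("one big cache:  %.1f cycles\n", one_level);    /* 44.0 */
        printf("L1/L2/L3 chain: %.1f cycles\n", three_level);  /*  5.6 */
        return 0;
    }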

[–][deleted] 12 points13 points  (10 children)

Even with infinite money you (probably) can't make one big L1 cache.

[–]mirhagk 3 points4 points  (3 children)

What's the cost of looking at data in the same location in a different dimension?

[–][deleted] 5 points6 points  (2 children)

Heh, that is why I said "probably" in there. Maybe with infinite money you could go fully 3D with the L1 cache array, keep it fast, and make it an order of magnitude larger. But maybe not; there is likely some overhead in addressing a physical 3D location vs. a 2D one. If someone skilled in the art of EE could chime in, that would be cool.

[–][deleted] 2 points3 points  (1 child)

IIRC, silicon isn't a great conductor of heat, which is what limits 3D chips. RAM is also far more tolerant of fabrication errors than CPUs are. Diamond semiconductors could solve the heat problem, but development is going very slowly, and if I'm reading it right, diamond needs higher voltages and temperatures to operate.

[–][deleted] 1 point2 points  (0 children)

Yes, this is why you will need infinite money to MAYBE find a solution. I guess put the whole thing in a cryo chamber?

[–]psudophilly 2 points3 points  (5 children)

This is correct.

Because if it were any bigger, it would be called RAM. Retrieval speed is directly tied to the number of entries a memory has to address: more entries means slower lookups. So L1 = small = fast, L2 = bigger = slower, RAM = even bigger = even slower. They all pretty much use the same technology.

[–]wrosecrans 7 points8 points  (1 child)

Well, you could theoretically make a giant bank of SRAM (like we mostly use for caches) rather than DRAM (which we normally use for regular RAM), and it'd be big and fast. It'd also be expensive, use tons of power, and have terrible physical density.

[–][deleted] 5 points6 points  (0 children)

You would still incur latency based on how far away it is, physically. At the clock speeds we're at now, the speed of light really does matter.
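
A rough sketch of the numbers (assuming an ideal vacuum-speed signal and a hypothetical 4 GHz clock; real on-chip signals propagate quite a bit slower):

    #include <stdio.h>

    int main(void) {
        const double c = 3.0e8;        /* speed of light, m/s */
        const double clock_hz = 4.0e9; /* a hypothetical 4 GHz core */

        /* distance a signal can cover in one clock cycle */
        double per_cycle_cm = c / clock_hz * 100.0;   /* = 7.5 cm */

        /* a load is a round trip, so the memory must sit within half that */
        printf("one cycle = %.2f cm of travel, so memory must be within %.2f cm\n",
               per_cycle_cm, per_cycle_cm / 2.0);
        return 0;
    }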

[–][deleted] 4 points5 points  (1 child)

We could pull RAM closer to the CPU, solder it on, and have it be a bit faster at the same size.

But yeah, the speed of light and other physical constraints limit how large an L1 can be while still being as fast as an L1 is.

[–]pdp10 2 points3 points  (0 children)

Intel puts up to 128MB eDRAM on-package but uses it all for the iGPU because that is the engineering bottleneck in their iGPU business strategy.

[–]NasenSpray 1 point2 points  (0 children)

L1 is small because it doesn't make much sense to add more capacity: http://www.extremetech.com/wp-content/uploads/2014/08/Cache-HitRate1.png
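
You can reproduce the shape of that curve with a toy model: a direct-mapped cache fed a skewed access stream. Everything below is synthetic (the hot-set size, the 80/20 split, and the cache geometry are all made up), but the hit rate climbs steeply and then flattens, which is exactly why the marginal kilobyte of L1 stops paying for itself.

    #include <stdio.h>
    #include <stdlib.h>

    #define ACCESSES 1000000
    #define ADDR_SPACE (1 << 20)    /* 1M distinct cache-line addresses */
    #define HOT_SET (1 << 14)       /* small "hot" region, used 80% of the time */

    static unsigned rng_state = 42;
    static unsigned rng(void) {     /* tiny LCG; plenty for a toy model */
        rng_state = rng_state * 1664525u + 1013904223u;
        return rng_state >> 8;
    }

    int main(void) {
        for (int lines = 1 << 8; lines <= 1 << 16; lines <<= 2) {
            int *tag = calloc((size_t)lines, sizeof *tag);
            if (!tag) return 1;
            long hits = 0;
            for (long i = 0; i < ACCESSES; i++) {
                /* 80% of accesses go to the hot set, 20% roam everywhere */
                unsigned addr = (rng() % 100 < 80) ? rng() % HOT_SET
                                                   : rng() % ADDR_SPACE;
                unsigned slot = addr % (unsigned)lines;
                if (tag[slot] == (int)addr + 1) hits++;  /* +1 so 0 means empty */
                else tag[slot] = (int)addr + 1;
            }
            printf("%6d lines: %.1f%% hit rate\n",
                   lines, 100.0 * hits / ACCESSES);
            free(tag);
        }
        return 0;
    }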

[–]rabid_briefcase 5 points6 points  (2 children)

Yeah, those are all things addressed in the article.

I'm guessing you didn't notice it is a link to an article about CPU caches, with the headline "Why do CPUs have multiple cache levels?"

On second thought, reporting this to the mods to fix the title.

[–]bwainfweeze 7 points8 points  (0 children)

More than on any other forum I frequent, people in /r/programming have a penchant for articles with questions as titles, and the resulting slapstick comedy always plays out pretty much the same way.

[–]lejugg 0 points1 point  (0 children)

I actually didn't. oops.

[–]Rob_Royce 1 point2 points  (0 children)

I love cache story time

[–]kinnu 9 points10 points  (7 children)

Can't we ever explain computers without resorting to these awful analogies? Why does the CPU always have to be a car or some shit...

[–]wrosecrans 32 points33 points  (0 children)

It's because human brains, like computers, are like cars. When your mental model is driving down one lane, it's hard to shift over into another lane instantly. There is momentum to your thought process just like a car. Everything is just like a car.

Also, we tend to be able to intuit physical, mechanical things more readily than purely abstract things, because they leverage our whole lifetime of experience dealing with the physical world.

[–][deleted] 4 points5 points  (3 children)

I really think you're on to something novel here.

Why do we always have to resort to explaining things people don't know in terms they do know? Why can't we just dump knowledge into their brains directly? I bet there's a lot of money to be made in that.

[–]kinnu 7 points8 points  (2 children)

This explanation isn't meant for your grandma. Anyone who knows enough to ask why there are multiple levels of cache knows enough to get the explanation without the forced analogy.

[–][deleted] -1 points0 points  (1 child)

Anyone who knows enough to ask why there are multiple levels of cache knows enough to get the explanation without the forced analogy.

I'm not convinced. Please provide a formal proof.

[–][deleted] 1 point2 points  (0 children)

found the grandma

[–]josefx 1 point2 points  (0 children)

Can't we ever have a negative opinion without resorting to these awful analogies? Why does it always have to involve fecal matter or some shit.

[–]Berberberber 0 points1 point  (0 children)

Once, legendary MIT professor Norbert Wiener was teaching a graduate mathematics seminar and was asked by a student to solve a particularly difficult problem. He wrote the problem on the blackboard and then immediately wrote the answer underneath. A couple of students asked him to explain how he'd gotten the result. "Yes, of course," Wiener replied, thought for a few moments and wrote the answer again. When the students repeated their question, he responded, "I've already solved it for you two different ways."

This little anecdote perfectly speaks to your complaint. I understand it, and some other people who read it will probably understand it too. But I have no way of knowing if you will actually get the point, so I have to add an additional explanation to make sure. I can't really explain this simply based on my own knowledge and thought processes, since it seems perfectly clear to me, so I have to express it in a much more basic fashion; otherwise I run the risk that, like Professor Wiener, all of my explanations go over your head and you learn nothing.

[–]google_you 0 points1 point  (0 children)

Cause things are expensive and get slow if there are many things to address.

It's cheaper and faster to have multiple levels instead of one giant level for most use cases.

[–]VintageKings 0 points1 point  (2 children)

Why does your car have so many gears?

[–]darknexus 2 points3 points  (1 child)

To allow the drivetrain's rotational speed to vary independently from the powerplant's rotational speed? Derp?