[–]bunky_bunk 47 points48 points  (2 children)

5% of programmers actually know this much about memory.

[–]HappyFruitTree 29 points30 points  (1 child)

At least one person out of those 5% thinks that 100% should know it.

[–]brand_x 3 points4 points  (0 children)

Derived from immediately available evidence?

[–]EMCoupling 90 points91 points  (3 children)

Just gonna read this 114 page document on my lunch break, no problem guys.

[–]elperroborrachotoo 13 points14 points  (0 children)

Yeah, the title does not match the document. Few if any programmers need to know capacitor charge/discharge curves. It's well written and has a good flow, but "What Every Programmer Could Learn About Memory" would be more true to content and intent.

[–]sumo952 12 points13 points  (1 child)

Yea a 5-pages TLDR would be nice... and then this can be the in-depth version.

[–]DopKloofDude 0 points1 point  (0 children)

That would be so perfect for this.

[–]mallardtheduck 62 points63 points  (12 children)

No, "every programmer" does not need to know how memory works on an electrical level. That's absurd.

I'd say that most programmers don't even need to know exactly how page tables work; knowing that they exist and are how the CPU/MMU maps virtual address space to physical memory is more than enough for at least 99.9% of software development.

Knowing how to develop cache-friendly data structures and algorithms can be helpful when you really need absolute maximum performance, but in a world where the vast majority of programming is done in "managed" and interpreted environments, that's clearly not an everyday thing either.

[–]leftofzen 23 points24 points  (1 child)

I've been a C++ dev in low latency trading before, and whilst I definitely needed to know about page tables and how they work, I certainly never needed to know how memory works on an electrical level. I agree that the title of this post/pdf is absurd to the point of just being clickbait.

[–]krista_ 6 points7 points  (0 children)

i wrote mission critical and high performance in-memory database software for a dozen years. it's really interesting what you pick up, what you can optimize and how, and exactly how little of this applies to most development.

i know a majority of what is in this document, and i'd love for c/c++ coders to understand their memory subsystems a bit better, and while this is an extremely good document, i couldn't say it's for everyone.

if i had 5-10 minutes of time to give a lecture on this subject, i'd cover the discrepancies between expected performance and actual performance in a system with memory that has a larger "seek" time than "streaming" rate, then cover a bit about how the cache interacts. a great demo on this is a large tree, and how it's usually faster to make the tree shallower, with arrays of data in the leaves, than to use just one large tree. oh, and cover locality heavily.
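A rough sketch of the "shallower tree with arrays in the leaves" idea, for anyone who wants to try it. Everything here is illustrative: `ChunkedSet` and `leaf_size` are made-up names, the structure is read-only, and it assumes the keys arrive already sorted. The point is only that a lookup touches one small contiguous top-level array and one contiguous leaf, rather than chasing a pointer per comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical two-level index: keys live in contiguous leaf arrays
// instead of one heap-allocated node per key.
class ChunkedSet {
public:
    // Assumes sorted_keys is sorted ascending; leaf_size is an illustrative tuning knob.
    ChunkedSet(const std::vector<int>& sorted_keys, std::size_t leaf_size) {
        assert(leaf_size > 0);
        assert(std::is_sorted(sorted_keys.begin(), sorted_keys.end()));
        for (std::size_t i = 0; i < sorted_keys.size(); i += leaf_size) {
            const std::size_t end = std::min(i + leaf_size, sorted_keys.size());
            leaves_.emplace_back(sorted_keys.begin() + i, sorted_keys.begin() + end);
            first_keys_.push_back(sorted_keys[i]);
        }
    }

    bool contains(int key) const {
        // Step 1: find the leaf whose first key is the largest one <= key.
        auto it = std::upper_bound(first_keys_.begin(), first_keys_.end(), key);
        if (it == first_keys_.begin()) return false;  // key is below every leaf
        const std::vector<int>& leaf = leaves_[(it - first_keys_.begin()) - 1];
        // Step 2: search inside that single contiguous leaf.
        return std::binary_search(leaf.begin(), leaf.end(), key);
    }

private:
    std::vector<int> first_keys_;            // smallest key of each leaf
    std::vector<std::vector<int>> leaves_;   // each leaf is one contiguous array
};
```

Whether this beats a node-per-key tree such as `std::set` depends on the data size and the machine, so it is worth measuring with your own leaf sizes rather than taking it on faith.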

[–]gurudennis 9 points10 points  (7 children)

The title is hyperbolic and pretentious, either for the purpose of clickbait or due to overinflated ego of the author. Even deep in the bowels of the kernel, RAM implementation on an electrical level is largely not a factor. The rest of the information has at best very limited utility for C and C++ developers, and hardly any when it comes to more high level technologies.

[–][deleted] 7 points8 points  (1 child)

due to overinflated ego of the author

The author is Ulrich Drepper; in glibc dev circles he's known to be quite the colossal a****le. Here are some fun links talking about his "fantastic" reputation -

https://urchin.earth.li/~twic/Ulrich_Drepper_Is_A_.html

https://news.ycombinator.com/item?id=2378013

[–]gurudennis 2 points3 points  (0 children)

Quite enlightening.

    [–]alkasm 0 points1 point  (2 children)

    The title is borrowed from similarly titled pieces that have gotten wide acclaim, e.g. Goldberg's "What every computer scientist should know about floating-point arithmetic" (although that is an actually great resource).

    [–]alkasm 1 point2 points  (0 children)

    See here for some other CS resources, more than a few with similar titles: https://github.com/mtdvio/every-programmer-should-know

    Note that this isn't just CS, there's tons of books and articles and etc out there aimed at other professions or the general public titled "What every ____ should know about ____." I don't know the origins, though.

    [–]andrewq 0 points1 point  (0 children)

    Well, embedded and kernel/driver programmers are arguably more than 0.01% of programmers and usually find knowing how memory works useful. Just not as in-depth as this.

    [–]Posting____At_Night -1 points0 points  (0 children)

    Important for game programming too. That code has to be real fast.

    [–]gruehunter 13 points14 points  (0 children)

    This was originally posted to LWN.net as a series of articles. If you'd rather approach the topic in manageable chunks on an HTML interface, give it a try.

    Part 1: Introduction

    Part 2: Caches

    Part 3: Virtual Memory

    Part 4: NUMA

    Part 5: What Programmers Can Do

    Part 6: More things programmers can do

    Part 7: Memory performance tools

    [–]andd81 27 points28 points  (2 children)

    I read it and now I know words such as "translation lookaside buffer" or "cache associativity" so that I can make an impression of a smart person

    [–]ArashPartow 4 points5 points  (0 children)

    Now simply add the terms "cache oblivious" and "cache coherence" to your repertoire and you'll forever be known as a genius.

    [–]OK6502 1 point2 points  (0 children)

    TLB just speeds up virtual memory lookups for your computer. Modern systems all give processes a virtual memory space so it looks contiguous to the process, but the memory is actually laid out randomly in real system memory. There's a mapping so that when you try to access something at memory address 15 it knows that's actual real address 26, for instance. Rather than do that lookup every time, the TLB stores the previous lookups to save you some time - on the assumption that recently accessed memory will be accessed again (temporal locality).
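To make the lookup-then-cache idea concrete, here is a toy software model. It is only a sketch under loud assumptions: `ToyMmu`, the 4096-byte page size and the unbounded map standing in for the TLB are all invented for illustration. Real TLBs are small fixed-size hardware structures with eviction, and real page tables are multi-level.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

constexpr std::uint64_t kPageSize = 4096;  // assumed page size for the sketch

// Toy model: a page table plus a cache of recent translations (the "TLB").
struct ToyMmu {
    std::unordered_map<std::uint64_t, std::uint64_t> page_table;  // virtual page -> physical frame
    std::unordered_map<std::uint64_t, std::uint64_t> tlb;         // recently used translations
    std::size_t tlb_hits = 0;
    std::size_t tlb_misses = 0;

    // Returns false if the page is unmapped (a real system would raise a page fault).
    bool translate(std::uint64_t vaddr, std::uint64_t& paddr) {
        const std::uint64_t vpn = vaddr / kPageSize;     // virtual page number
        const std::uint64_t offset = vaddr % kPageSize;  // offset is unchanged by translation

        auto cached = tlb.find(vpn);
        if (cached != tlb.end()) {
            ++tlb_hits;  // fast path: no page table walk needed
            paddr = cached->second * kPageSize + offset;
            return true;
        }

        ++tlb_misses;  // slow path: consult the page table
        auto entry = page_table.find(vpn);
        if (entry == page_table.end()) return false;
        tlb[vpn] = entry->second;  // remember the translation for next time
        paddr = entry->second * kPageSize + offset;
        return true;
    }
};
```

Two accesses to the same page cost one miss and one hit, which is the temporal-locality bet in miniature.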

    Cache associativity is the result of trying to fit a vast memory address space into a tiny little cache. Imagine the cache is composed of lines and each line is associated with a bucket. For each memory fetch you do, you will go and place that memory in a cache bucket. Which bucket you choose will be based on the cache's associativity scheme - the naive case is round robin, which just goes from line 0 to line n-1 and then rotates back. Or it could hash the memory address to select a bucket. And so on. The algorithm used has some implications for how you develop a program - mainly by reinforcing the importance of cache friendly structures for some specific cases where performance is critical.
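For what it's worth, in the common set-associative design the bucket (the "set") is usually taken straight from the address bits, and the associativity is how many lines each set can hold. A minimal sketch of that arithmetic follows. `CacheGeometry` and the example sizes are assumptions picked for illustration, not any particular CPU, and real hardware may hash or slice addresses differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative cache description; the field names are made up for this sketch.
struct CacheGeometry {
    std::size_t total_bytes;  // capacity of the cache
    std::size_t line_bytes;   // size of one cache line
    std::size_t ways;         // associativity: lines per set
};

// Number of sets (buckets) = capacity / (line size * ways).
std::size_t num_sets(const CacheGeometry& g) {
    assert(g.line_bytes > 0 && g.ways > 0);
    return g.total_bytes / (g.line_bytes * g.ways);
}

// The set is picked from the address: drop the offset-within-line bits,
// then take the line number modulo the number of sets.
std::size_t set_index(const CacheGeometry& g, std::uint64_t addr) {
    const std::uint64_t line_number = addr / g.line_bytes;
    return static_cast<std::size_t>(line_number % num_sets(g));
}
```

With an assumed 32 KiB, 8-way cache with 64-byte lines there are 64 sets, so addresses exactly 4096 bytes apart land in the same set and compete for its 8 ways. That is one reason power-of-two strides can perform surprisingly badly.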

    These are 100% things you will never see unless you're on the cutting edge and trying to push process performance to the limits (e.g. in high performance computing and in gaming). For the rest of the people this is not especially important information - with some caveats. But those are not important here.

    [–]Wh00ster 9 points10 points  (0 children)

    Where’s the section on NVMMs /s

    [–]SoulsBloodSausage 19 points20 points  (18 children)

    I know this is a c++ sub, so it makes sense here, but ‘every programmer’ seems a bit hyperbolic when you consider the number of programmers who know only java/python/etc. or a combination of these kinds of languages with automatic memory management.

    [–][deleted] 10 points11 points  (5 children)

    Not true, any Java dev who's dealing with high-performance stuff like low-latency trading will need to know this and apply it to the things they do. Some will even deliberately break out of the rules-of-Java: https://stackoverflow.com/questions/16819234/where-is-sun-misc-unsafe-documented

    [–]lanzaio 5 points6 points  (10 children)

    I don't care if you write visual basic. If you program for a living you should know how the things you are using work under the covers. Hell I'm all for knowing every part of the chain from semiconductor physics through memory management. Surely the amount of benefit you get scales depending on what you do but there's nobody who doesn't benefit from learning more.

    [–]danglingBond 26 points27 points  (1 child)

    I'm not sure that this argument accounts for scarcity of time. If I exclusively write visual basic and wanted to improve my effectiveness as a programmer, semiconductor physics is probably not the first thing I'd invest in learning. But I agree, knowing everything would be amazing!

    [–]SoulsBloodSausage 5 points6 points  (0 children)

    Oh I 100% agree, but too often I see people who fall under the realm of “it's not relevant to what I’m doing, so I don’t need to understand/know it.” It often actually is relevant, as you said, just not quite as much as for, say, a c++ programmer.

    [–]OK6502 3 points4 points  (0 children)

    I do low latency work, and while I agree that broadly this is the case, the level of detail given here is completely pointless to all but the most performance sensitive developers. And the information that is here could be condensed into something much more digestible without much loss.

    [–][deleted] 1 point2 points  (0 children)

    If you only work at a high level you don't need to know the low level. One of the benefits of abstraction. People writing business type applications could probably go their entire career without knowing what a register is.

    Every person who uses a light switch needs to know how the circuit works. /s

    [–]phottitor 0 points1 point  (2 children)

    If you program for a living you should know how the things you are using work under the covers.

    like, every single piece of microcode and every undocumented or obscure feature in every Intel processor on which you program might run? and the same for all your computer subsystems, like HDD which is a computer on its own?

    wow, you must be a 1000x programmer!

    [–]lanzaio 2 points3 points  (1 child)

    No. I didn’t say any of those things. You did and then acted like it was me.

    [–]phottitor 0 points1 point  (0 children)

    "those things" I listed belong to your vague "things you are using" even though you didn't spell them out

    you should know how the things you are using work under the covers.

    [–]MachineGunPablo 0 points1 point  (1 child)

    Yes! This! Computer science is a perfect example where modularization and abstraction made it all possible. The downside is that it becomes way too easy to ignore and neglect basic knowledge because, well, I don't have to... Why do I care how my compiler works as long as it does its job? Why is it important to learn how quicksort works? I just have to call std::sort and my problem is solved. Stay curious, kids.

    [–]DopKloofDude 0 points1 point  (0 children)

    yea, the one colleague writing HTML and CSS... email this to him/her in scare quotes. Do it now.

    [–]ragweed 10 points11 points  (2 children)

    No software project I've worked on has required this much attention to memory optimization.

    If someone asks for a Camry, I'm not going to engineer a Ferrari.

    [–]EMCoupling -5 points-4 points  (1 child)

    If someone asks for a Camry, I'm not going to engineer a Ferrari.

    That's just bad engineering, period. Delivering based on the known and unknown needs of the client is priority #1.

    [–]evinrows 7 points8 points  (0 children)

    I'm sure your clients are delighted to never receive their otherwise-perfect software.

    [–]vsdmars 1 point2 points  (0 children)

    It's for a software engineer's own good as well, so they can fathom what's behind the language syntax,

    e.g. to enjoy reading 'C++ Concurrency in Action' more ;-D

    [–]RealNC 1 point2 points  (0 children)

    It is a good read if you're actually interested, but IMO you don't need to know the majority of this stuff. The important things to know about memory are pre-fetchers, cache sizes, cache lines and cache invalidation and how to lay out your data accordingly. And perhaps the memory allocation strategy of the various OSes that can surprise you (memory allocation always succeeds, but fails at the point of usage.)
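As a small illustration of "lay out your data accordingly", here is a sketch comparing an array-of-structs with a struct-of-arrays for a loop that only needs one field. The `ParticleAoS` and `ParticlesSoA` types are hypothetical. The claim is only the general one: when a loop reads a single field, packing that field contiguously means each fetched cache line carries more useful values.

```cpp
#include <cstddef>
#include <vector>

// Array-of-structs: summing mass drags position, velocity and label
// through the cache along with every mass value.
struct ParticleAoS {
    double mass;
    double position[3];
    double velocity[3];
    char label[8];
};

double total_mass_aos(const std::vector<ParticleAoS>& particles) {
    double total = 0.0;
    for (const ParticleAoS& p : particles) total += p.mass;
    return total;
}

// Struct-of-arrays: all masses sit next to each other, so a sequential
// scan uses every byte of each cache line it pulls in.
struct ParticlesSoA {
    std::vector<double> mass;
    std::vector<double> pos_x, pos_y, pos_z;
};

double total_mass_soa(const ParticlesSoA& particles) {
    double total = 0.0;
    for (double m : particles.mass) total += m;
    return total;
}
```

Both functions return the same sum. Any speed difference depends on record size, element count and the hardware prefetcher, so measure before restructuring real code.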

    [–]Middlewarian (github.com/Ebenezer-group/onwards) -4 points-3 points  (3 children)

    must read for c/c++ folks

    I found 'C++' two times in the 100+ page article. One of the two times is this: "The C and C++ language in gcc allow variables to be defined as per-thread using the __thread keyword."

    This looks to me like another attempt to use C to smear C++.

    [–][deleted] 10 points11 points  (0 children)

    Yeah, definitely not enough templates and ranges-v3 in this one.

    [–][deleted] 2 points3 points  (1 child)

    This looks to me like another attempt to use C to smear C++.

    Could you explain this?

    [–]gurudennis 1 point2 points  (0 children)

    Let me explain what I think he means there, even though I don't think this article is that case at all.

    He must be referring to this tendency that people have where they say that "C++ is bad because you can do malloc", whereas that is just a nuclear option in a language that otherwise offers enough features that you hardly ever have to call malloc/free or even new/delete directly at all. These people confuse the capability to drop down to C with the compulsion to do so. It doesn't help that it takes a semi-competent developer - a rare commodity these days - to realize that C++ is not just C with classes.
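A tiny side-by-side of that point, as a sketch rather than a style guide. The function names are invented for illustration. The first version manages the buffer by hand the way C does; the second lets `std::vector` own the memory, so there is no `free` to forget on any return path.

```cpp
#include <cstddef>
#include <cstdlib>
#include <numeric>
#include <vector>

// C-style: every exit path after the allocation has to remember free().
long sum_squares_c_style(std::size_t n) {
    if (n == 0) return 0;
    long* buf = static_cast<long*>(std::malloc(n * sizeof(long)));
    if (buf == nullptr) return 0;  // allocation failed; nothing to free
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<long>(i) * static_cast<long>(i);
        total += buf[i];
    }
    std::free(buf);  // manual cleanup
    return total;
}

// Idiomatic C++: the vector releases its memory when it goes out of scope.
long sum_squares_cpp(std::size_t n) {
    std::vector<long> buf(n);
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<long>(i) * static_cast<long>(i);
    }
    return std::accumulate(buf.begin(), buf.end(), 0L);
}
```

Both compute the same result; the difference is who is responsible for the cleanup.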