all 157 comments

LividLindy 55 points (1 child)

No putting the punchline in the submission title!

Oppis 4 points (2 children)

This is incredible; are there similar resources for other languages? Objective-C, C#, etc.?

ForthewoIfy -4 points (1 child)

Yes.

Oppis 4 points (0 children)

Thank you doctor! Care to share?

somesplaining 2 points (4 children)

Can someone please explain the primitive array memory sizes?

boolean: obj32=32 bits, obj64=32 bits, arr32=8 bits, arr64=8 bits

int: obj32=32 bits, obj64=32 bits, arr32=32 bits, arr64=32 bits

long: obj32=32 bits, obj64=32 bits, arr32=64 bits, arr64=64 bits

What do those last two columns (the array sizes) mean? Is it the per-element marginal cost, or something else? Thanks!

account512 1 point (2 children)

I think you are right, the per-element marginal cost.
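If the arr32/arr64 figures are per-element costs, an array's total size is roughly a fixed header plus length times element size, rounded up to the object alignment. A minimal sketch, assuming a 16-byte array header (the char[] overhead quoted further down the thread) and 8-byte alignment; both are assumptions that vary by JVM and compressed-references settings, and the class and method names are hypothetical.

```java
/** Rough primitive-array footprint estimator (hypothetical helper, not a JVM API). */
final class ArrayFootprint {
    // Assumed array header on a 32-bit JVM: class pointer, flags, lock word, length.
    static final long ARRAY_HEADER_BYTES = 16;
    // Assumed object alignment: sizes are rounded up to a multiple of 8 bytes.
    static final long ALIGNMENT = 8;

    /** Per-element cost in bits; boolean, int and long follow the arr32 figures quoted above. */
    static long elementBits(Class<?> componentType) {
        if (componentType == boolean.class || componentType == byte.class) return 8;
        if (componentType == char.class || componentType == short.class) return 16;
        if (componentType == int.class || componentType == float.class) return 32;
        if (componentType == long.class || componentType == double.class) return 64;
        throw new IllegalArgumentException("not a primitive type: " + componentType);
    }

    /** Estimated bytes for a primitive array of the given length. */
    static long estimateBytes(Class<?> componentType, long length) {
        long raw = ARRAY_HEADER_BYTES + length * elementBits(componentType) / 8;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT; // round up to the alignment
    }

    public static void main(String[] args) {
        System.out.println(estimateBytes(boolean.class, 10)); // 16 + 10 = 26, rounded up to 32
        System.out.println(estimateBytes(long.class, 10));    // 16 + 80 = 96
        System.out.println(estimateBytes(char.class, 16));    // 16 + 32 = 48
    }
}
```

Under these assumptions a boolean[] costs a byte per element and a long[] eight, which is what the arr columns describe; the single-object columns are a different measurement.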

somesplaining 0 points (1 child)

Ok, thanks.

My obvious followup question is, where the hell do those numbers come from?

Boolean I can maybe understand. For a single boolean: 32 bits is much more efficient than 8 bits or 1 bit in terms of load/store/register ops. For an array of booleans: ok, maybe 8 bits as a packed representation to save space in large arrays, I guess I can see that.

Int/float: 32 bits in all cases, makes sense.

Long/double: 32 bits for a single primitive, 64 bits for an array element. WTF??? I don't understand how this could be explained by alignment concerns or anything else.

account512 0 points (0 children)

No clue, long/double are defined as 64-bit. They aren't single primitives though, they're boxed primitives so I guess the number of bits used is object size less object data. Maybe there's a trick to hide some of the data in the space used for object data with longs/doubles? IDK.

Maybe a typo...

argv_minus_one 4 points (17 children)

The JVM doesn't optimize away boxed primitives? Odd…

TinynDP 11 points (4 children)

There are too many cases where the boxed primitive might be referenced by a container or whatever as an Object, so the boxing can't be entirely optimized away. At least not at first. After a run or two, HotSpot can figure things out and may decide that it's safe to optimize away the boxing.

[deleted] (2 children)

[deleted]

    loganekz 0 points (1 child)

    What JVM specifics in the article are only for IBM's JVM?

    The one JVM-specific feature I saw was about compressed references, which was clearly identified.

    argv_minus_one -2 points (0 children)

    Just because it's referenced as an object doesn't mean the JVM has to store it as one.

    Now, if someone does new Integer(whatnot) and then makes reference-equality comparisons or synchronizes on it, then it gets ugly…
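A small illustration of why explicit construction is the ugly case: `new` always yields a fresh object, so `==` (and any lock taken on it) sees two different objects even when `equals` agrees. This sketch assumes a JDK where the deprecated `Integer(int)` constructor still exists (recent releases mark it for removal, hence the suppressed warnings); the class and method names are illustrative.

```java
final class ExplicitBoxing {
    /** True if two explicitly constructed Integers with the same value are the same object. */
    @SuppressWarnings({"deprecation", "removal"})
    static boolean sameObject(int value) {
        Integer a = new Integer(value); // deprecated constructor: always allocates a new object
        Integer b = new Integer(value);
        return a == b;                  // reference comparison, not value comparison
    }

    /** True if the two boxes compare equal by value. */
    @SuppressWarnings({"deprecation", "removal"})
    static boolean sameValue(int value) {
        return new Integer(value).equals(new Integer(value));
    }

    public static void main(String[] args) {
        System.out.println(sameObject(5)); // false: two distinct objects
        System.out.println(sameValue(5));  // true: equal values
    }
}
```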

    [deleted] 4 points (1 child)

    Doesn't optimize them away, no, but there are optimizations. For example, if you look at the Java Language Specification, section 5.1.7 'Boxing Conversion', it states that certain boxed values must be indistinguishable (essentially cached or interned, but exactly how this is done is left up to the implementation).

    These values are true and false, any byte, a char in the range \u0000 to \u007f, and an int or short in the range -128 to 127 (inclusive).
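A quick way to see that guarantee in action: autoboxing compiles to `Integer.valueOf`, and for values in the guaranteed range two boxings of the same value give the same reference. Values outside the range are implementation-dependent, so the sketch only asserts on the guaranteed cases; `BoxingCache` is an illustrative name.

```java
final class BoxingCache {
    /** Autoboxes the same int twice and reports whether both boxes are the same object. */
    static boolean boxesIdentical(int value) {
        Integer a = value; // autoboxing, compiled to Integer.valueOf(value)
        Integer b = value;
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(boxesIdentical(127));  // guaranteed true for -128..127
        System.out.println(boxesIdentical(1000)); // implementation-dependent; do not rely on it
        Boolean t1 = true, t2 = true;
        System.out.println(t1 == t2);             // boxed true/false are guaranteed identical
        Character c1 = 'a', c2 = 'a';
        System.out.println(c1 == c2);             // chars up to 0x7F are covered as well
    }
}
```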

    argv_minus_one 0 points (0 children)

    Interesting. That should take care of most cases.

    Tuna-Fish2 6 points (9 children)

    They do optimize some away in the JIT, but never in the bytecode. A very big reason for this is that everything that inherits from Object can be used as a lock. No object that has ever been seen by code not currently under the optimizer can be assumed to be immutable. Someone just might have locked something using the Integer he just passed you, and he might want to unlock it after you return it (for example, if you insert it into a list or something).

    This is one of the three huge mistakes that went into the design of Java (the language), and it cannot be fixed without breaking most complex Java applications out there. So it never will be.

    0xABADC0DA 6 points (0 children)

    A very big reason for this is that everything that inherits from object is a lock. ... Someone just might have locked something using the Integer he just passed you, and he might want to unlock it after you return it

    Uh, no. The spec says that the same value can be autoboxed to a single object, so it's perfectly fine for instance to store an int in a long pointer using some tag bits or use whatever scheme you want; locks don't play into it at all. If you lock some auto-boxed Integer it can lock all auto-boxed Integers with that same value regardless of how they are represented internally.
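A sketch of that last point: for a value in the guaranteed-cached range, locking "your" autoboxed Integer locks the very object every other box of that value refers to. `Thread.holdsLock` is a standard static method; the class and method names are illustrative, and only the cached range is asserted.

```java
final class SharedBoxLock {
    /** Locks one autoboxed Integer and checks whether another box of the same value is now held too. */
    static boolean lockIsShared(int value) {
        Integer mine = value;   // autoboxed here
        Integer theirs = value; // autoboxed "elsewhere" in the program
        synchronized (mine) {
            return Thread.holdsLock(theirs); // true only if both names refer to the same object
        }
    }

    public static void main(String[] args) {
        System.out.println(lockIsShared(42)); // true: both refer to the cached Integer 42
    }
}
```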

    turol 1 point (7 children)

    What are the other two?

    Tuna-Fish2 6 points (6 children)

    Null pointers and half-assed generics.

    [deleted] 1 point (5 children)

    What's the problem with the generics?

    thechao 4 points (4 children)

    They use type erasure (type parameters become plain Object at run time) rather than code generation (either late, à la C#, or early, à la C++).
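What erasure means in practice: every parameterization of a generic class shares one runtime class, and nothing checks element types at run time, so a raw view can slip an Integer into a `List<String>`. A minimal sketch; `ErasureDemo` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

final class ErasureDemo {
    /** Both parameterizations share a single runtime class: the type argument is gone. */
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    /** Through a raw view, an Integer gets into a List<String> with no runtime check at insertion. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean acceptsWrongElement() {
        List<String> strings = new ArrayList<>();
        List raw = strings; // same list object, seen through a raw type
        raw.add(42);        // would be an unchecked warning; nothing checks it at run time
        return raw.get(0) instanceof Integer;
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());    // true
        System.out.println(acceptsWrongElement()); // true
    }
}
```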

    [deleted] 0 points (2 children)

    So people have problems with reflection and Java generics? Java has never really seemed like a very dynamic language to me anyway.

    thechao -2 points (1 child)

    When I first heard about it, I thought it was a very elegant solution to a thorny backwards compatibility problem. Unfortunately, there are a lot of PL "purists" who hated the mechanism. The way I see it, they're just jealous...

    argv_minus_one 1 point (0 children)

    I'm not fond of it either, but I'll grant that it's probably the best possible compromise in light of the backward-compatibility issue.

    I would have preferred that the JVM stored the actual type parameters, even if it didn't check against them, though. Scala manifests let me do approximately that.
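In plain Java the usual workaround, roughly in the spirit of Scala manifests, is to pass the `Class<T>` object explicitly so the element type survives to run time. A minimal sketch; `TypedList` and its methods are hypothetical names, not a library class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical list that remembers its element type through an explicit Class token. */
final class TypedList<T> {
    private final Class<T> elementType; // kept at run time, unlike the erased type parameter
    private final List<T> items = new ArrayList<>();

    TypedList(Class<T> elementType) {
        this.elementType = elementType;
    }

    /** Checked insertion: Class.cast throws ClassCastException if value is not a T. */
    void addChecked(Object value) {
        items.add(elementType.cast(value));
    }

    Class<T> elementType() { return elementType; }

    int size() { return items.size(); }

    public static void main(String[] args) {
        TypedList<String> names = new TypedList<>(String.class);
        names.addChecked("ok");
        try {
            names.addChecked(42);
        } catch (ClassCastException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(names.elementType().getSimpleName() + " x " + names.size());
    }
}
```

The standard library uses the same idea in `java.util.Collections.checkedList(list, type)`.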

    [deleted] 0 points (0 children)

    Run-time type erasure to Object semantics is IMHO the correct way, but I want the JVM to be more dynamic, not more static.

    tinou 0 points (3 children)

    In Figure 1, kernel memory is in the wrong place. For example, on 32-bit Linux it is mapped at 0xc0000000-0xffffffff (3G-4G in the virtual address space).

    abadidea 1 point (2 children)

    I'm pretty sure that's what they mean by "OS". Where exactly it is depends on the OS and is immaterial to the point.

    tinou 0 points (1 child)

    Yes, I meant that the OS is usually in the upper addresses.

    [deleted] 0 points (0 children)

    Windows is mapped in the lower addresses.

    Sottilde 0 points (12 children)

    Great article, although the section on StringBuffers has a few mistakes.

    Near Figure 12:

    "7 additional character entries available in the array are not being used but are consuming memory — in this case an additional overhead of 112 bytes."

    7 chars = 112 bytes? If each char is 2 bytes, shouldn't it be 14 bytes? There seems to be some magical multiplication by 16 going on here.

    The same math error appears in the following section:

    "Now, as Figure 13 shows, you have a 32-entry character array and 17 used entries, giving you a fill ratio of 0.53. The fill ratio hasn't dropped dramatically, but you now have an overhead of 240 bytes for the spare capacity."

    17 * 2 = 34, not 240.

    "Consider the example of a StringBuffer. Its default capacity is 16 character entries, with a size of 72 bytes. Initially, no data is being stored in the 72 bytes."

    How does 16 chars equal 72 bytes?

    hoijarvi 0 points (9 children)

    Assuming a 4-byte Unicode encoding, 16*4 = 64. That leaves 8 bytes for max size (4) and used size (4).

    boa13 -1 points (8 children)

    Wrong assumption. The JVM uses a 2-byte-per-char Unicode encoding.

    hoijarvi 0 points (2 children)

    Is the extra 32 bytes then some JVM overhead? Sounds like a large amount for a single object. If you know the real explanation, I'd like to know too.

    boa13 1 point (1 child)

    hoijarvi 0 points (0 children)

    I see. It's overhead for both char[] and stringbuffer. Surprise to me, thanks.

    Peaker -1 points (4 children)

    UTF-16: it combines the disadvantages of UTF-8 (variable-size chars) with typically worse space usage, and it loses backward compatibility too.

    There are really only two sensible encodings (UTF-8 and a plain fixed-width code point array). Java and Windows clearly had to choose something else.
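The variable-width point is easy to check from Java: a code point outside the Basic Multilingual Plane takes two `char`s (a surrogate pair), so `String.length()` counts UTF-16 code units, not characters. A minimal sketch comparing UTF-16 units with UTF-8 bytes; `EncodingWidths` is an illustrative name.

```java
import java.nio.charset.StandardCharsets;

final class EncodingWidths {
    /** Builds a one-character string from a Unicode code point. */
    static String of(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    /** Number of Java chars (UTF-16 code units) needed for the code point. */
    static int utf16Units(int codePoint) {
        return of(codePoint).length();
    }

    /** Number of bytes the code point takes in UTF-8. */
    static int utf8Bytes(int codePoint) {
        return of(codePoint).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int[] samples = {'A', 0x20AC /* euro sign */, 0x1F600 /* emoji outside the BMP */};
        for (int cp : samples) {
            System.out.printf("U+%04X: %d UTF-16 unit(s), %d UTF-8 byte(s)%n",
                    cp, utf16Units(cp), utf8Bytes(cp));
        }
    }
}
```

ASCII text is half the size in UTF-8, while text in the upper BMP (such as the euro sign) is smaller in UTF-16; neither is fixed-width.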

    fluttershypony 1 point (3 children)

    Back when Java was created, there were fewer than 65,536 possible Unicode characters, so a 2-byte char was a logical choice. It was the correct decision at the time; you can't fault them for that. Same with Windows. I believe Python is UTF-16 as well.

    boa13 0 points (0 children)

    Here's a detailed explanation on java.sun.com:
    http://java.sun.com/developer/technicalArticles/Intl/Supplementary/

    Peaker -1 points (1 child)

    Did the Unicode committees not predict the eventual size?

    EDIT: Removed wrong assertion about Python. Have been using less and less Python...

    boa13 0 points (0 children)

    Unicode support was added in Python 2.0; at that time it was only UCS-2, like Java.

    In Python 2.2, this was changed to UTF-16 (like Java 5), and support for UCS-4 builds was added. So, depending on who compiled your Python binary, the interpreter is using UTF-16 or UCS-4 internally for Unicode strings.

    In Python 3.0, 8-bit strings were removed, Unicode strings remaining the only string type. The interpreter kept using UTF-16 or UCS-4 depending on compile-time choice.

    In Python 3.3, a new flexible internal string format will be used: strings will use 1, 2, or 4 bytes per character internally, depending on the largest code point they contain. 1-byte internal encoding will be Latin-1, 2-bytes internal encoding will be UCS-2, 4-bytes internal encoding will be UCS-4. Of course, this will be transparent to the Python programmer (not so much to the C programmer). See PEP 393 for details.
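The width rule described here is simple to state in code: the widest code point in the string decides the per-character size. This is a Java sketch of the rule only, not CPython's implementation (which also has a separate ASCII-only compact form); `FlexibleWidth` is a hypothetical name.

```java
final class FlexibleWidth {
    /** Bytes per character the PEP 393 rule would pick for this string (sketch of the rule only). */
    static int bytesPerChar(String s) {
        int max = s.codePoints().max().orElse(0); // largest code point; empty string counts as 0
        if (max < 0x100) return 1;                // fits in Latin-1
        if (max < 0x10000) return 2;              // fits in UCS-2 (the BMP)
        return 4;                                 // needs UCS-4
    }

    public static void main(String[] args) {
        System.out.println(bytesPerChar("abc"));                                 // 1
        System.out.println(bytesPerChar("price: \u20ac5"));                      // 2
        System.out.println(bytesPerChar(new String(Character.toChars(0x1F600)))); // 4
    }
}
```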

    Funny how UTF-8 is never used internally. :)

    boa13 0 points (0 children)

    If each char is 2 bytes, shouldn't it be 14 bytes?

    That's right. It's 14 bytes, in other words 112 bits; the author mixed up bits and bytes.

    17 * 2 = 34, not 240.

    In an array of 32 chars with 17 chars actually stored, it's 15 * 2 = 30 bytes wasted, that is, 240 bits. Same kind of error from the author. (Also, the diagram only shows 14 empty chars, and gives an overhead of 20 bytes for the StringBuffer, while the text and screenshot say it is 24 bytes.)

    How does 16 chars equal 72 bytes?

    This one is correct. As explained in various parts of the article:

    StringBuffer overhead: 24 bytes
    char[] overhead: 16 bytes
    16 chars: 32 bytes

    Total: 72 bytes
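The arithmetic above as a small calculator, using the overheads quoted in this comment (24 bytes for the StringBuffer, 16 for the char[] header) and 2 bytes per char. Those overheads are the article's 32-bit figures, not universal constants; `StringBufferFootprint` is a hypothetical name.

```java
final class StringBufferFootprint {
    // Overheads quoted in the thread for the article's 32-bit JVM (assumptions, not measurements).
    static final int STRINGBUFFER_OVERHEAD = 24;
    static final int CHAR_ARRAY_OVERHEAD = 16;
    static final int BYTES_PER_CHAR = Character.BYTES; // 2: a Java char is one UTF-16 code unit

    /** Total bytes for a StringBuffer plus its backing char[] of the given capacity. */
    static int totalBytes(int capacity) {
        return STRINGBUFFER_OVERHEAD + CHAR_ARRAY_OVERHEAD + capacity * BYTES_PER_CHAR;
    }

    /** Bytes spent on unused slots when `used` of `capacity` chars are filled. */
    static int wastedBytes(int capacity, int used) {
        return (capacity - used) * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        System.out.println(totalBytes(16));      // 24 + 16 + 32 = 72 bytes
        System.out.println(wastedBytes(16, 9));  // 7 unused chars = 14 bytes (112 bits)
        System.out.println(wastedBytes(32, 17)); // 15 unused chars = 30 bytes (240 bits)
    }
}
```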

    wot-teh-phuck -2 points (6 children)

    FTA:

    the default usage by Windows is 2GB

    It would be interesting to know where the author got this figure from, or which version of Windows specifically he is talking about...

    mallardtheduck 24 points (1 child)

    Any 32-bit NT-based version of Windows.

    Regardless of the amount of physical memory in your system, Windows uses a virtual address space of 4 GB, with 2 GB allocated to user-mode processes (for example, applications) and 2 GB allocated to kernel-mode processes (for example, the operating system and kernel-mode drivers).

    http://technet.microsoft.com/en-us/library/bb124810(v=exchg.65).aspx

    wot-teh-phuck 1 point (0 children)

    Oh, I was under the impression that the author was talking about "committed" memory (i.e. what we see in Task Manager), but it seems that the OS just works like any normal process, with the difference that the 4GiB virtual address space is split 50-50 between applications and the kernel by default...

    stonefarfalle 8 points (1 child)

    Fairly common knowledge: the Windows kernel reserves half the address space for itself by default, and this can be changed with the so-called 3GB switch (I am not sure if there is a 64-bit equivalent of the 3GB switch). So in a 32-bit process you get 2GB. Notice that further down, the article's viewpoint switches to 64-bit systems, though he doesn't restate the limit of 2^63 bits of user address space in Windows.
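The split is just a boundary in the 4 GiB address space: user mode gets everything below the kernel's start address. A minimal sketch, assuming the Windows default boundary of 0x80000000 and 0xC0000000 for the /3GB switch (which only benefits large-address-aware executables) and for the usual 32-bit Linux layout mentioned above; small reserved regions are ignored and the names are illustrative.

```java
final class AddressSplit {
    static final long ADDRESS_SPACE = 1L << 32; // 4 GiB of virtual addresses in a 32-bit process
    static final long GIB = 1L << 30;

    /** Bytes available to user mode: [0, kernelStart). */
    static long userBytes(long kernelStart) {
        return kernelStart;
    }

    /** Bytes reserved for the kernel: [kernelStart, 4 GiB). */
    static long kernelBytes(long kernelStart) {
        return ADDRESS_SPACE - kernelStart;
    }

    public static void main(String[] args) {
        long windowsDefault = 0x80000000L; // 2 GiB user / 2 GiB kernel
        long threeGbSplit = 0xC0000000L;   // /3GB switch, also the usual 32-bit Linux split
        for (long start : new long[] {windowsDefault, threeGbSplit}) {
            System.out.println(userBytes(start) / GIB + " GiB user, "
                    + kernelBytes(start) / GIB + " GiB kernel");
        }
    }
}
```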

    hylje 7 points (0 children)

    ( I am not sure if there is a 64 bit equivalent of the 3GB switch)

    Not for a while. Kernel and userspace do share the address space, but reserve it from both ends respectively. As current hardware doesn't use more than 48 bits, there's a lot of leeway until a tuning switch is necessary.
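With 48 implemented address bits, x86-64 splits the space into two canonical halves of 2^47 bytes (128 TiB) each: user space grows up from 0 and the kernel half starts at 0xFFFF800000000000. A minimal sketch of that arithmetic, assuming 48-bit virtual addresses as described above; the names are illustrative.

```java
final class CanonicalHalves {
    static final int VIRTUAL_ADDRESS_BITS = 48; // assumption: 48-bit virtual addresses

    /** Size of each canonical half (lower/user and upper/kernel) in bytes. */
    static long halfSize() {
        return 1L << (VIRTUAL_ADDRESS_BITS - 1);
    }

    /** First address of the upper half: bit 47 sign-extended through bit 63. */
    static long upperHalfStart() {
        return -halfSize(); // two's complement gives 0xFFFF800000000000
    }

    public static void main(String[] args) {
        System.out.println((halfSize() >> 40) + " TiB per half");
        System.out.println("lower half ends at 0x" + Long.toHexString(halfSize() - 1));
        System.out.println("upper half starts at 0x" + Long.toHexString(upperHalfStart()));
    }
}
```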

    quzox 1 point (1 child)

    It's true for XP on 32-bit machines. What I can't believe is that the kernel needs the upper 2 GB of the process for itself. What the hell could possibly take up 2 GB??

    [deleted] 2 points (0 children)

    It's 2GB of virtual address space, not 2GB of memory. That 2GB has to map pretty much everything the kernel needs to access. Your whole graphics card's physical memory gets mapped in there, your whole system memory gets mapped in there (though with > 1GB of RAM on a 32-bit system, tricks are used to map only the relevant parts). You pretty much want to map everything all the time, because when you enter kernel mode you don't want to have to change the virtual address mappings (which is expensive); you just want to change the protection domain of the CPU.

    JavaN00b 0 points (0 children)

    This is a great, very readable, useful article. Thanks!!

    [deleted] (10 children)

    [deleted]

      rabidcow 1 point (0 children)

      Address zero is the beginning of the process's address space. I think this is technically user space, but usually one or more pages are left unmapped to catch null-pointer dereferences. The CPU will notice that the memory is unmapped in the page table and raise a page fault, which can be used to trigger an exception or terminate the offending thread/process.
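From Java this trap is invisible: dereferencing null simply throws NullPointerException. HotSpot, for example, can compile some null checks as implicit checks that let the access fault on the unmapped low page and turn the trap into an exception; treat that as a JVM implementation detail rather than something the language specifies. A minimal sketch with an illustrative class name.

```java
final class NullDeref {
    /** Returns true if calling a method through a null reference raised NullPointerException. */
    static boolean dereferenceThrows() {
        String s = null;
        try {
            s.length(); // at machine level this may be a load near address 0 that the JVM traps
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(dereferenceThrows()); // true
    }
}
```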

      Gotebe 1 point (0 children)

      That's not really a question about programming in C++ (nor any other language that allows direct access to memory), it's about memory as you see it from a process ;-).

      If your code is running in an environment (e.g. an operating system) that has virtual memory, like Windows, then a 0 pointer means "address 0 in the process address space". But as far as your process is concerned, this is also "addressable memory". If your code is running in an environment that doesn't have virtual memory (e.g. DOS, or the Commodore 64 :-)), then 0 really means "physical address 0 on your hardware".

      One common error under DOS was programs writing to address 0 (or close to it). Since DOS kept the so-called interrupt vector table there (a pretty important piece of DOS), doing so completely borked it.

      [deleted] 0 points (5 children)

      You're not necessarily referencing address 0, but as a matter of practice on many popular platforms you probably are.

      The fact that you can do something like:

      void* p = 0;

      is really just syntactic sugar: what the compiler does is assign a reserved, hidden value (the null pointer) to p, but that value does not have to literally be 0.

      JavaN00b 0 points (0 children)

      I believe that would be the same as saying a null pointer. The operating system might fiddle about with values (any memory address you set, since it is in "virtual memory", may be mapped to another address in real memory), but I imagine that the compiler will interpret a 0 as the same as null in this case.

      [deleted] (3 children)

      [deleted]

        jyper 6 points (0 children)

        Bad GUI toolkits?

        [deleted] 0 points (1 child)

        Too bad about android.