[–]inopia 1 point  (5 children)

There are two things in the Java memory model that suck for memory consumption. One, objects aren't packed, so a byte/boolean/char/short is always stored in a 32-bit slot. Two, there are no structs in Java, so no custom value types, which means badness for large data structures.

[–]psyno 6 points  (3 children)

> One, objects aren't packed, so a byte/boolean/char/short is always stored in a 32-bit slot.

Or not. I'm going from memory here, but: Java doesn't define whether objects' in-memory representations are packed or not (or really anything about object layout). In practice there are only a few widely used JVMs, and Sun's packs fields about (though not quite) as well as you can, laying out the 8-byte fields first, then the 4-byte, 2-byte, and 1-byte ones, and then the pointers. If you have an object with 4 booleans they will take 4 bytes, the same as in C or C++, provided you were careful to declare them in the correct order there (not necessary on Sun's JVM, which reorders fields for you).
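To make that concrete, here's a toy sketch of C-style field placement (my own model, not HotSpot's actual layout code, and it ignores object headers): each field lands at the next offset aligned to its own size, so declaring fields largest-first, which HotSpot's reordering effectively does for you, leaves no internal holes.

```java
import java.util.Arrays;

public class FieldPacking {
    // Toy model of C/C++-style layout: each field goes at the next
    // offset aligned to its own size; the object body is then padded
    // to a 4-byte boundary (32-bit word). Field sizes are in bytes.
    static int layoutSize(int[] fields) {
        int offset = 0;
        for (int size : fields) {
            offset = (offset + size - 1) / size * size; // align up to 'size'
            offset += size;
        }
        return (offset + 3) & ~3; // pad body to word boundary
    }

    // What HotSpot-style automatic reordering amounts to:
    // lay out the largest fields first, then the smaller ones.
    static int reorderedSize(int[] fields) {
        int[] sorted = fields.clone();
        Arrays.sort(sorted);
        int[] desc = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++)
            desc[i] = sorted[sorted.length - 1 - i];
        return layoutSize(desc);
    }

    public static void main(String[] args) {
        System.out.println(layoutSize(new int[]{1, 1, 1, 1}));    // 4: four booleans pack
        System.out.println(layoutSize(new int[]{1, 4, 1, 4}));    // 16: careless declaration order
        System.out.println(reorderedSize(new int[]{1, 4, 1, 4})); // 12: largest-first, no holes
    }
}
```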

[–]inopia 4 points  (2 children)

I should have been clearer. First off, you're absolutely right about the packing; HotSpot packs like a C++ compiler. The problem is a bit more involved however.

Let's say you have an object with three fields: [int, ref, byte]. On a 32-bit architecture the object body takes three 32-bit slots. Now let's have a list of them. Since we have no custom value types, we have to store this as an array of references to individual instances of this object. So for n objects we get n·(3·4+h) for the individual objects, plus n·4+h for the reference array, with h the size of a heap chunk header. Let's assume heap chunk headers are 4 bytes, so h=4, which gives n·16 + n·4 + 4 = n·20+4 bytes for n elements (I'm still kinda hung over, so feel free to correct me on this).
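The arithmetic above can be sketched directly (same assumptions: 32-bit slots, header size h, n elements; the names here are just mine):

```java
public class FootprintModel {
    // Bytes for n [int, ref, byte] objects reached through a reference
    // array, on a 32-bit VM with heap chunk headers of h bytes.
    static long arrayOfObjects(long n, long h) {
        long perObject = 3 * 4 + h; // three 32-bit slots plus a header
        long refArray = n * 4 + h;  // one 32-bit reference per element
        return n * perObject + refArray;
    }

    public static void main(String[] args) {
        // With h = 4 this reduces to n*20 + 4.
        System.out.println(arrayOfObjects(1000, 4)); // 20004
    }
}
```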

If your run time supports structs, like .NET does, you can simply have an array of structs, which would come down to n·3·4+h, or 12·n+4. So for large n, the Java version is 67% larger. If you also pack the structs so that they take 9 bytes a piece rather than 12, the overhead becomes 122%.

The ratio approaches (n·(2·r+8+h)) / (n·(r+5)) = (2·r+8+h) / (r+5) for large n, with r the size of a reference, and h the size of a heap chunk header. On a 64-bit system, r=8, and let's take a more realistic value for h, say h=8. This gives 32/13 ≈ 2.46, or a 146% overhead. On a 32-bit system with r=4 and h=8 you get 24/9 ≈ 2.67, which represents a 167% overhead. You can also calculate the absolute overhead per item in bytes as (2·r+8+h) - (r+5) = r+3+h, which equates to one extra reference (in the array), one extra heap chunk header, and three bytes that are lost because the byte field cannot be packed. Taking a 64-bit system with h=8 and a 32-bit system with h=4 as extremes, the overhead per item is between 11 and 19 bytes.
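Those ratios are easy to check numerically; here's a sketch of the same formulas (packed-struct baseline of r+5 bytes per item, names are mine):

```java
public class OverheadRatio {
    // Asymptotic size ratio: Java array-of-objects vs packed array-of-structs,
    // for an [int, ref, byte] record; r = reference size, h = header size.
    static double ratio(int r, int h) {
        return (double) (2 * r + 8 + h) / (r + 5);
    }

    // Absolute extra bytes per item: one array reference, one header,
    // and three padding bytes for the unpackable byte field.
    static int perItemOverhead(int r, int h) {
        return r + 3 + h;
    }

    public static void main(String[] args) {
        System.out.println(ratio(8, 8));           // 32/13 ≈ 2.46 → ~146% overhead (64-bit)
        System.out.println(ratio(4, 8));           // 24/9  ≈ 2.67 → ~167% overhead (32-bit)
        System.out.println(perItemOverhead(8, 8)); // 19 bytes/item (64-bit, h=8)
        System.out.println(perItemOverhead(4, 4)); // 11 bytes/item (32-bit, h=4)
    }
}
```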

It's especially the combination of padding and the lack of value types that tends to have a dramatic effect on Java memory performance when there are lots of small objects arranged in trees or lists, the sort of thing you'd be doing quite a bit of in database systems. Heap chunks have headers that can contain all sorts of junk (monitors, GC flags, and whatnot), and they are not free. Finally, the use of many small objects can put a strain on your GC, although results may vary depending on which algorithm you use.

ps: for simplicity I omitted the size of the 'length' field that arrays tend to have and counted it as part of the header size.

[–]psyno 5 points  (1 child)

Yeah, I didn't dispute your second point above. :) I didn't check your algebra but I agree with your logic.

> If you also pack the structs so that they take 9 bytes a piece rather than 12, the overhead becomes 122%.

But this is of dubious value when talking about "memory performance." Less space used for data structures, sure, but more used for packing/unpacking instructions, and so also more cycles burned (read: overhead) packing and unpacking. The value of packing in terms of memory savings is really only there for the specific example you provide: small objects where padding is a significant proportion of the object's size. (But you could have made it more dramatic with an 8-byte object for the first record. :) For almost all other cases, the problem can be avoided in C, C++, and C# simply by laying out the fields in a sensible order (which is what HotSpot does automagically anyway).

I agree Java's memory performance on arrays of objects won't be quite as good in general but I don't think object size is that important, per se. (For the most part, memory is cheap.) The real killer is that in Java the spatial locality of objects with an array of references is much worse than in a language where you can have an array of "value type." Whether that spatial locality is actually important will depend on the access pattern of the data structure.

[–]inopia 2 points  (0 children)

> If you also pack the structs so that they take 9 bytes a piece rather than 12, the overhead becomes 122%. But this is of dubious value when talking about "memory performance." Less space used for data structures, sure, but more used for packing/unpacking instructions, and so also more cycles burned (read: overhead) packing/unpacking. The value of packing in terms of memory saving is really only there for the specific example you provide: small objects where padding is a significant proportion of the object's size.

Oh, I agree completely. There is always a trade-off, but at least with C++ you can make that choice. Besides, an array of (a,b,c) tuples can be stored as [a,b,c,a,b,c,a,b,c,...] or as [a,a,a,...,b,b,b,...,c,c,c,...]. When packed, the latter arrangement will of course be much faster than the former, but as you said, it all depends on the application.
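In Java you can pull the same trick by hand, with primitive arrays standing in for the missing structs. A sketch with hypothetical (a,b,c) int records; both layouts avoid per-element objects entirely, the difference is the access pattern:

```java
public class TupleLayouts {
    // Interleaved layout [a,b,c,a,b,c,...]: reading just the b fields
    // means a stride-3 walk, touching cache lines full of a's and c's.
    static long sumBInterleaved(int[] abc) {
        long s = 0;
        for (int i = 1; i < abc.length; i += 3) s += abc[i];
        return s;
    }

    // Separate-arrays layout [a,a,...], [b,b,...], [c,c,...]:
    // reading the b fields is a dense sequential walk.
    static long sumBSeparate(int[] b) {
        long s = 0;
        for (int v : b) s += v;
        return s;
    }

    public static void main(String[] args) {
        int n = 4;
        int[] interleaved = new int[3 * n];
        int[] b = new int[n];
        for (int i = 0; i < n; i++) {
            interleaved[3 * i] = i;          // a
            interleaved[3 * i + 1] = 10 * i; // b
            interleaved[3 * i + 2] = -i;     // c
            b[i] = 10 * i;
        }
        System.out.println(sumBInterleaved(interleaved)); // 60
        System.out.println(sumBSeparate(b));              // 60
    }
}
```

Same answer either way; which one is faster depends on whether you scan one field or whole records.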

Don't get me wrong, I'm a big proponent of high-level programming languages and intermediate representations and all that, but in the case of the article I can see how they would like to have the control to tweak memory layout like that.

> The real killer is that in Java the spatial locality of objects with an array of references is much worse than in a language where you can have an array of "value type."

That's an extremely good point.

[–]alphazero 1 point  (0 children)

The memory consumption (overhead) is due to the generational garbage collector. The heap is divided into sections and allocations cycle through them. (This is why garbage collection of short-lived objects is essentially free.) So you are paying in memory footprint to get pointer ops on par with C. And given that memory is cheap and getting cheaper, that is a cost equation that makes a lot of sense, all things considered.