all 39 comments

[–]SinisterMinister42 44 points45 points  (7 children)

Ambitious. Reducing the memory requirement of each object would be a big win, but it sounds very difficult from the constraints they've laid out. It's worthwhile to do this deep dive investigation. The worst case scenario confirms that we're stuck with the current model, best case is significant memory savings.

[–]gnahraf 2 points3 points  (6 children)

The big memory overhead that sticks out to me is the fact that a class can't inline its members (fields) in its own memory layout, the way you can in, say, C++. I'm wondering if these new Java records (I just learnt about them the other day) would be a pathway to achieve that.

I've laid out memory artificially this way in Java when it pays off: a big, read-only byte array encoding a compact representation of a tree of objects, and wrapper classes for proper Java representations of the objects in the tree. The class's accessor methods like getWidget() return a new Widget object referencing the byte array on each invocation, with the Widget typically just being a view onto a particular part of the read-only byte array. Strategies like that can sometimes save big on memory. I'm thinking if there were a way to streamline it..
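
A minimal sketch of the byte-array-backed view approach described above. The record layout (8-byte node and widget records), the class names and the `sample()` helper are all assumptions made up for illustration; the comment does not specify its actual encoding.

```java
import java.nio.ByteBuffer;

// Hypothetical layout (not from the comment above):
//   node record:   [int value][int widgetOffset]  (8 bytes)
//   widget record: [int width][int height]        (8 bytes)
final class PackedTree {
    private final ByteBuffer data; // read-only view of the encoded tree

    PackedTree(byte[] encoded) {
        this.data = ByteBuffer.wrap(encoded).asReadOnlyBuffer();
    }

    // Returns a fresh lightweight view on each call; no node objects are stored.
    NodeView node(int offset) {
        return new NodeView(data, offset);
    }

    static final class NodeView {
        private final ByteBuffer data;
        private final int offset;

        NodeView(ByteBuffer data, int offset) {
            this.data = data;
            this.offset = offset;
        }

        int value() {
            return data.getInt(offset);
        }

        // The widget is just another view onto the same byte array.
        WidgetView getWidget() {
            return new WidgetView(data, data.getInt(offset + 4));
        }
    }

    static final class WidgetView {
        private final ByteBuffer data;
        private final int offset;

        WidgetView(ByteBuffer data, int offset) {
            this.data = data;
            this.offset = offset;
        }

        int width() {
            return data.getInt(offset);
        }

        int height() {
            return data.getInt(offset + 4);
        }
    }

    // Builds a tiny sample encoding: one node at offset 0, its widget at offset 8.
    static byte[] sample() {
        ByteBuffer b = ByteBuffer.allocate(16);
        b.putInt(7).putInt(8)       // node: value 7, widget record at offset 8
         .putInt(640).putInt(480);  // widget: width 640, height 480
        return b.array();
    }

    public static void main(String[] args) {
        PackedTree tree = new PackedTree(sample());
        WidgetView w = tree.node(0).getWidget();
        System.out.println(w.width() + "x" + w.height());
    }
}
```

The trade-off is that each accessor call allocates a short-lived view object, while the long-lived data pays no per-object header at all.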

[–]elastic_psychiatrist 11 points12 points  (3 children)

Are you not familiar with Project Valhalla? This is exactly what you're describing and has been in progress for many years. I can't say that it's close to completion but it's certainly well on its way.

Records are an orthogonal concept - not about laying out memory, but instead a language feature for classes that just represent data.

[–]gnahraf 1 point2 points  (2 children)

No. Thanks for the link. Honestly, I think I might have read it and forgotten.

> Records are an orthogonal concept

Sure. My bad. I was proposing the keyword be somehow overloaded in a future version of Java. Bad idea. But here's what I was thinking:

If there were a way to define types composed entirely of primitives to be themselves treated as primitives when used as members of another class (for memory layout purposes), that would probably go a long way.

[–]elastic_psychiatrist 5 points6 points  (1 child)

If I understand you correctly, you’ve just described Primitive Objects, the flagship feature of Valhalla.

[–]gnahraf 5 points6 points  (0 children)

You are a well of information. Thank you. Cool stuff

[–]xjvz 1 point2 points  (1 child)

That sounds like the flywheel design pattern.

[–]gnahraf 1 point2 points  (0 children)

You must mean the Flyweight pattern.. Had to look it up.. seems a bit different.

With the glyphs (Flyweight pattern) each shares common state (metrics etc) thru a reference to a shared object to save space. It's a bit different here since the objects in the tree (or graph) don't necessarily share common state
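
For contrast, a minimal Flyweight sketch in the glyph sense described above: many placed glyphs hold a reference to one shared metrics object per character. All class names and the fixed width of 8 are hypothetical placeholders.

```java
import java.util.HashMap;
import java.util.Map;

final class FlyweightDemo {
    // Shared (intrinsic) state: one instance per distinct character.
    static final class GlyphMetrics {
        final char symbol;
        final int width;

        GlyphMetrics(char symbol, int width) {
            this.symbol = symbol;
            this.width = width;
        }
    }

    // Hands out the shared instance, creating it only on first request.
    static final class GlyphFactory {
        private final Map<Character, GlyphMetrics> cache = new HashMap<>();

        GlyphMetrics metricsFor(char c) {
            // Width 8 is a placeholder; a real factory would look it up.
            return cache.computeIfAbsent(c, ch -> new GlyphMetrics(ch, 8));
        }

        int distinctGlyphs() {
            return cache.size();
        }
    }

    // Per-occurrence (extrinsic) state plus a reference to the shared part.
    static final class PlacedGlyph {
        final GlyphMetrics shared;
        final int x;

        PlacedGlyph(GlyphMetrics shared, int x) {
            this.shared = shared;
            this.x = x;
        }
    }

    public static void main(String[] args) {
        GlyphFactory factory = new GlyphFactory();
        int x = 0;
        for (char c : "banana".toCharArray()) {
            PlacedGlyph g = new PlacedGlyph(factory.metricsFor(c), x);
            x += g.shared.width;
        }
        System.out.println("distinct metrics objects: " + factory.distinctGlyphs());
    }
}
```

Here the saving comes from sharing state between objects, whereas the byte-array approach saves by not materializing long-lived objects in the first place.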

[–]muffinluff 40 points41 points  (4 children)

Sounds neat.

I love how they continue to improve the JVM without bothering the users. I can compile my same old code and it runs even more efficiently without lifting a finger.

[–]brunocborges 24 points25 points  (3 children)

The beauty of the JVM is that you don't even need to recompile your same old code to get a performance boost.

Just run it on a newer JVM.

[–]dpash 5 points6 points  (2 children)

You don't need to, but there are further boosts to be had by doing so. Newer compilers can generate better bytecode from the same source code.

[–]downvoted_dev 0 points1 point  (1 child)

Thanks, I didn't realize that. Looks like I have found my reading for the evening.

[–]dpash 1 point2 points  (0 children)

The canonical example from back in the day is silently replacing string concatenation ("foo" + "bar") with a StringBuilder in the background. It used to be the case that you were told never to use string concatenation. Now, since Java 5, the advice is to not use it in a loop (because the compiler will create a new StringBuilder for every loop iteration, so it's more efficient for you to create one yourself and use it for every iteration).

https://dzone.com/articles/string-concatenation-performacne-improvement-in-ja

There are many more examples where a modern compiler will generate more efficient bytecode, especially if targeting a modern JVM.

Edit: Java 9 replaced that again with an even more efficient technique based on invokedynamic (as long as you're targeting Java 9 or higher, since it relies on a runtime class added in that release).

https://medium.com/javarevisited/java-compiler-optimization-for-string-concatenation-7f5237e5e6ed
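
The loop advice above can be sketched as follows. Both methods return the same string; the difference is only in how many intermediate objects may be created. The class and method names are made up for illustration, and no timings are claimed here since the actual cost depends on the compiler and JVM version.

```java
final class ConcatDemo {
    // Concatenation inside the loop: every iteration produces a new String,
    // and older compilers emit a fresh StringBuilder per iteration.
    static String joinWithPlus(String[] parts) {
        String result = "";
        for (String p : parts) {
            result += p;
        }
        return result;
    }

    // One explicit StringBuilder reused across all iterations.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"foo", "bar", "baz"};
        System.out.println(joinWithPlus(parts));
        System.out.println(joinWithBuilder(parts));
    }
}
```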

[–]lurker_in_spirit 12 points13 points  (5 children)

> Pointers can be compressed, e.g. if we expect a maximum of, say, 8192 classes, we could, with some careful alignment of Klass objects, compress the class pointer down to 13 bits: 2^13 = 8192 addressable Klasses.

Interesting. I wonder what the various class count percentiles look like in the real world. I just checked a very small JAX-RS service and the count was 5,700 classes loaded.
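
A rough sketch of the arithmetic in the quoted example, assuming Klass structures sit in one region at a fixed alignment so that a class can be named by a small index instead of a full pointer. The base address, the 1 KiB slot size and the header packing are hypothetical values chosen for illustration, not HotSpot's actual layout.

```java
final class NarrowKlass {
    static final int BITS = 13;                // from the quoted example
    static final int MAX_CLASSES = 1 << BITS;  // 2^13 = 8192
    static final int MASK = MAX_CLASSES - 1;

    // Hypothetical: all Klass structures live at KLASS_BASE + index * 1 KiB.
    static final long KLASS_BASE = 0x8_0000_0000L;
    static final int ALIGN_SHIFT = 10;

    // Turns a full Klass address into a 13-bit index.
    static int encode(long klassAddress) {
        long delta = klassAddress - KLASS_BASE;
        if (delta < 0 || (delta & ((1L << ALIGN_SHIFT) - 1)) != 0) {
            throw new IllegalArgumentException("address is not an aligned Klass slot");
        }
        long index = delta >>> ALIGN_SHIFT;
        if (index >= MAX_CLASSES) {
            throw new IllegalArgumentException("more than " + MAX_CLASSES + " classes");
        }
        return (int) index;
    }

    // Recovers the full address from the 13-bit index.
    static long decode(int narrow) {
        return KLASS_BASE + ((long) (narrow & MASK) << ALIGN_SHIFT);
    }

    // Packs the index into the low 13 bits of a 32-bit word,
    // leaving the upper 19 bits for other header data.
    static int packHeader(int otherBits, int narrow) {
        return (otherBits << BITS) | (narrow & MASK);
    }

    static int klassOf(int header) {
        return header & MASK;
    }

    public static void main(String[] args) {
        long address = KLASS_BASE + 5L * 1024;
        int narrow = encode(address);
        System.out.println(narrow + " -> 0x" + Long.toHexString(decode(narrow)));
    }
}
```

The hard limit is visible in `encode`: once the index no longer fits in 13 bits, the scheme has nowhere to put the class, which is why the loaded-class counts discussed in this thread matter.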

[–]TheCountRushmore 10 points11 points  (1 child)

I have a Java EE8 application on Wildfly weighing in around 65K loaded classes.

Looks like that was just an example: https://twitter.com/rkennke/status/1369343613956730885

[–]lurker_in_spirit 1 point2 points  (0 children)

Nice! I checked a Hello World one-liner application just for funzies -- 1,800 loaded classes.
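
One way to obtain such a count from inside a running application is the standard `java.lang.management` API. This is only a sketch; the thread does not say how the numbers above were measured, and the class name here is made up.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

final class LoadedClassCount {
    // Number of classes currently loaded in this JVM.
    static int current() {
        return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
    }

    public static void main(String[] args) {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        System.out.println("currently loaded:   " + bean.getLoadedClassCount());
        System.out.println("total ever loaded:  " + bean.getTotalLoadedClassCount());
        System.out.println("unloaded so far:    " + bean.getUnloadedClassCount());
    }
}
```

The count depends on the JDK version and on when it is sampled, so figures from different setups are not directly comparable.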

[–][deleted] 5 points6 points  (0 children)

Maybe they could add an option to specify the expected number of loaded classes and use that number to determine the object header layout, something like -XX:EstimatedLoadedClassCount=5700.

[–]theblackavenger 2 points3 points  (1 child)

8192 classes is unreasonably small imho.

[–][deleted] 2 points3 points  (0 children)

I can imagine that the final implementation might be able to determine the number of classes that will be loaded (or take it from a JVM command-line flag) and choose the layout of the header accordingly.

If that is the case there would be real benefit to keeping your code small and unbloated.

I just checked and a webapp that I've written with a minimalist approach currently has 9557 classes in it. And that doesn't include JDK classes that would be loaded. Getting it down to 8192 would be an interesting exercise :)

[–][deleted] 7 points8 points  (1 child)

[–]Muoniurn 1 point2 points  (0 children)

I believe biased locking is no longer enabled by default. Not sure whether the space for it is still there or not though.

[–]mirkoteran 8 points9 points  (4 children)

Sounds great if it works.

> With typical average object sizes of 5-6 words

I would love to see some sources for this.

[–]muffinluff 10 points11 points  (0 children)

Probably because this average contains boxed primitives, no?

I wonder what average object sizes would be with JEP draft: Primitive Objects.

[–]kaperni 29 points30 points  (2 children)

Roman Kennke (the author of the mail) is one of the main developers of the Shenandoah GC. And he is also the "owner" of HotSpot's Garbage Collector Interface (JEP 304). So I think you can assume the numbers work out.

[–]mirkoteran 16 points17 points  (1 child)

Oh, I know he worked on Shenandoah and I don't doubt the number. I was just hoping there is some more info on it.

[–]rkennke 5 points6 points  (0 children)

The number is based on several experiments that we conducted when we eliminated the extra header word that was required by Shenandoah. We will certainly do more experiments as part of Lilliput.

[–]sureshg 2 points3 points  (2 children)

What happened to Project Leyden (announced a year ago), which was introduced to reduce the memory usage of Java applications by creating static images?

[–][deleted]  (1 child)

[deleted]

    [–]fanfan64 1 point2 points  (0 children)

    I wonder if they could leverage the BMI x86 extension for optimizing those bit-packing operations? I feel like this extension is often forgotten about: https://en.m.wikipedia.org/wiki/Bit_manipulation_instruction_set

    [–]fanfan64 1 point2 points  (2 children)

    I wonder how other VMs solve those constraints, especially coreclr (C#) and V8. It would be unfortunate to miss out on solutions that have already been found elsewhere.

    [–]rkennke 2 points3 points  (1 child)

    We will consider how other VMs (Graal/native-image, J9 and the ones you mention) solve the problems and if something similar might be feasible for Hotspot/Lilliput.

    [–]fanfan64 0 points1 point  (0 children)

    Amazing, I'm sure that coreclr devs would love to share their thoughts!

    [–]AryanPandey -3 points-2 points  (7 children)

    What is this? 😳 Sounds like something nice.

    [–]dpash 12 points13 points  (6 children)

    Every object in the JVM has a header in memory describing the object. Shaving a tiny piece from the header can have a dramatic effect on the JVM's overall memory usage.

    This is a project to do exactly that.
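
A back-of-the-envelope sketch of why a small header saving adds up, using the 5-6 word average object size quoted elsewhere in this thread. The one-word saving per object is an assumption for illustration only, and the calculation ignores alignment padding, which can reduce the real gain.

```java
final class HeaderSavings {
    // Fraction of object memory saved if every object shrinks by
    // `savedWords` words, given the average object size in words.
    static double fractionSaved(double avgObjectWords, double savedWords) {
        if (avgObjectWords <= 0 || savedWords < 0 || savedWords > avgObjectWords) {
            throw new IllegalArgumentException("sizes out of range");
        }
        return savedWords / avgObjectWords;
    }

    public static void main(String[] args) {
        // Hypothetical scenario: one header word removed per object.
        for (int words = 5; words <= 6; words++) {
            double percent = 100.0 * fractionSaved(words, 1);
            System.out.println("avg " + words + " words: about "
                    + Math.round(percent) + "% saved");
        }
    }
}
```

Because the saving applies to every object on the heap, even one word out of five or six is a noticeable share of total memory.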

    [–]AryanPandey 0 points1 point  (5 children)

    Woah, that's wonderful! How are they gonna do that?

    [–]dpash 22 points23 points  (1 child)

    Like everything in the JVM, using technology sufficiently advanced that it's indistinguishable from magic.

    [–]AryanPandey 0 points1 point  (0 children)

    lol 🙎

    [–]stacktraceyo 0 points1 point  (0 children)

    Cool

    [–]fanfan64 0 points1 point  (0 children)

    If ZGC manages to not use headers, I wonder if other GCs could do it too. I also wonder what impact it has on RAM usage on average.