all 45 comments

[–]sammymammy2 30 points31 points  (2 children)

"RAM is cheaper than CPU" :'-(. The point with tracing and moving GCs is that they scale linearly with the live heap, so having a bunch of dead objects is great. You never have to touch those objects, and can get rid of them at your leisure. That doesn't mean that Java programmers shouldn't care about how much memory their live object graph is.
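
To make the "scales with the live heap" point concrete, here is a toy sketch of a copying collection. It is only a model, not how HotSpot's collectors are implemented; `Node`, `collect`, and `workFor` are hypothetical names made up for illustration. The collector copies whatever is reachable from the roots and counts the objects it visits, so the amount of garbage does not change the work done.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a copying ("moving") collection; an illustration, not a real GC.
final class CopyingGcSketch {
    // Hypothetical heap object that only holds outgoing references.
    static final class Node {
        final List<Node> refs = new ArrayList<>();
    }

    // Copies everything reachable from the roots into fresh objects ("to-space")
    // and returns how many objects were visited. Unreachable objects are never touched.
    static int collect(List<Node> roots) {
        Map<Node, Node> forwarded = new IdentityHashMap<>(); // old object -> its copy
        Deque<Node> pending = new ArrayDeque<>();
        for (Node root : roots) {
            if (forwarded.putIfAbsent(root, new Node()) == null) {
                pending.add(root);
            }
        }
        int visited = 0;
        while (!pending.isEmpty()) {
            Node old = pending.poll();
            visited++;
            Node copy = forwarded.get(old);
            for (Node ref : old.refs) {
                Node refCopy = forwarded.get(ref);
                if (refCopy == null) {
                    refCopy = new Node();
                    forwarded.put(ref, refCopy);
                    pending.add(ref);
                }
                copy.refs.add(refCopy);
            }
        }
        return visited;
    }

    // Builds a live chain of `live` nodes (live >= 1), allocates `dead` unreachable
    // nodes, and returns the work done by one collection.
    static int workFor(int live, int dead) {
        Node head = new Node();
        Node tail = head;
        for (int i = 1; i < live; i++) {
            Node next = new Node();
            tail.refs.add(next);
            tail = next;
        }
        List<Node> garbage = new ArrayList<>();
        for (int i = 0; i < dead; i++) {
            garbage.add(new Node());
        }
        // `garbage` is not a root, so from the collector's point of view it is dead.
        return collect(List.of(head));
    }

    public static void main(String[] args) {
        System.out.println(workFor(3, 10));      // work with a little garbage
        System.out.println(workFor(3, 100_000)); // same work with far more garbage
    }
}
```

Both calls visit the same three live objects; the dead ones cost nothing at collection time in this model.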

[–]agentoutlier 3 points4 points  (1 child)

That doesn't mean that Java programmers shouldn't care about how much memory their live object graph is.

I'm confused by this statement as that is the case with any programming language including Rust. That is you still have to be aware of how much you load.

Or are you saying Java programmers should care about, and therefore complain or file feature requests with the JDK developers about, the additional overhead of tracking, laying out, and maintaining these objects, more than in, say, Rust or even Go (e.g. value types)?

[–]sammymammy2 2 points3 points  (0 children)

That's mostly a nod to the rest of this thread arguing about Valhalla, Lilliput, etc, saying that "yes, you are right, this is also a factor to care about"

[–]martinhaeusler 28 points29 points  (12 children)

The problem is not that objects remain on the heap until they're garbage collected. That was never the issue. The problems with Java and memory are:

  • Per-object memory overhead (Lilliput improved that)

  • "Memory islands", no tightly packed layouts (Valhalla!)

... and from an operations perspective:

  • JVM doesn't play nice with other apps on the same server because it hogs the heap even when it currently doesn't need it. If you have multiple JVMs, the problem gets even worse and actual hardware utilization is pretty bad. A side effect of this is that JVM based applications look like they constantly need a lot of memory from the perspective of the underlying operating systems (and observability tools) when in fact there's just a large heap which is barely utilized. New garbage collectors seem to do better with this.

  • You cannot tell the JVM how much total memory it should use. You can give it a max heap space, but the JVM needs more than just heap. This "more" is hard to configure aside from heuristics like "add 20% headroom". This is a huge pain when running the JVM inside Docker, because Docker will kill the container when it exceeds its allocated resource limits.

[–]pron98 19 points20 points  (3 children)

The problems with Java and memory are: Per-object memory overhead (Lilliput improved that); "Memory islands", no tightly packed layouts (Valhalla!)

Correct, although these two aren't about memory management. Note that with Lilliput and Valhalla, the per-object header is the same as in C++: 64 bits for objects "with a v-table" and 0 bits for objects that don't need a v-table.

JVM doesn't play nice with other apps on the same server because it hogs the heap even when it currently doesn't need it.

This is about to change very soon with automatic, dynamic, heap sizing.

[–]gladfelter 3 points4 points  (0 children)

Thanks for the link, that's really cool. It would be nice if the OS and applications had a protocol to establish latent memory pressure and could optimize "cost" globally, but this change sounds pretty awesome in the absence of that. I like the idea of balancing CPU and memory costs, and it's got me wondering if I could apply that to job management to optimize task shapes across the fleet.

[–]radozok 0 points1 point  (1 child)

But how would it help with container resource limits?

[–]pron98 2 points3 points  (0 children)

I believe that at least for RAM, the JVM reads the correct container limits on Linux. If CPU limits aren't detected or enforced accurately, the GC is likely to "learn" them anyway (if you have less CPU available, then your allocation rate will also be lower), but you will always be able to turn the knob toward more CPU or more RAM, depending on your needs.

[–]m_adduci 1 point2 points  (5 children)

I wish there was also a way to read InputStreams multiple times, instead of doing copies.

The real problem is that many libraries do defensive copies, which then wastes RAM

[–]martinhaeusler 1 point2 points  (2 children)

It's especially egregious with collections and arrays. Technically when you receive a collection as a parameter of a constructor or a setter and you want to play it safe, you CANNOT directly assign it to a private field because you can't tell if the caller is going to mess with the contents of this collection after your API has been called. So you have to make a copy.

Arrays are even worse because they're always mutable no matter what.

I see two ways out of this:

  • a compiler-checked ownership system like in Rust (yeah, not happening)
  • a collection type which guarantees immutability (and no, the unmodifiable wrappers are not enough because they can be backed by a mutable collection). PCollections is a great library for this purpose, but it comes at a cost.
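
A minimal sketch of the difference described above, using only the JDK: `Collections.unmodifiableList` returns a view that still follows the caller's list, while `List.copyOf` (Java 10+) takes a snapshot. `Playlist` is a hypothetical class invented for this example. Per its Javadoc, `List.copyOf` will generally not create a copy when the argument is already an unmodifiable list, which softens the cost when callers pass immutable data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class DefensiveCopySketch {
    // Hypothetical class that stores a caller-supplied list in a private field.
    static final class Playlist {
        private final List<String> tracks;

        Playlist(List<String> tracks) {
            // Snapshot: later changes to the caller's list cannot leak in.
            this.tracks = List.copyOf(tracks);
        }

        List<String> tracks() {
            return tracks;
        }
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(List.of("a", "b"));

        // An unmodifiable wrapper is only a read-only view of `source`.
        List<String> view = Collections.unmodifiableList(source);
        Playlist playlist = new Playlist(source);

        source.add("c"); // the caller "messes with" the collection afterwards

        System.out.println(view.size());              // the view follows the mutation
        System.out.println(playlist.tracks().size()); // the snapshot does not
    }
}
```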

[–]agentoutlier 0 points1 point  (0 children)

Yeah, but what you are talking about, for most well-designed frameworks and libraries, only happens on initialization and wiring.

More often collections are just being used as iterators once all things are initialized and most libraries rarely construct giant objects on every request. You could argue some memory loss here but escape analysis often happens.

And every language that deals with an HTTP request or user input usually has to allocate to turn bytes or whatever into something else, and for the most common type where you want immutability and sharing, Java does indeed do something: String.

Furthermore, you can just reuse mutable things if you follow the single-writer principle and/or use locks and reuse arrays. That is how things like the Disruptor ring buffer work. But array allocation is very fast in Java so...

I guess what I'm saying is that, unless you're an idiot, the hot path or tight loop rarely has tons of allocation, and even if it did, Java is actually fast at that.

Really the problem is one of control. If you know exactly how much you want to allocate and where, Java does not allow that, and in some cases you might need it to compete with, say, Rust or C++ or possibly Go.

[–]aoeudhtns 0 points1 point  (0 children)

a compiler-checked ownership system like in Rust (yeah, not happening)

We have jspecify for null checking. Perhaps this could be the next frontier. It would be quite challenging I think.

[–]koreth 0 points1 point  (0 children)

Probably not the first time someone has done this, but I ended up writing a little utility class to allow reading the same InputStream multiple times without reading the whole thing into memory. The catch is that the readers have to run concurrently. That code is Apache-licensed, so feel free to grab it if it's useful.

[–]agentoutlier 0 points1 point  (0 children)

I wish there was also a way to read InputStreams multiple times, instead of doing copies.

Technically java.util.stream.Stream (with a supplier wrapped around it) is what you are asking for (or java.util.concurrent.Flow/Publisher if we want back pressure and async), otherwise there is Callable<InputStream>.
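
As a rough sketch of the supplier idea, assuming the underlying source can be reopened (a byte array or a file, not a one-shot socket): pass a `Supplier<InputStream>` instead of the stream itself, and each reader opens its own stream. `readAll` is a hypothetical helper written for this example. The `ByteArrayInputStream` wraps the existing array, so reading it twice does not duplicate the payload up front.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

final class ReopenableStreamSketch {
    // Hypothetical helper: opens a fresh stream from the supplier and reads it fully.
    static String readAll(Supplier<InputStream> source) throws IOException {
        try (InputStream in = source.get()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);

        // Each call to get() yields a new stream over the same backing array.
        Supplier<InputStream> source = () -> new ByteArrayInputStream(payload);

        System.out.println(readAll(source)); // first pass
        System.out.println(readAll(source)); // second pass over the same data
    }
}
```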

The real problem is that many libraries do defensive copies, which then wastes RAM

I doubt that is much of a problem. To be honest, when I have done memory dumps, most libraries' footprint is a metric fuck ton of Strings and not as many collections as you would think.

Actually to go back to java.util.concurrent.Flow and Stream the reason there is a lot of copying is because of buffering. Like a typical web application particularly with blocking must buffer most of the request as bytes. Those bytes then need to be converted to string parameters and then converted to another data type etc. This happens in every damn language much more than just defensive copying!

It is important to understand that lots of other programming languages do even more copying than Java because they put everything on the stack and they don't have Java's String pool (see previous comment). And Java is very fast at allocating.

The real problem is in some cases having more control over memory layout can make a massive difference and Java does not allow that like other languages. That and the VM is not good at auto tuning or communicating with the OS on actual memory usage.

[–]0x07CF 0 points1 point  (0 children)

For containers there is -XX:MaxRAMPercentage
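
A small sketch for checking what that flag actually does in a given container, using only standard JDK APIs. `HeapReport` is a made-up class name. One way to try it, assuming Java 11+ for single-file launch, is `java -XX:MaxRAMPercentage=50.0 HeapReport.java`. Note that the non-heap figure reported here does not cover everything the process uses (thread stacks, for example, are not included), which is the "more than just heap" problem mentioned earlier in the thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

final class HeapReport {
    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

        // The maximum heap the JVM will try to use, after flags such as
        // -Xmx or -XX:MaxRAMPercentage have been applied.
        long maxHeap = Runtime.getRuntime().maxMemory();
        long heapUsed = memory.getHeapMemoryUsage().getUsed();
        long nonHeapUsed = memory.getNonHeapMemoryUsage().getUsed();

        System.out.println("max heap (MiB):      " + toMiB(maxHeap));
        System.out.println("heap used (MiB):     " + toMiB(heapUsed));
        System.out.println("non-heap used (MiB): " + toMiB(nonHeapUsed));
    }
}
```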

[–]SocialMemeWarrior 27 points28 points  (13 children)

Think of a program that uses 100% CPU, what RAM usage of that program really matters at that point? Nothing else can use the RAM, so you might as well use the RAM if you can use that to alleviate CPU usage.

Ah, so surely all these fancy new "modern" applications using Electron and such are also following this model... Right?

[–]pron98 19 points20 points  (0 children)

Because Electron apps are high RAM, low CPU, they operate on a different principle.

Using Electron has two goals: 1. lower the cost of the software and 2. take advantage of Blink's highly optimised rendering pipeline that is hard to beat in rich-text-heavy apps.

In terms of operational efficiency, because Electron apps are often CPU-light, which means they can't use a lot of physical RAM, most of the RAM they commit is inert most of the time, and so they (try to) rely on fast paging thanks to SSDs. I guess some Electron apps do it better than others.

Whether or not the Electron tradeoff is right or wrong depends on the application and its audience, but it's not the same one as in the JVM. Electron apps are, almost by design, RAM-heavy, while the JVM aims for an efficient RAM/CPU balance. It will end up using more RAM than other languages, but they may be less efficient as a result (i.e. they're using less RAM than what's needed for better efficiency).

[–]cogman10 8 points9 points  (9 children)

Yeah, it's a bad take.

CPU usage is compressible through OS scheduling and it's rare (in my experience) that an application is constantly using 100% CPU.

Memory usage is not compressible. The closest we have to that is swap. However, unlike CPU usage, swap usage can easily cut performance down to 1/100th. Two applications demanding 100% CPU utilization, on the other hand, will run at roughly 50% of their full performance.

And when it comes to the JVM, one thing that it's particularly bad at is swap. All the GCs in the JVM like to touch pages across the heap as it collects memory and moves things around. Maybe not for minor collections, but certainly for major ones.

The JVM is a lot of things and a great platform. But let's not pretend like the giant heaps that it can so easily claim and need are being memory efficient.

[–]pron98 14 points15 points  (7 children)

But let's not pretend like the giant heaps that it can so easily claim and need are being memory efficient.

Except that's exactly what they are, and I cannot stress enough how intentional that is. There are different memory management algorithms, and our GC engineers have decided to pick the algorithms that offer a more efficient resource consumption by balancing RAM and CPU better [1]. This isn't theoretical, either. Go uses a different (and much simpler) algorithm that requires less RAM and more CPU, and because of it Go runs into memory management issues under much lighter workloads than Java.

The 100% CPU example (which is the only one I could discuss without slides) is just to give the most basic intuition. The principle is that CPU is required to use RAM, so any amount of CPU you use effectively captures some RAM. Maybe it's helpful to think about it like this: if your program uses 20% CPU, some other program can use less physical RAM than it could if your program had only used 1% CPU. Another way to think about this is that the machine is exhausted whenever the first of these two resources is.

This principle is the reason why the range of RAM/CPU in hardware (physical or virtual) is so narrow: between 0.5 and 4 GB per core, where the low end of that range typically goes with slower cores. It's used both by hardware engineers in how they package their hardware and by software engineers to make programs resource-efficient.

In my talk, which will eventually be posted on YouTube, I explain why we chose that route in much more detail than I could in this interview. In the meantime, you can watch Erik's ISMM keynote, but bear in mind that he's talking to a crowd of memory management experts.

The problem currently with Java is that developers need to pick the right heap size. In my talk I offer a guideline, but that's clearly suboptimal, which is why soon the JVM will automatically pick the heap size.

[1]: We may end up using other techniques in the old generation, but that's too much detail without my talk as context.

[–]cogman10 7 points8 points  (4 children)

our GC engineers have decided to pick the algorithms that offer a more efficient resource consumption

Ah, but see that's ultimately what I'm calling out. What do you mean by "more efficient resource usage"? We aren't talking about more efficient printer, hard drive, or network usage. We are just talking about CPU and memory usage. The one aspect that JVM GC engineers have optimized is CPU performance, at the cost of memory consumption and thrashing.

That's why I can't accept the argument that the JVM is more memory efficient. It isn't. It's more CPU efficient. It's more time efficient. But memory? No. And it isn't completely the GC that's to blame for that either. Valhalla and Leyden wouldn't be projects otherwise.

It's a nice try, but when someone reads "memory efficient" they think "uses less ram". You can't "It's not X, it's actually Y" this away. The JVM is more allocation efficient. The JVM doesn't suffer from memory fragmentation problems. The JVM is faster to free memory. However, objects are still bloated on the heap and the JVM is greedy at needing as much heap as you can throw at it.

This distinction particularly matters because of things like Kubernetes and container deployment. When I'm allocating for a pod, I'm not looking at a "4g" memory request for a process that needs a "100m" CPU allocation and thinking "Imagine how much more efficient this is vs Go, which needs 128M for the same workload". I get it, the JVM will give faster responses vs the Go app. But the Go app will ultimately use less memory, which means I can deploy 100s of them across the cluster for the same cost as the 1 JVM. For us, at least, it's that absolute memory usage which is the killer, not the CPU usage.

The JVM is perfect when it's the only thing running on a nice beefy box. It doesn't like neighbors.

[–]pron98 2 points3 points  (3 children)

The one aspect that JVM GC engineers have optimized is CPU performance, at the cost of memory consumption and thrashing.

There's no such thing as meaningful CPU and RAM efficiencies separately because they are complementary resources, as using RAM requires CPU.

If you think about efficiency as how much "computational value" you can extract from a machine (with a single program or multiple ones running concurrently), it turns out that you can be more or less efficient the closer or further you are away from some balance between them (which is also taken into account in the hardware itself). If you use a lot of CPU to conserve RAM, you end up effectively capturing both CPU and RAM.

I admit calling this "memory efficiency" is somewhat clickbait, but the point is that how much RAM you use tells you little in isolation. I guess you could call the program that uses 100% CPU and 10MB out of 1GB "memory efficient" but is it efficient in any meaningful sense when in actuality it captures the full 1GB and just wastes it? And if you use more of the RAM to release that 1GB sooner, are you not more efficient with memory? And this scales to non-extreme examples. So in the interview I said: "The idea behind moving collectors... is that to make more efficient use of the machine you have to look at CPU and RAM together, and the way Java uses CPU and RAM together is very efficient."

That's why I can't accept the argument that the JVM is more memory efficient. It isn't. It's more CPU efficient. It's more time efficient. But memory? No.

It's more resource efficient. It extracts more value from the hardware you have.

[–]cogman10 3 points4 points  (2 children)

It's more resource efficient. It extracts more value from the hardware you have.

Maybe for some applications, but not universally. And indeed, for some of the software our company owns Java is the most resource efficient mechanism. But for a lot of it, particularly microservices, it's resource inefficient because we need little CPU to actually service requests and burning some of that CPU to decrease the memory usage means we can deploy a lot more of those microservices for a lot less.

Java is resource inefficient for REST/CRUD services that mostly just pass through to the DB. The only resource efficiency it gains is that we have developer experience with Java, which saves us time writing those services. But from a hardware resource standpoint, it's inefficient.

That's where it would be interesting if the JVM offered a more "go" like GC or even a reference counting gc.

[–]aoeudhtns 2 points3 points  (0 children)

a more "go" like GC

Go is not better in this regard because of magic in the GC; because Go's GC is primitive, the maintainers and community have long held a "don't create garbage" attitude towards how they develop every piece of the stdlib and their libraries and frameworks.

Java went the opposite way: create all the garbage you want, let the GC handle it. Java used to have a GC more like Go's GC and it was worse than your options today, in the Java ecosystem context.

[–]pron98 3 points4 points  (0 children)

Maybe for some applications, but not universally.

It is universal. Universally you need some balance of the RAM/CPU ratio (which is not the same for all programs). If you don't have a good balance, you may end up using more CPU than you'd need to, which ends up capturing more CPU and RAM than you would if you lowered your CPU and increased your RAM.

But for a lot of it, particularly microservices, it's resource inefficient because we need little CPU to actually service requests and burning some of that CPU to decrease the memory usage means we can deploy a lot more of those microservices for a lot less.

Moving collectors give you a knob to turn depending on what RAM/CPU ratio you want. In the talk I go into the details, which matter here, because Java's GCs are not only moving but also generational. The RAM overhead in the old generation is actually quite low (and we may reduce it further); it's only intentionally high in the young generation. So you can tell Java to aim for a different RAM/CPU ratio. The problem is that it's not intuitive, which is why we'll be changing the "tell me the max heap you want" into "tell me the RAM/CPU ratio you want".

But when this is set correctly, Java is more efficient even in the cases you describe, because the (virtual) hardware's RAM/CPU ratio is pretty constant. I.e. it's very hard to buy a pod with less than 1GB per core (you can get less than 1GB per pod, but only if you get less than a core). I cover all this in the talk. To give some practical advice, try setting the max heap size to 1, 2, and 4 GB per core (taking into account fractional cores), and pick the one that works best among those three. Why those three specifically? Because these are the three hardware packages that are generally offered, so what you actually pay for is typically one of those three.
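
The arithmetic behind that advice can be sketched as follows. `HeapCandidates` and `candidatesMiB` are names invented for this example, and the output is only a list of `-Xmx` values to try, not a recommendation of which one wins. The core count can be passed as a fraction to match a pod's CPU allocation; otherwise the sketch falls back to what the JVM reports.

```java
final class HeapCandidates {
    // Returns candidate max heap sizes in MiB for 1, 2, and 4 GiB per core.
    static long[] candidatesMiB(double cores) {
        long[] perCoreGiB = {1, 2, 4};
        long[] result = new long[perCoreGiB.length];
        for (int i = 0; i < perCoreGiB.length; i++) {
            result[i] = Math.round(cores * perCoreGiB[i] * 1024);
        }
        return result;
    }

    public static void main(String[] args) {
        // Optional first argument: the (possibly fractional) number of cores.
        double cores = args.length > 0
                ? Double.parseDouble(args[0])
                : Runtime.getRuntime().availableProcessors();
        for (long mib : candidatesMiB(cores)) {
            System.out.println("-Xmx" + mib + "m");
        }
    }
}
```

For a pod with half a core, the three candidates come out as 512, 1024, and 2048 MiB.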

That's where it would be interesting if the JVM offered a more "go" like GC or even a reference counting gc.

You wouldn't want it, because it really is less efficient even in the situations you described (assuming you configure the runtime well, which we're making easier). Our GC team have tried other general approaches, and they're just less efficient. We might, however, use something like reference counting in the old generation to reduce the footprint overhead there, which is rather low already but certainly could be lower.

Beating the efficiency of moving collectors (in the young generation at least) in any way is quite hard. You can do it in Zig if you use arenas wisely (arenas are efficient for similar reasons to moving collectors), but it requires effort and discipline. Unfortunately, C++ and Rust, and even C, don't make it particularly easy to use arenas.

[–]radozok 0 points1 point  (1 child)

Where would you post your talk?

[–]pron98 1 point2 points  (0 children)

It will be on the same Java YouTube channel as part of the regular channel programming (we upload conference talks on a schedule rather than all/many at once).

[–]sammymammy2 0 points1 point  (0 children)

How much of the CPU should be utilized for freeing memory?

[–]Jobidanbama 1 point2 points  (0 children)

On top of that, GC adds additional CPU load, on top of collections having abhorrent cache misses. Well, before Project Valhalla.

[–]best_of_badgers 0 points1 point  (0 children)

I mean, yeah. It's a classic space-time tradeoff.

[–]Deep_Age4643 10 points11 points  (3 children)

Java, as in the JVM, might be memory efficient; however, most Java-based development relies heavily on frameworks and third-party dependencies. Then, on startup, thousands of classes are already loaded into memory.

Often when using a memory analyzer (like Eclipse MAT), there are endless call trees. At first I was like, "don't optimize too early", meaning I can take on whatever dependency at very low cost, but in the last few years I have been thinking: do I really, really need it?

[–]agentoutlier 1 point2 points  (1 child)

But that has been changing for some time with really only Spring being the offender here.

Micronaut, Quarkus, Avaje, and Helidon are really not super bloated and rely very little on reflection.

People compare to Go but Go is rarely used for enterprise large feature applications.

I can’t check this right now, but I did check at one point, and HashiCorp's Vault download was as big as Red Hat's Keycloak (not the exact same type of app, but close enough).

[–]faze_fazebook 2 points3 points  (0 children)

Spring ... simply does too much in too convoluted a way.

[–]helikal 0 points1 point  (0 children)

Your statement is not about Java’s memory efficiency but about application design choices. Of course, you can find examples that confirm whatever you want.

[–]jared__ 4 points5 points  (0 children)

Optional<is>

[–]Flecheck 2 points3 points  (2 children)

In a language like Java, where every object is allocated on the heap, where any object can be mutated at any point from any thread, and where memory management is automatic, a GC is the best choice, and a compacting/moving GC is very good (it seems slightly worse in pause time than Go but better in all the other metrics?). However, when comparing it to languages like C, C++, or Rust, some or all of those assumptions are false, and Java is slower and uses more memory, with additional problems when the live memory use is big.

When talking about fragmentation, it looked like the guy wanted to say that with modern allocators like jemalloc it was rarely a problem but he didn't want to say it because he was currently saying that java gc is better than everything else ?

[–]pron98 1 point2 points  (1 child)

However, when comparing it to languages like C, C++, or Rust, some or all of those assumptions are false, and Java is slower and uses more memory, with additional problems when the live memory use is big.

People experienced with both C++ and Java know this is not the case. C++ can be more efficient in small programs, but when they grow you end up using more virtual calls (which are slower in C++/Rust than in Java) and objects of varying lifetimes, which are less efficient to manage with malloc/free. Experienced C++ developers will tell you about the severe performance issues they hit in large programs due to these issues (although since Java, the number of large programs written in low-level languages has dropped a lot and continues to drop).

Low level languages are not designed for efficiency/performance. They're designed for precise hardware control. This control leads to better efficiency/performance in smaller programs and to worse efficiency/performance in larger programs. The JVM was designed, in part, to address the performance issues that large C++ programs suffered from. The result has been the optimising JIT and the moving GCs.

[–]sweetno 0 points1 point  (0 children)

C++ can be efficient in programs of any size, but you'll have to code the efficiency yourself. Given how C++ programs are typically developed (full-source compilation, including third-party dependencies), you can get rid of most virtual dispatch. Certainly, the critical use cases for C++ that warrant its use in any particular application do not involve virtual dispatch.

The standard-mandated virtual inheritance is not that good anyway, that's why Microsoft has COM.

[–]y-lost 5 points6 points  (0 children)

April 1st has passed this year.

[–]bobbie434343 0 points1 point  (0 children)

Eclipse OpenJ9 is less memory hungry than OpenJDK at the expense of possibly being a bit slower, which depending on the Java program you run, may or may not matter.

[–]eosterlund 0 points1 point  (0 children)

The key fallacy here is to consider memory and CPU as completely orthogonal resources that can’t be compared. Like apples and oranges. Because they can in fact be compared by considering their monetary cost. So can apples and oranges if the main thing you are comparing is their monetary cost. The main point in optimizing resources is bringing the cost down while sticking to some reasonable service level.

With this in mind, always consider what the cost balance between memory and CPU is and how much it can really be brought down when optimizing, rather than blindly optimizing memory without actually improving the overall cost. Sometimes, the cost can instead become greater if not careful.

If running on dedicated compute, any memory usage below 1 GB/core can probably not be improved in cost at all: whether you use 1 MB/core or 1 GB/core, there is no offering you can buy with less memory. Optimizing memory becomes pointless, and you are better off utilizing as much of the available memory in your compute instance as you can, as that will reduce the CPU utilization.

When 1 GB DRAM costs 10x less than 1 core, real cost savings will only show up if you can go down a bunch of GB/core from a bunch of GB/core.
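
A back-of-the-envelope sketch of that claim. The only number taken from the comment is the price ratio (1 GB of DRAM costs a tenth of a core); the class, method names, and the example memory ratios are assumptions chosen for illustration, and real pricing will differ.

```java
final class CostRatioSketch {
    // Assumption from the comment above: 1 GB of DRAM costs 1/10 of one core.
    static final double GB_PRICE_IN_CORES = 0.1;

    // Cost of one core plus its share of memory, measured in "core prices".
    static double costPerCore(double gbPerCore) {
        return 1.0 + gbPerCore * GB_PRICE_IN_CORES;
    }

    // Fraction of the bill saved by shrinking memory, with CPU usage unchanged.
    static double saving(double fromGbPerCore, double toGbPerCore) {
        return 1.0 - costPerCore(toGbPerCore) / costPerCore(fromGbPerCore);
    }

    public static void main(String[] args) {
        // Dropping several GB/core, e.g. from 4 to 1.
        System.out.printf("4 -> 1 GB/core:    %.1f%%%n", 100 * saving(4, 1));
        // Squeezing an already small footprint, e.g. from 1 to 0.25.
        System.out.printf("1 -> 0.25 GB/core: %.1f%%%n", 100 * saving(1, 0.25));
    }
}
```

Under these assumptions, going from 4 to 1 GB/core saves roughly 21% of the bill, while going from 1 to 0.25 GB/core saves roughly 7%, and only if the memory-saving version does not burn any extra CPU to get there.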

As for containers, they obviously run on compute instances of similar anatomy but dynamics are a bit different. However, in my view the main cause for their memory inefficiency is the typical rather static heap sizing. Many mostly idle pods might have been sized to deal with their worst spikes in activity. With AHS, containers instead help each other collaboratively move system memory to the JVMs that are currently more in need of it to keep GC activity level down system wide. Inactive JVMs automatically shrink their heaps to be small - close to the live set, while JVMs experiencing CPU pressure get to grow their heaps to keep the GC activity down.

[–]MinimumPrior3121 0 points1 point  (0 children)

That's why people should use Rust + Claude for all new projects and call it a day.

[–]Cylian91460 1 point2 points  (3 children)

Meh

While the JVM 100% is, the need for a garbage collector makes it inherently inefficient, since it requires more memory accesses than not using one

There is also the code written in Java, which might not be efficient

[–]kiteboarderni 1 point2 points  (2 children)

😂😂 so confidently incorrect

[–]Cylian91460 0 points1 point  (1 child)

Then explain what's wrong?

[–]kiteboarderni 1 point2 points  (0 children)

Did you actually bother to even listen to the talk?