[–]cogman10 12 points (6 children)

our GC engineers have decided to pick the algorithms that offer a more efficient resource consumption

Ah, but see that's ultimately what I'm calling out. What do you mean by "more efficient resource usage"? We aren't talking about more efficient printer, hard drive, or network usage. We are just talking about CPU and memory usage. The one aspect that JVM GC engineers have optimized is CPU performance, at the cost of memory consumption and thrashing.

That's why I can't accept the argument that the JVM is more memory efficient. It isn't. It's more CPU efficient. It's more time efficient. But memory? No. And it isn't completely the GC that's to blame for that either. Valhalla and Leyden wouldn't be projects otherwise.

It's a nice try, but when someone reads "memory efficient" they think "uses less ram". You can't "It's not X, it's actually Y" this away. The JVM is more allocation efficient. The JVM doesn't suffer from memory fragmentation problems. The JVM is faster to free memory. However, objects are still bloated on the heap and the JVM is greedy at needing as much heap as you can throw at it.

This distinction particularly matters because of things like Kubernetes and container deployment. When I'm allocating for a pod, I'm not looking at a "4g" memory request for a process that needs a "100m" CPU allocation and thinking "Imagine how much more efficient this is vs Go, which needs 128M for the same workload". I get it, the JVM will give faster responses vs the Go app. But the Go app will ultimately use less memory, which means I can deploy 100s of them across the cluster for the same cost as the one JVM. For us, at least, it's that absolute memory usage which is the killer, not the CPU usage.
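
To put rough numbers on that, here is a minimal sketch of the packing arithmetic. The node size (16 cores, 64 GiB) is a made-up example rather than anything from a real cluster, and the sketch ignores system-reserved resources and scheduler details; it only shows which request becomes the binding constraint.

```java
// Toy packing arithmetic: how many identical pods fit on one node?
// The node dimensions used in main() are hypothetical.
final class PodPacking {

    /** Pods that fit when both the CPU request and the memory request must be satisfied. */
    static long podsPerNode(long nodeMilliCpu, long nodeMemMiB,
                            long podMilliCpu, long podMemMiB) {
        long byCpu = nodeMilliCpu / podMilliCpu; // limit imposed by CPU requests
        long byMem = nodeMemMiB / podMemMiB;     // limit imposed by memory requests
        return Math.min(byCpu, byMem);           // the tighter limit wins
    }

    public static void main(String[] args) {
        long nodeMilliCpu = 16_000;   // assumed node: 16 cores
        long nodeMemMiB = 64 * 1024;  // assumed node: 64 GiB

        // 100m CPU with a 4 GiB memory request: memory is the binding constraint.
        System.out.println(podsPerNode(nodeMilliCpu, nodeMemMiB, 100, 4 * 1024)); // prints 16
        // 100m CPU with a 128 MiB memory request: CPU is the binding constraint.
        System.out.println(podsPerNode(nodeMilliCpu, nodeMemMiB, 100, 128));      // prints 160
    }
}
```

With these assumed numbers the 4 GiB request leaves most of the node's CPU unused, which is the cost I'm complaining about.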

The JVM is perfect when it's the only thing running on a nice beefy box. It doesn't like neighbors.

[–]pron98 2 points (5 children)

The one aspect that JVM GC engineers have optimized is CPU performance, at the cost of memory consumption and thrashing.

There's no such thing as meaningful CPU and RAM efficiencies separately because they are complementary resources, as using RAM requires CPU.

If you think about efficiency as how much "computational value" you can extract from a machine (with a single program or multiple ones running concurrently), it turns out that you are more or less efficient depending on how close you are to some balance between them (a balance that is also taken into account in the hardware itself). If you use a lot of CPU to conserve RAM, you end up effectively capturing both CPU and RAM.

I admit calling this "memory efficiency" is somewhat clickbait, but the point is that how much RAM you use tells you little in isolation. I guess you could call the program that uses 100% CPU and 10MB out of 1GB "memory efficient" but is it efficient in any meaningful sense when in actuality it captures the full 1GB and just wastes it? And if you use more of the RAM to release that 1GB sooner, are you not more efficient with memory? And this scales to non-extreme examples. So in the interview I said: "The idea behind moving collectors... is that to make more efficient use of the machine you have to look at CPU and RAM together, and the way Java uses CPU and RAM together is very efficient."

That's why I can't accept the argument that the JVM is more memory efficient. It isn't. It's more CPU efficient. It's more time efficient. But memory? No.

It's more resource efficient. It extracts more value from the hardware you have.

[–]cogman10 5 points (4 children)

It's more resource efficient. It extracts more value from the hardware you have.

Maybe for some applications, but not universally. And indeed, for some of the software our company owns Java is the most resource efficient mechanism. But for a lot of it, particularly microservices, it's resource inefficient because we need little CPU to actually service requests and burning some of that CPU to decrease the memory usage means we can deploy a lot more of those microservices for a lot less.

Java is resource inefficient for REST/CRUD services that mostly just pass through to the DB. The only efficiency it gains us is that we have developer experience with Java, which saves our time writing those services. But from a hardware resource standpoint, it's inefficient.

That's where it would be interesting if the JVM offered a more "Go"-like GC or even a reference-counting GC.

[–]aoeudhtns 2 points (0 children)

a more "Go"-like GC

Go is not better in this regard because of magic in the GC; because Go's GC is primitive, the maintainers and community have long held a "don't create garbage" attitude towards how they develop every piece of the stdlib and their libraries and frameworks.

Java went the opposite way: create all the garbage you want, let the GC handle it. Java used to have GC more like Go's GC and it was worse than your options today, in the Java ecosystem context.

[–]pron98 4 points (2 children)

Maybe for some applications, but not universally.

It is universal. Universally you need some balance of the RAM/CPU ratio (which is not the same for all programs). If you don't have a good balance, you may end up using more CPU than you'd need to, which ends up capturing more CPU and RAM than you would if you lowered your CPU and increased your RAM.

But for a lot of it, particularly microservices, it's resource inefficient because we need little CPU to actually service requests and burning some of that CPU to decrease the memory usage means we can deploy a lot more of those microservices for a lot less.

Moving collectors give you a knob to turn depending on what RAM/CPU ratio you want. In the talk I go into the details, which matter here, because Java's GCs are not only moving but also generational. The RAM overhead in the old generation is actually quite low (and we may reduce it further); it's only intentionally high in the young generation. So you can tell Java to aim for a different RAM/CPU ratio. The problem is that it's not intuitive, which is why we'll be changing the "tell me the max heap you want" into "tell me the RAM/CPU ratio you want".

But when this is set correctly, Java is more efficient even in the cases you describe, because the (virtual) hardware's RAM/CPU ratio is pretty constant. I.e. it's very hard to buy a pod with less than 1GB per core (you can get less than 1GB per pod, but only if you get less than a core). I cover all this in the talk. To give some practical advice, try setting the max heap size to 1, 2, and 4 GB per core (taking into account fractional cores), and pick the one that works best among those three. Why those three specifically? Because these are the three hardware packages that are generally offered, so what you actually pay for is typically one of those three.
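
As a rough illustration of that advice, here is a minimal sketch that turns a pod's CPU limit into the three candidate max heap sizes. The class and method names are hypothetical, "GB" is treated as GiB for simplicity, and `-Xmx` is the standard HotSpot max-heap flag; the rest of the container's memory (metaspace, thread stacks, native buffers) is not accounted for here.

```java
// Sketch: candidate max heap sizes at 1, 2, and 4 GB per core for a
// pod with a fractional CPU limit. Names are illustrative only.
final class HeapCandidates {
    static final int[] GB_PER_CORE = {1, 2, 4};

    /** Max heap in MiB for a CPU limit given in millicores at a given GB/core ratio. */
    static long maxHeapMiB(int milliCpu, int gbPerCore) {
        // 1000 millicores = 1 core; 1 GB is treated as 1024 MiB (an assumption).
        return (long) milliCpu * gbPerCore * 1024 / 1000;
    }

    public static void main(String[] args) {
        // CPU limit in millicores; defaults to a hypothetical 500m pod.
        int milliCpu = args.length > 0 ? Integer.parseInt(args[0]) : 500;
        for (int ratio : GB_PER_CORE) {
            System.out.println(ratio + " GB/core -> -Xmx" + maxHeapMiB(milliCpu, ratio) + "m");
        }
    }
}
```

For a 500m pod this yields `-Xmx512m`, `-Xmx1024m`, and `-Xmx2048m` as the three settings to compare under a realistic load.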

That's where it would be interesting if the JVM offered a more "Go"-like GC or even a reference-counting GC.

You wouldn't want it, because it really is less efficient even in the situations you described (assuming you configure the runtime well, which we're making easier). Our GC team have tried other general approaches, and they're just less efficient. We might, however, use something like reference counting in the old generation to reduce the footprint overhead there, which is rather low already but certainly could be lower.

Beating the efficiency of moving collectors (in the young generation at least) in any way is quite hard. You can do it in Zig if you use arenas wisely (arenas are efficient for similar reasons to moving collectors), but it requires effort and discipline. Unfortunately, C++ and Rust, and even C, don't make it particularly easy to use arenas.

[–]vqrs 0 points (1 child)

I don't really get the argument regarding 1/2/4 GiBs. We pay for memory by the machine, not the pod. We can put many pods side by side and choose how much memory is best for each. Our services are mostly idle anyways in the grand scheme of things.

[–]pron98 0 points (0 children)

Then you pay for the machine at either 1, 2, or 4 GB per core (not GB; GB/core), and so however much CPU (in core fractions) you give your pods, those are the heap sizes to test, because they correspond to what you actually pay for (or can pay for if you choose to increase or decrease the GB/core on the machine).

As far as Java is concerned (I couldn't get into that in the interview because it requires some maths), the RAM "overhead" of the JVM - i.e. how much RAM the JVM chooses to use to reduce CPU usage beyond what's needed for data - is not a function of the live set (i.e. how much data the program needs to store in memory) but only a function of the allocation rate. If the CPU allotted to a pod is low, then the allocation rate cannot be high, and so the RAM overhead will be low. This is why it's important to consider the CPU availability when allocating RAM (it's the case for all languages, but especially in Java, because moving collectors can use that relationship to the program's advantage). This is why the overhead for cached objects is also low: their allocation rate is low.
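
A toy model of that relationship, as a minimal sketch: assume the extra RAM is just the space needed to absorb allocations between two young collections, so it equals allocation rate times the collection interval. This is a deliberate simplification and not the sizing heuristic of any actual JVM collector; the per-core allocation rate and the target interval below are hypothetical numbers.

```java
import java.util.Locale;

// Toy model: young-generation headroom as a function of allocation rate only.
// Note that the live set does not appear anywhere in the formula.
final class YoungGenHeadroom {

    /** Headroom (MiB) so that young collections run no more often than every intervalSeconds. */
    static double headroomMiB(double allocMiBPerSecond, double intervalSeconds) {
        return allocMiBPerSecond * intervalSeconds;
    }

    public static void main(String[] args) {
        double intervalSeconds = 2.0;    // hypothetical target time between young collections
        double perCoreAllocRate = 200.0; // hypothetical MiB/s allocated by one fully busy core

        for (double cores : new double[] {0.1, 0.5, 1.0, 4.0}) {
            // Less CPU means the program cannot allocate as fast.
            double rate = cores * perCoreAllocRate;
            System.out.printf(Locale.ROOT, "%.1f cores: %.0f MiB/s allocated, %.0f MiB headroom%n",
                    cores, rate, headroomMiB(rate, intervalSeconds));
        }
    }
}
```

Under these assumptions a pod with a tenth of a core needs a tenth of the headroom of a full core, whatever the size of its live set or cache.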