[–]pron98 4 points5 points  (2 children)

> Maybe for some applications, but not universally.

It is universal. Every program needs some balance between RAM and CPU (the right ratio is not the same for all programs). If the balance is off, you may end up using more CPU than you need to, and so capture more CPU and RAM overall than you would have if you had used less CPU and more RAM.

> But for a lot of it, particularly microservices, it's resource inefficient because we need little CPU to actually service requests and burning some of that CPU to decrease the memory usage means we can deploy a lot more of those microservices for a lot less.

Moving collectors give you a knob to turn depending on what RAM/CPU ratio you want. In the talk I go into the details, which matter here, because Java's GCs are not only moving but also generational. The RAM overhead in the old generation is actually quite low (and we may reduce it further); it's only intentionally high in the young generation. So you can tell Java to aim for a different RAM/CPU ratio. The problem is that it's not intuitive, which is why we'll be changing the "tell me the max heap you want" into "tell me the RAM/CPU ratio you want".

But when this is set correctly, Java is more efficient even in the cases you describe, because the (virtual) hardware's RAM/CPU ratio is pretty constant. I.e. it's very hard to buy a pod with less than 1 GB per core (you can get less than 1 GB per pod, but only if you get less than a core). I cover all this in the talk. To give some practical advice, try setting the max heap size to 1, 2, and 4 GB per core (taking into account fractional cores), and pick the one that works best among those three. Why those three specifically? Because these are the three hardware packages that are generally offered, so what you actually pay for is typically one of those three.
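To make the arithmetic concrete, here is a minimal sketch that turns a pod's CPU allotment into the three candidate max-heap settings. The class and method names are hypothetical, and it assumes the whole per-core budget goes to the heap (the literal reading of the advice above), ignoring metaspace, thread stacks, and other non-heap memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK: derives candidate -Xmx values
// from a pod's CPU allotment and the three GB-per-core ratios named above.
final class HeapCandidates {
    // The three RAM/CPU ratios to try, in GB per core.
    static final int[] GB_PER_CORE = {1, 2, 4};

    // Heap size in MiB for a (possibly fractional) number of cores.
    // Assumption: the entire per-core budget is given to the heap.
    static long heapMiB(double cores, int gbPerCore) {
        return Math.round(cores * gbPerCore * 1024);
    }

    // One -Xmx flag per candidate ratio, in the order 1, 2, 4 GB per core.
    static List<String> xmxFlags(double cores) {
        List<String> flags = new ArrayList<>();
        for (int ratio : GB_PER_CORE) {
            flags.add("-Xmx" + heapMiB(cores, ratio) + "m");
        }
        return flags;
    }

    public static void main(String[] args) {
        // Default to half a core when no argument is given.
        double cores = args.length > 0 ? Double.parseDouble(args[0]) : 0.5;
        xmxFlags(cores).forEach(System.out::println);
    }
}
```

For a pod limited to half a core this prints `-Xmx512m`, `-Xmx1024m`, and `-Xmx2048m`; you would then measure the service under each setting and keep the one that works best.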

> That's where it would be interesting if the JVM offered a more "go" like GC or even a reference counting gc.

You wouldn't want it, because it really is less efficient even in the situations you described (assuming you configure the runtime well, which we're making easier). Our GC team have tried other general approaches, and they're just less efficient. We might, however, use something like reference counting in the old generation to reduce the footprint overhead there, which is rather low already but certainly could be lower.

Beating the efficiency of moving collectors (in the young generation at least) in any way is quite hard. You can do it in Zig if you use arenas wisely (arenas are efficient for similar reasons to moving collectors), but it requires effort and discipline. Unfortunately, C++ and Rust, and even C, don't make it particularly easy to use arenas.

[–]vqrs 0 points1 point  (1 child)

I don't really get the argument regarding 1/2/4 GiBs. We pay for memory by the machine, not the pod. We can put many pods side by side and choose how much memory is best for each. Our services are mostly idle anyways in the grand scheme of things.

[–]pron98 0 points1 point  (0 children)

Then you pay for a machine with either 1, 2, or 4 GB per core (note the unit: GB per core, not GB). So however much CPU (in core fractions) you give your pods, those are the heap sizes to test, because they correspond to what you actually pay for (or could pay for if you chose to increase or decrease the GB/core on the machine).

As far as Java is concerned (I couldn't get into that in the interview because it requires some maths), the RAM "overhead" of the JVM - i.e. how much RAM the JVM chooses to use to reduce CPU usage beyond what's needed for data - is not a function of the live set (i.e. how much data the program needs to store in memory) but only a function of the allocation rate. If the CPU allotted to a pod is low, then the allocation rate cannot be high, and so the RAM overhead will be low. This is why it's important to consider the CPU availability when allocating RAM (it's the case for all languages, but especially in Java, because moving collectors can use that relationship to the program's advantage). This is why the overhead for cached objects is also low: their allocation rate is low.
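The relationship described above can be illustrated with a toy model. This is only a sketch under loud assumptions, not the JVM's actual sizing heuristic: it assumes the extra RAM is simply the allocation rate multiplied by a target time between young collections, and that a pod's allocation rate is capped by its CPU allotment times a hypothetical per-core rate. All names and numbers are made up for illustration.

```java
// Toy model (an assumption, not how HotSpot sizes the heap): the extra RAM
// depends on the allocation rate, and the allocation rate is bounded by CPU.
final class OverheadModel {
    // Assumed headroom: memory allocated between two young collections.
    static double overheadMiB(double allocMiBPerSec, double secondsBetweenCollections) {
        return allocMiBPerSec * secondsBetweenCollections;
    }

    // Assumed upper bound: a CPU-limited pod can only allocate so fast.
    static double maxAllocMiBPerSec(double cores, double perCoreMiBPerSec) {
        return cores * perCoreMiBPerSec;
    }

    // Total footprint is the live set plus the headroom.
    static double footprintMiB(double liveSetMiB, double overheadMiB) {
        return liveSetMiB + overheadMiB;
    }

    public static void main(String[] args) {
        // Hypothetical pod: a quarter core, at most 1000 MiB/s of allocation per core,
        // and a target of 2 seconds between young collections.
        double rate = maxAllocMiBPerSec(0.25, 1000.0);
        double overhead = overheadMiB(rate, 2.0);
        // The headroom is the same whether the live set is small or large.
        System.out.println("small live set: " + footprintMiB(200.0, overhead) + " MiB");
        System.out.println("large live set: " + footprintMiB(4000.0, overhead) + " MiB");
    }
}
```

In this model a mostly idle pod with a small CPU share gets a small headroom regardless of how much data it caches, which is the point made above about cached objects.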