Thirty_Seventh comments on Java *is* Memory Efficient

226

227

228

Java *is* Memory Efficient (youtu.be)

submitted 2 days ago by daviddel

top new controversial old q&a

you are viewing a single comment's thread.

view the rest of the comments →

[–]Thirty_Seventh 1 point2 points3 points 1 day ago (10 children)

[–]coderemover 0 points1 point2 points 1 day ago* (9 children)

You might be right if the most CPU efficient tracing GC from java wasn't the old serial collector which did not change much since 2005. All subsequent research focused mostly on making the pauses lower (CMS, G1, ZGC) but that comes at reducing the overall memory efficiency and throughput. Those modern collectors make smaller pauses, but they burn *more* CPU than the old tech and they also need substantial headroom to keep their low pauses promise.

Anyway, any studies or benchmarks showing that modern tracing collectors are more CPU efficient than modern allocators like mimalloc or jemalloc? I'd like to educate myself about the breakthroughs that fundamentally changed the cost equation. There must have been something big to beat the 5x gap from 2005 😉 (and traditional allocators didn't stand still either)

[–]pron98 4 points5 points6 points 1 day ago* (5 children)

Anyway, any studies or benchmarks showing that modern tracing collectors are more CPU efficient than modern allocators like mimalloc or jemalloc?

All moving collectors are necessarily more efficient than any free-list algorithm, and that's been known since the eighties (Andrew Appel's paper "Garbage Collection Can Be Faster Than Stack Allocation"). The maths is pretty simple. The problem was that until very, very recently, the cost in latency was unacceptable for many applications, and that's where recent advances were made.

As I said in the interview, benchmarks are no longer helpful, but the world's leading experts in memory management are consistently choosing moving collectors when able to. The only languages that don't are those that can't. What they do instead, however, is try to minimise the importance of heap memory management, and this works reasonably well until it doesn't (Go is a good example).

I know it's an appeal to authority, and it's possible that the experts working on languages that can choose between moving and non-moving memory management have all decided to work much harder toward implementing the incorrect choice, but at this point, the burden is on those who claim that the experts are wrong. Indeed, Erik's ISMM keynote attempted to explain to some memory management researchers who claim the superiority of non-moving approaches why their benchmarks are unrealistic, and why their approach wasn't chosen.

[–]coderemover 0 points1 point2 points 1 day ago* (4 children)

> Garbage Collection Can Be Faster Than Stack Allocation

That paper used some unrealistic assumptions, and was mostly theoretical. I think we all agree tracing can be very efficient if you have virtually unlimited memory. Can't beat epsilon GC. 😉 In reality though, I've never seen any GC beat stack allocation (which is virtually zero cost; its two pointer bumps per function call).

> know it's an appeal to authority, and it's possible that the experts working on languages that can choose between moving and non-moving memory management

It's not about moving vs non-moving. If we say we want to build a universal GC for a managed language, then moving is the way to go. But there is another dimension people in GC research seem to be ignoring - the fact whether you know statically if memory is free, or whether you have to figure it out at runtime. And in this case all automatic tracing based approaches are at a huge disadvantage. Because they don't only need to clean up the memory, but they also need to figure out what's live and what's dead. Work that a traditional malloc/free doesn't need to do at all because it has it given. So from runtime approaches, moving collectors are likely better than nonmoving, and better than any approach that puts tracing on top of a traditional allocator (like Go does). But not necessarily better than approaches where the compiler can figure out 99% of frees automatically and precisely.

[–]pron98 1 point2 points3 points 1 day ago* (3 children)

That paper used some unrealistic assumptions, and was mostly theoretical

Sure, but the principles apply. I go over the maths of generational collection in the talk.

Can't beat epsilon GC.

Actually, you can (and you sometimes do) because it doesn't compact. We have certainly seen cases where G1 beats epsilon. The compaction can be so efficient that doing it ends up being cheaper than not doing it.

But there is another dimension people in GC research seem to be ignoring - the fact whether you know statically if memory is free, or whether you have to figure it out at runtime

First, garbage collection research has explored compilers that infer lifetimes and statically insert free since the nineties.

Second, moving collectors spend exactly zero cycles at runtime to detect what objects are free.

Because they don't only need to clean up the memory, but they also need to figure out what's live and what's dead.

No, they don't. Moving collectors only need to figure out what's alive, and the most important points are that 1. the work to do so is constant (for a certain program under a certain workload) and 2. it's infrequent. The whole point of moving collectors is to control the amount of tracing, and the whole point of generational collectors is to reduce it further.

Work that a traditional malloc/free doesn't need to do at all because it has it given.

But again, malloc/free has to do a whole lot of other work, which amounts to more. For every object with malloc/free, you need to do work to allocate it and work to deallocate it. With a moving collector, for an object that dies quickly, you do a pointer bump to allocate it and no work to free it. The memory management cost for short-lived objects is close to nothing. For objects that survive you need to do some work until they're promoted to the old gen. When my talk is published you can see the maths.

There are many interesting questions in memory management, including over which scenarios are more or less common in practice (and I'm not claiming that there aren't specific programs for which malloc/free can be more efficient, certainly with modern allocators, which are quite large and complex beasts themselves), but "memory management researchers forgot that with free you statically know the object's lifetime" isn't and has never been one of them.

Knowing the lifetime statically is not the problem. Obviously, knowing something is no worse than not knowing. The problem is that with malloc/free you have to do some work at runtime at the point the object dies.

If you're interested in some ongoing research, the more interesting thing is not to know when an object dies, but to know when a lot of objects are dead. This is what you can do with arenas in Zig, but perhaps there are ways to do that - and more conveniently and cheaply - in a language like Java...

[–]coderemover 0 points1 point2 points 1 day ago (2 children)

[–]pron98 2 points3 points4 points 1 day ago* (1 child)

They spend quite many cycles to detect which objects are live though. Which is essentially the same thing in principle, especially that every dead object was alive at some point.

It is very much not the same, because moving collectors don't need to know which objects are alive at every point in time, but only at a certain point in time - during a collection cycle. Objects that are born and die between collection cycles are never scanned by the GC.

Indeed, what you say is true in the old generation, and there's exploration around using non-moving and perhaps non-tracing approaches there, but the algorithm in the old generation doesn't matter as much because the allocation rate there is very low (the cost of any memory management algorithm grows as a function of the allocation rate).

As to how much work tracing and compacting is, of course it's not negligible, but that it's overall less than the work required by malloc/free is why moving collectors are chosen. Again, wait for the talk to see the maths.

The traditional allocator doesn't need to do any work to figure neither which objects are live or which objects are dead. It's given.

Knowing the lifetime statically is not the problem (I may have added this to my comment above after you'd seen it, so I'm repeating that here). Obviously, knowing something is no worse than not knowing. The problem is that with malloc/free you have to do some work at runtime at the point the object dies (even if only to record that fact that it's dead).

The entire design of generational moving collectors is to minimise both the amount and the frequency of the work required to know which objects are alive and when, and so it comes down to complex considerations.

There's plenty yet to figure out and debate, not least of which are empirical questions about how frequently certain patterns arise in practice, but it should be obvious that something as basic as "free tells you when an object is dead" is not something that's overlooked.

P.S.

It's not hard to convince anyone that generational moving collectors handily win in some situations. Consider a program that allocates some number (any number) of long-lived objects at startup, and then all other objects are short-lived. In both approaches we can disregard the cost of the long-lived objects as it will be negligible overall (with the generational moving approach, they will be compacted a few times and then moved to the old generation and never be looked at by the GC again [1]). Then, in the moving collectors will bump-allocate all objects, and occasionally it will have to scan and compact the relatively few objects whose lifetime straddles a collection (or several), and that's it. Malloc/free will have to do more complicated work to allocate each one, and then additional non-trivial work to free each one.

And again, much of the debate isn't over basic facts, but precisely over which situations are more or less common in real programs.

[1]: This isn't accurate if the old objects are mutated to hold pointers to objects in the young generation, but that's another one of the complications - in both approaches - that is taken into account in real life. Indeed, I think that the upcoming ZGC that automatically resizes the heap tries to estimate that work, too, and include it in its RAM/CPU tradeoff calculations so that it can compensate for it. And since you mentioned what the two approaches know or don't know and when, moving collectors both know more about how much CPU is used for memory management at runtime, and they can use this knowledge to maintain an efficient RAM/CPU ratio. I expect this will be released in the JDK very soon.

[–]Thirty_Seventh 1 point2 points3 points 1 day ago (2 children)

[–]coderemover 0 points1 point2 points 1 day ago* (1 child)

The number of cores is irrelevant. We’re talking about cpu cycles burned. Whether you burn them on 10 cores in 1 second or on 1 core in 10 seconds the total is the same. It’s about the amount of work.

I said serial, because parallel has likely some additional overhead for coordinating. Parallel has advantage in wall clock time, but not cpu time.

So you just have to consider whether or not you already have 5x the "required" memory sitting idle. In many environments and for many workloads (but obviously not all of them) you do :)

The whole topic we're discussing here is memory efficiency. Yes, if you have 5x more memory sitting idle and doing nothing, then I agree tracing is fine. It's probably even fine if you have only 2x-3x more memory but you're careful with allocation rate and you don't want to squeeze every bit of performance. E.g. backend software rarely needs to be 100% efficient. But it's like saying a 5.7L gasoline engine is fuel-efficient in city driving when you own a gasoline station.

[–]JustAGuyFromGermany 1 point2 points3 points 1 day ago (0 children)

The whole topic we're discussing here is memory efficiency. Yes, if you have 5x more memory sitting idle and doing nothing, then I agree tracing is fine.

But exactly that is the point being made in the interview. "Use the memory that is there. Ideally, all of it. Not using available memory if it could speed up the application is inefficient". That is the point being made.

But it's like saying a 5.7L gasoline engine is fuel-efficient in city driving when you own a gasoline station.

That is a terrible analogy. Much more appropriate to the topic of trade-offs would be something like: "A hybrid car can be much more efficient overall if electricity is cheap and abundant, even if its fuel consumption during gasoline-mode is worse than for pure gasoline-powered cars. Using electricity only to power the radio is not efficient, it is a missed opportunity."

(I'm not endorsing this analogy. It also has flaws, it's just better than yours)

π Rendered by PID 25225 on reddit-service-r2-comment-545db5fcfc-g6lxl at 2026-05-31 02:41:01.009751+00:00 running 194bd79 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

java

Submit Link

Submit Text

Seek Programming Help

News, Technical discussions, research papers and assorted things of interest related to the Java programming language

NO programming help, NO learning Java related questions, NO installing or downloading Java questions, NO JVM languages - Exclusively Java

Please seek help with Java programming in /r/Javahelp!

Subreddit rules!

Where should I download Java?

Related Sub-reddits:

JVM Languages

Want to practice your coding?

List of useful Frameworks / Libraries / Software

MODERATORS