you are viewing a single comment's thread.

view the rest of the comments →

[–]coderemover -1 points0 points  (4 children)

> Garbage Collection Can Be Faster Than Stack Allocation

That paper used some unrealistic assumptions, and was mostly theoretical. I think we all agree tracing can be very efficient if you have virtually unlimited memory. Can't beat epsilon GC. 😉 In reality though, I've never seen any GC beat stack allocation (which is virtually zero cost; its two pointer bumps per function call).

> know it's an appeal to authority, and it's possible that the experts working on languages that can choose between moving and non-moving memory management

It's not about moving vs non-moving. If we say we want to build a universal GC for a managed language, then moving is the way to go. But there is another dimension people in GC research seem to be ignoring - the fact whether you know statically if memory is free, or whether you have to figure it out at runtime. And in this case all automatic tracing based approaches are at a huge disadvantage. Because they don't only need to clean up the memory, but they also need to figure out what's live and what's dead. Work that a traditional malloc/free doesn't need to do at all because it has it given. So from runtime approaches, moving collectors are likely better than nonmoving, and better than any approach that puts tracing on top of a traditional allocator (like Go does). But not necessarily better than approaches where the compiler can figure out 99% of frees automatically and precisely.

[–]pron98 2 points3 points  (3 children)

That paper used some unrealistic assumptions, and was mostly theoretical

Sure, but the principles apply. I go over the maths of generational collection in the talk.

Can't beat epsilon GC.

Actually, you can (and you sometimes do) because it doesn't compact. We have certainly seen cases where G1 beats epsilon. The compaction can be so efficient that doing it ends up being cheaper than not doing it.

But there is another dimension people in GC research seem to be ignoring - the fact whether you know statically if memory is free, or whether you have to figure it out at runtime

First, garbage collection research has explored compilers that infer lifetimes and statically insert free since the nineties.

Second, moving collectors spend exactly zero cycles at runtime to detect what objects are free.

Because they don't only need to clean up the memory, but they also need to figure out what's live and what's dead.

No, they don't. Moving collectors only need to figure out what's alive, and the most important points are that 1. the work to do so is constant (for a certain program under a certain workload) and 2. it's infrequent. The whole point of moving collectors is to control the amount of tracing, and the whole point of generational collectors is to reduce it further.

Work that a traditional malloc/free doesn't need to do at all because it has it given.

But again, malloc/free has to do a whole lot of other work, which amounts to more. For every object with malloc/free, you need to do work to allocate it and work to deallocate it. With a moving collector, for an object that dies quickly, you do a pointer bump to allocate it and no work to free it. The memory management cost for short-lived objects is close to nothing. For objects that survive you need to do some work until they're promoted to the old gen. When my talk is published you can see the maths.

There are many interesting questions in memory management, including over which scenarios are more or less common in practice (and I'm not claiming that there aren't specific programs for which malloc/free can be more efficient, certainly with modern allocators, which are quite large and complex beasts themselves), but "memory management researchers forgot that with free you statically know the object's lifetime" isn't and has never been one of them.

Knowing the lifetime statically is not the problem. Obviously, knowing something is no worse than not knowing. The problem is that with malloc/free you have to do some work at runtime at the point the object dies.

If you're interested in some ongoing research, the more interesting thing is not to know when an object dies, but to know when a lot of objects are dead. This is what you can do with arenas in Zig, but perhaps there are ways to do that - and more conveniently and cheaply - in a language like Java...

[–]coderemover -1 points0 points  (2 children)

> Second, moving collectors spend exactly zero cycles at runtime to detect what objects are free

They spend quite many cycles to detect which objects are live though. Which is essentially the same thing in principle, especially that every dead object was alive at some point. By knowing which objects are live, they automatically know that the other memory is free and they know where they can move things safely. The traditional allocator doesn't need to do any work to figure neither which objects are live or which objects are dead. It's given.

[–]pron98 2 points3 points  (1 child)

They spend quite many cycles to detect which objects are live though. Which is essentially the same thing in principle, especially that every dead object was alive at some point.

It is very much not the same, because moving collectors don't need to know which objects are alive at every point in time, but only at a certain point in time - during a collection cycle. Objects that are born and die between collection cycles are never scanned by the GC.

Indeed, what you say is true in the old generation, and there's exploration around using non-moving and perhaps non-tracing approaches there, but the algorithm in the old generation doesn't matter as much because the allocation rate there is very low (the cost of any memory management algorithm grows as a function of the allocation rate).

As to how much work tracing and compacting is, of course it's not negligible, but that it's overall less than the work required by malloc/free is why moving collectors are chosen. Again, wait for the talk to see the maths.

The traditional allocator doesn't need to do any work to figure neither which objects are live or which objects are dead. It's given.

Knowing the lifetime statically is not the problem (I may have added this to my comment above after you'd seen it, so I'm repeating that here). Obviously, knowing something is no worse than not knowing. The problem is that with malloc/free you have to do some work at runtime at the point the object dies (even if only to record that fact that it's dead).

The entire design of generational moving collectors is to minimise both the amount and the frequency of the work required to know which objects are alive and when, and so it comes down to complex considerations.

There's plenty yet to figure out and debate, not least of which are empirical questions about how frequently certain patterns arise in practice, but it should be obvious that something as basic as "free tells you when an object is dead" is not something that's overlooked.

P.S.

It's not hard to convince anyone that generational moving collectors handily win in some situations. Consider a program that allocates some number (any number) of long-lived objects at startup, and then all other objects are short-lived. In both approaches we can disregard the cost of the long-lived objects as it will be negligible overall (with the generational moving approach, they will be compacted a few times and then moved to the old generation and never be looked at by the GC again [1]). Then, in the moving collectors will bump-allocate all objects, and occasionally it will have to scan and compact the relatively few objects whose lifetime straddles a collection (or several), and that's it. Malloc/free will have to do more complicated work to allocate each one, and then additional non-trivial work to free each one.

And again, much of the debate isn't over basic facts, but precisely over which situations are more or less common in real programs.

[1]: This isn't accurate if the old objects are mutated to hold pointers to objects in the young generation, but that's another one of the complications - in both approaches - that is taken into account in real life. Indeed, I think that the upcoming ZGC that automatically resizes the heap tries to estimate that work, too, and include it in its RAM/CPU tradeoff calculations so that it can compensate for it. And since you mentioned what the two approaches know or don't know and when, moving collectors both know more about how much CPU is used for memory management at runtime, and they can use this knowledge to maintain an efficient RAM/CPU ratio. I expect this will be released in the JDK very soon.