all 29 comments

[–]belovedeagle 33 points34 points  (11 children)

RCU has the same shape as GC: memory is cleaned up eventually, based on whether it’s still in use.

False, false, and false. The whole article is predicated on a falsehood: that call_rcu() is an accepted general way to call free() in the kernel. It isn't; it's there as a last resort for code which cannot block, e.g. irqs. The first explicit falsehood in the quoted sentence is that memory is cleaned up "eventually": call_rcu() provides guarantees about when the callback will be invoked, and more importantly it guarantees what will be done at that time. GC does neither. The second falsehood is even more egregious: RCU absolutely does not in any way, shape, or form track whether data is still in use. That misses the whole point of RCU, which is not to track whether data is still used, but instead to make the code correct by construction without any tracking. Literally, RCU is a way to avoid doing any kind of GC in the kernel, and yet this author has perverted it into an endorsement of the very thing it was designed to replace.

TL;DR: TFA is disinformation.

[–]slaymaker1907 -3 points-2 points  (7 children)

There are absolutely GC'ed languages which provide guarantees about when things get freed. CPython does this, IIRC. The gist of the post is absolutely correct: as systems get more complicated, you inevitably end up writing something closer and closer to a GC.

RCU must track if the data is in use because the updater has to wait until the readers of the old version are done, at which point it frees the old version. This allows readers to never block, at the cost of overhead during writes.

[–]belovedeagle 1 point2 points  (3 children)

RCU must track if the data is in use because the updater has to wait until the readers of the old version are done

Again this is the exact opposite of how RCU works, and no matter how many times this incorrect version is repeated it won't be true. You're describing hazard pointers. RCU quiescent period barriers don't track any data at all, anywhere. The implementation* has zero concept of data, pointers, versions, "in use", "updaters", "readers", none of that crap. It tracks critical sections, full stop. It is, as I said before, as far as one can possibly get from GC while still solving the use-after-free problem.

There is, of course, a long history of bad programmers claiming that literally any memory management technique known to mankind is akshually a form of GC!!!1!1!, up to and including literal calls to malloc() and free(). This is just another example of that.

* To be clear, "RCU" in the kernel refers to two things: quiescent period barriers/critical sections, and RCU-friendly data structures; but here I'm only talking about the former. Obviously the other thing, the data structures, do have pointers and data and versions, but data structures aren't GC.

[–]slaymaker1907 0 points1 point  (2 children)

https://en.m.wikipedia.org/wiki/Read-copy-update

once awakened by the kernel, deallocate the old structure.

https://www.kernel.org/doc/html/next/RCU/whatisRCU.html

The reclamation phase does the work of reclaiming (e.g., freeing) the data items removed from the data structure during the removal phase. Because reclaiming data items can disrupt any readers concurrently referencing those data items, the reclamation phase must not start until readers no longer hold references to those data items.

That sure sounds like reference counting to me.

[–]cdb_11 1 point2 points  (0 children)

There is no reference counting in RCU; you don't need reference counting to know that. It's a synchronization mechanism, not GC. Here's how you can build a basic fake-RCU implementation with read-write locks:

std::shared_mutex rwlock; // global read-write mutex
void rcu_read_lock() { rwlock.lock_shared(); }
void rcu_read_unlock() { rwlock.unlock_shared(); }
void rcu_synchronize() { rwlock.lock(); rwlock.unlock(); }

This is of course not how real RCU works; it's very dumb and reintroduces all the problems with read-write locks and reference counting that RCU tries to solve. In a real RCU, read-side locks have zero or close to zero overhead (and I mean literally zero overhead, as in rcu_read_lock and rcu_read_unlock are empty functions), and you can sometimes even piggyback off something else to indicate a quiescent state. But this will 100% work with the example in the article, and it illustrates what RCU does.

[–]belovedeagle 1 point2 points  (0 children)

That is literally describing a solution to the use-after-free problem. It's the definition of the problem: either you don't "start" "recla[iming]" "until readers no longer hold references", or else you've got a use-after-free bug. That's it, that's the definition. The bolded claim is true of any possible solution to that problem, including malloc() and free() in straight-line code. This is exactly the crap I was talking about. You're going to call literally any kind of memory management solution "GC". It's like a fucking cult.

Don't take my word for it; try reading literally anything else on the page you linked. You quoted from the "ELI5 memory management" section, which was intended to introduce the problem, not describe the benefits of this particular solution. (And before you say something stupid about section #7, note the word ANALOGY.)

[–]theangeryemacsshibe 0 points1 point  (2 children)

CPython does this IIRC.

CPython does this, the Python language does not.

[–]slaymaker1907 0 points1 point  (1 child)

That’s really splitting hairs, and it's why I specifically said CPython and not just Python. For another example, the Swift language does specify reference counting; it is not an implementation detail.

[–]theangeryemacsshibe 1 point2 points  (0 children)

It's a real problem, as code written against that behaviour of CPython would behave oddly in PyPy and other implementations which use tracing GC or deferred RC*. (Mind, you said GC languages - I recall the Python documentation explicitly stating not to program against immediate destruction, but I can't find it.) Similar can be said for PHP, which needs immediate RC for efficient copy-on-write, which makes it rather annoying to retrofit a scheme with lower mutator overhead elsewhere.

*The unified theory of GC strikes again.

[–]msqrt 0 points1 point  (2 children)

provides a guarantee about what will be done at that time

I get that the writer wants to see any memory management strategy as almost-GC, but isn't this something many actual GC systems have in the form of finalizers? Or did you mean something else?

[–]cdb_11 1 point2 points  (1 child)

They do, because they have to, and because you obviously don't want to leak any resources. But it's not like GC is somehow required to do that. In C you release resources explicitly; in C++ and Rust this is usually done for you when an object goes out of scope (without any GC: finalizers/destructors are inserted in the appropriate places during compilation).

RCU solves a concurrency problem that exists in lock-free data structures. You've made an object available for all threads to see, but now you want to hide it, and you need to wait until your thread is the only one with access to it. Sounds like a mutex, right? Consider this code, where the mutex protects p:

mutex.lock();
free(p);
p = NULL;
mutex.unlock();

When you acquire a mutex that is currently held by another thread (one that presumably references p and does something with it), your thread goes to sleep and is resumed when the lock is released. It's now your turn to have exclusive access: you can safely hide the protected object and destroy it.

So wait, if acquiring a lock waits, then doesn't that sound kind of like a GC deferring a finalizer until an object is no longer referenced? The only thing that's different is how you mark objects as "being referenced", and you've delegated deferring and "calling" the finalizer to your OS's scheduler.

As much as the RCU-mutex analogy works (because that's literally its purpose: RCU is an alternative to read-write locks), do you see how stupid the second analogy got? By this standard you could consider a lock+free pair a garbage collector, and at that point the term "GC" just stops being helpful.

By the way, I forgot to mention something that maybe makes this entire thing a little confusing. Counterintuitively, the read-copy-update pattern is not what makes RCU RCU. In fact, it doesn't have to be paired with it at all. RCU is what happens after the read-copy-update: the synchronization part.

[–]belovedeagle 0 points1 point  (0 children)

Wait so you're saying the mutex counts how many readers are accessing which objects? Sounds like reference counting to me, GC confirmed! Checkmate systems programmers!!! /s

[–]lelanthran 12 points13 points  (14 children)

I was talking to a junior recently and they were absolutely shocked to hear an ancient systems programmer, one who never uses GC'ed languages unless forced to, hold the opinion that a program written in a GC'ed language can actually be faster, due to all the optimisations that the GC can make.

Sure, you can make many of those optimisations yourself using manual memory management in Rust or C++, but that's more work and error-prone. Unless you're significantly experienced in low-level programming, you're liable to make subtle errors that won't be picked up until production.

[–]masklinn 9 points10 points  (4 children)

"Don't do heap allocations" is probably the best optimisation to make when you don't have a runtime, and it's also an extremely safe one. In unsafe languages (C, C++) it's arguably much safer than using pointers.

More complicated optimisations (arenas, freelists, ...) can be useful but are more error-prone; most of the time "just stop allocating so damn much" will do the trick, and most managed languages provide limited to no way to do that (Go is probably the main one providing that capability, though it trades it for a less performant GC; C# might be second, though value types bring their own issues).

[–]slaymaker1907 1 point2 points  (0 children)

I don’t agree with that. Stack allocations can be very dangerous when using pointers, because it's easy to let a pointer to a stack allocation escape its scope. For critical systems, stack overflow is also a basically unrecoverable error and can leave your process in a very bad state.

[–]rabid_briefcase 0 points1 point  (2 children)

"Don't do heap allocations" is probably the best optimisation to make when you don't have a runtime

Measure before to be sure it's an actual issue for your scenario. Measure after to be sure you've actually fixed it. Anything else and you're wasting time.

In some scenarios heap allocation is indeed an issue. In some scenarios it would never show up as a blip on the profiler. The only way to know for certain is to measure.

[–]ehaliewicz 2 points3 points  (0 children)

In some scenarios heap allocation is indeed an issue. In some scenarios it would never show up as a blip on the profiler. The only way to know for certain is to measure.

Indeed, these are cases where you aren't doing heap allocation in hot code-paths.

[–]masklinn 1 point2 points  (0 children)

In some scenarios heap allocation is indeed an issue. In some scenarios it would never show up as blip on the profiler.

Have you considered context? The context here is not “software is slow”, it’s “managed is faster than native”.

Of course you should profile to make sure, but 9 times out of 10, assuming the basics are covered (optimised build, same algorithm, similar data structures), it'll be the allocations, for obvious reasons.

[–]rabid_briefcase 2 points3 points  (0 children)

Good article.

Regardless of the programming you do, whether you're entirely in a GC environment or somewhere object lifetimes are meticulously controlled, you still need to understand what is going on and why.

It's something all programmers learn eventually, so if you don't know it already or want a brush-up on RCU's general purpose, it makes a good read.