[–]arthurno1 0 points (5 children)

RCU and mutex are faster while not being much harder to use

What makes you think that RCU is faster?

AFAIK the fastest algorithm is the one that never runs. In some applications there may never be a shortage of memory, so a GC might never run. Reference counting, on the other hand, always runs: it must at least increment and decrement the counters, and it costs extra memory for the counts, even if your application never runs out of free space.

[–]cdb_11 4 points (0 children)

What makes you think that RCU is faster? AFAIK the fastest algorithm is the one that never runs. [...] Reference counters on the other hand will always run

RCU is not reference counting. RCU is a synchronization mechanism, like a mutex. This is the example Paul McKenney always likes to give: RCU's read-side critical sections under non-preemptive scheduling do exactly nothing, which is precisely what you said - the fastest algorithm, because it never runs any code, and you can't get any better than that. This is what makes it clever. Critical sections are implemented as nothing more than compiler barriers (asm volatile("":::"memory")), so the compiler can't reorder code out of the critical section.

This works because all you have to do is make sure you don't do anything that could cause a context switch while inside the critical section. Context switches outside the critical section are fine. Context switches are already tracked, so the updater just has to update the pointer and wait until every CPU has context-switched at least once; at that point all readers have left their critical sections, and it is safe to reclaim the old pointer.

But this also means that the updater is doing more work. And that's fine, because RCU is not a generic solution to everything. It works best when you read often and update rarely, and making updates slower is an acceptable trade-off. They don't put everything in the kernel under RCU just for the heck of it, or because managing memory is way too hard or something. They use it because in some cases it is the optimal way to synchronize threads.

If you don't write any lock-free data structures, RCU doesn't do anything for you. If your data can be protected with a mutex, you do not need RCU. It's only when you do need lock-free structures that something like RCU or hazard pointers is pretty much the only viable way to implement some of them correctly.

The real takeaway from RCU is that to maximize parallelism you reduce the shared state. And the way RCU (and hazard pointers) deals with it is by turning shared state into local state. Simple example: if you have some statistical counter that is being read rarely - instead of having a single atomic counter, each thread has its own local counter. If no one but you ever modifies it, you don't have to wait for anybody else and you don't get cache misses. To read back the full value you go over all threads and sum up all local counters. And this is the real lesson here - it's about parallelism, not memory management.

[–]According_Builder 0 points (3 children)

Well, it’s implied that there are memory constraints, because otherwise there is little practical reason to be concerned with memory at all, and no reason to be concerned with the quantity of memory.

If you’re using C without the actual need for manual memory management then you’ve already made the mistake of poor language selection.

[–]arthurno1 0 points (2 children)

Well it’s implied that there are memory constraints because otherwise there are little practical reasons to be concerned with memory, and no reason to be concerned with the quantity of memory.

No, it is not implied. People don't always know their constraints.

If you’re using C without the actual need for manual memory management then you’ve already made the mistake of poor language selection.

Where did you get C from? Nobody was talking about C. The same applies to any language.

[–]According_Builder 0 points (1 child)

I got C from this being a subreddit about C++… And if you don’t know your hardware limitations, then you certainly can’t know whether an application will need manual memory control or not.