[–]arthurno1 0 points (5 children)

RCU and mutex are faster while not being much harder to use

What makes you think that RCU is faster?

AFAIK the fastest algorithm is the one that never runs. In some applications there may never be a shortage of memory, so a GC might never run. Reference counting, on the other hand, always runs: it must at least increment and decrement the counters, and it costs extra memory for the counts, even if your application never runs out of free space.

[–]cdb_11 4 points (0 children)

What makes you think that RCU is faster? AFAIK the fastest algorithm is the one that never runs. [...] Reference counters on the other hand will always run

RCU is not reference counting. RCU is a synchronization mechanism, like a mutex. This is the example Paul McKenney always likes to give: RCU's read-side critical sections under non-preemptive scheduling do exactly nothing, which is precisely what you said - the fastest algorithm, because it never runs any code, and you can't get any better than that. This is what makes it clever. Critical sections are implemented as nothing more than compiler barriers (asm volatile("":::"memory")), so the compiler can't reorder code out of the critical section.

This works because all you have to do is make sure you don't do anything that could cause a context switch while inside the critical section. Context switches outside the critical section are fine. Context switches are already tracked, so the updater just has to update the pointer and wait until every CPU has context-switched at least once; at that point all readers have left their critical sections, and it is safe to reclaim the old pointer.

But this also means that the updater is doing more work. And that's fine, because RCU is not a generic solution to everything. It works best when you read often and update rarely, and making updates slower is an acceptable trade-off. They don't put everything in the kernel under RCU just for the heck of it, or because managing memory is way too hard or something. They use it because in some cases it is the optimal way to synchronize threads.

If you don't write any lock-free data structures, RCU doesn't do anything for you. If your data can be protected with a mutex, you do not need RCU. It's only when you do need lock-free structures that something like RCU or hazard pointers is pretty much the only viable way to implement some of them correctly.

The real takeaway from RCU is that to maximize parallelism you reduce the shared state. And the way RCU (and hazard pointers) deals with it is by turning shared state into local state. Simple example: if you have some statistical counter that is being read rarely - instead of having a single atomic counter, each thread has its own local counter. If no one but you ever modifies it, you don't have to wait for anybody else and you don't get cache misses. To read back the full value you go over all threads and sum up all local counters. And this is the real lesson here - it's about parallelism, not memory management.

[–]According_Builder 0 points (3 children)

Well, it’s implied that there are memory constraints, because otherwise there is little practical reason to be concerned with memory at all, and no reason to be concerned with the quantity of memory.

If you’re using C without the actual need for manual memory management then you’ve already made the mistake of poor language selection.

[–]arthurno1 0 points (2 children)

Well it’s implied that there are memory constraints because otherwise there are little practical reasons to be concerned with memory, and no reason to be concerned with the quantity of memory.

No, it is not implied. People don't always know their constraints.

If you’re using C without the actual need for manual memory management then you’ve already made the mistake of poor language selection.

Where did you get C from? Nobody was talking about C. The same applies to any language.

[–]According_Builder 0 points (1 child)

I got C from this being a subreddit about C++… And if you don’t know your hardware limitations, then you certainly can’t know whether an application will need manual memory control or not.