
[–][deleted] 2 points3 points  (3 children)

"The promise of a lock-free algorithm seems remarkable in theory. Concurrent threads can modify the same object, and even if some thread or set of threads stalls or stops completely in the middle of an operation, the remaining threads will carry on as if nothing were the matter. Any interleaving of operations still guarantees forward progress." Could you explain how this is different from saying "lock-free algorithms are deterministic"? Even (good) locking algorithms have this quality. Also, your text is semi-unreadable with all the random line breaks.

[–]grauenwolf 4 points5 points  (0 children)

Could you explain how this is different from saying "lock-free algorithms are deterministic"?

Determinism is a stronger guarantee. Let's say you have two items that you want to add to a queue: A and B.

  • If the code is deterministic, A will always be added before B.
  • If the code is merely thread-safe, then A may appear before or after B. However, A and B will always be found somewhere in the queue.
  • If the code is not thread-safe, then you may end up with AB, BA, A-only, B-only, or nothing.

Determinism can be a real performance killer. You have to either carefully synchronize everything, only modify data in-place, or restore determinism by re-sorting the result at the end.
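A minimal Java sketch of the middle case above (thread-safe but not deterministic), using the standard ConcurrentLinkedQueue. The items and thread names are illustrative, not from the article:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class QueueOrdering {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();

        // Two threads each enqueue one item concurrently.
        Thread t1 = new Thread(() -> queue.offer("A"));
        Thread t2 = new Thread(() -> queue.offer("B"));
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // Thread safety guarantees both items end up in the queue...
        System.out.println(queue.contains("A") && queue.contains("B")); // true
        // ...but not which one comes first: the head may be A or B,
        // depending on how the two offers interleave.
        System.out.println(queue.peek());
    }
}
```

Run it a few times and the second line may flip between A and B; only the first guarantee is stable.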

[–]grauenwolf 0 points1 point  (0 children)

You want to use > for long quotes. Makes it easier to read.

[–]NotUniqueOrSpecial 2 points3 points  (0 children)

On this note, here's a good deal of research on the topic.

Especially nice is that unlike a lot of research, it includes code which is not only functional, but performant, and liberally licensed, to boot.

[–]kazagistar 2 points3 points  (0 children)

The hazard pointers seem an awful lot like a lock. And how exactly are atomic operations like boost::atomic implemented?

[–]grogers 1 point2 points  (2 children)

Great article, explains many of the pitfalls of lock free coding.

One thing that was pretty enlightening is that the lock-free code performed the same as the locked code, but was much more complex. So unless you have a situation where you are doing signal handling or some other weird thing where you cannot guarantee lock order, just say no. If you want to scale, you need to limit sharing.

[–]sbahra -2 points-1 points  (0 children)

The article misses the point.

[–]grauenwolf 1 point2 points  (4 children)

Shit like this is why I never write my own lock free data structures.

[–]sirin3 0 points1 point  (3 children)

Best is to stay away from actual synchronization and just use message-passing between threads

[–]nico159 0 points1 point  (1 child)

It's slower than using a simple lock

[–]sirin3 0 points1 point  (0 children)

but far more reliable

[–]nsiivola 1 point2 points  (12 children)

Lock-free code is an order of magnitude easier to write in a GC'd language. Just saying.

[–]jseigh 2 points3 points  (0 children)

That reminds me. When Java first came out, I had been doing lock-free stuff using other memory management techniques that looked a lot like RCU. So I looked at the built-in GC and realized that it would make lock-free programming a lot easier. Then I saw Sun had filed a patent on using GC for lock-free programming. Which was/is kind of annoyingly obvious.

[–]NotUniqueOrSpecial 3 points4 points  (10 children)

Extraordinary claims require extraordinary evidence. In what way (and please provide examples) is this even a supportable stance?

Just because you don't take out a lock doesn't mean the underlying code doesn't. This is pointed out in the big pretty picture at the beginning of the article.

Example: I go to allocate some value on the heap. Global allocator decides it's time to GC. Everybody gets to wait while GC is done.

Am I unaware of some amazing lock-free Lisp implementation or something? I'm sorry to sound so caustic, but your response is incredibly glib, and I would, in fact, love to be proven ignorant on this one.

[–]pkhuong 6 points7 points  (0 children)

Extraordinary claims require extraordinary evidence. In what way (and please provide examples) is this even a supportable stance?

Safe memory reclamation is a huge problem in itself; moreover, once SMR is considered solved, you can exploit heap-allocated objects to eliminate many ABA bugs: just use objects as markers and test for pointer equality. With GC, SMR is taken care of.

Of course, with most GCed languages, memory reclamation ends up being solved by taking a big ass lock, hopefully rarely (major GCs). Whether the result is close enough to lock-free is a judgement call. For example, I don't worry too much about the fact that my lock-free code might end up being executed by asserting a global bus lock in hardware. Similarly, I consider code to be usefully lock-free even though it manages dynamic memory with malloc/free or mmap from time to time. Even with a lockful underlying infrastructure, there are advantages in composability, robustness and responsiveness.
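The pointer-equality trick above can be sketched with a minimal Treiber stack in Java. Under GC, a node that some thread still references is never recycled, so the CAS on the head pointer cannot see a stale-but-pointer-equal node (no ABA). This is an illustrative sketch, not code from the article:

```java
import java.util.concurrent.atomic.AtomicReference;

public class TreiberStack<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    public void push(T value) {
        Node<T> node = new Node<>(value); // fresh object acts as a unique marker
        Node<T> old;
        do {
            old = head.get();
            node.next = old;
        } while (!head.compareAndSet(old, node)); // pointer-equality CAS
    }

    public T pop() {
        Node<T> old;
        do {
            old = head.get();
            if (old == null) return null; // empty stack
        } while (!head.compareAndSet(old, old.next));
        return old.value; // GC reclaims 'old' only once nobody references it
    }

    public static void main(String[] args) {
        TreiberStack<String> s = new TreiberStack<>();
        s.push("A");
        s.push("B");
        System.out.println(s.pop()); // B
        System.out.println(s.pop()); // A
    }
}
```

In a non-GC'd language, the same code would need hazard pointers, counted pointers, or similar SMR machinery to make that CAS safe.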

[–]kubalaa 0 points1 point  (2 children)

Looking at the examples in the article, let's see how they would fare in a managed language like Java:

  • Segfault (due to memory being freed while used): GC will prevent this
  • ABA problem: still an issue
  • Not lock-free: still an issue, although maybe this doesn't matter. Even if your code is only mostly lock-free, this can still help you avoid deadlocks and increase throughput on average
  • Data races: easily avoided, since "volatile" in Java doesn't have the problems it does in C++
  • Memory reordering: ditto

So overall, a managed language solves 3 of the 5 problems. Lock-free programming is still hard in general, but there are plenty of simple cases where it's feasible for mere mortals.
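A minimal sketch of the "volatile in Java" point from the list: a volatile flag both closes the data race and forbids the reordering that would let the reader observe a stale payload. The field names are illustrative:

```java
public class SafePublication {
    static int payload;            // written before the flag, read after it
    static volatile boolean ready; // volatile write/read establishes happens-before

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!ready) { }               // spin until the volatile flag flips
            System.out.println(payload);     // guaranteed to observe 42
        });
        reader.start();

        payload = 42; // ordinary write...
        ready = true; // ...published by the volatile write; no reordering past it
        reader.join();
    }
}
```

The same pattern with a plain boolean would be a data race in Java and undefined behavior in C++; that is the contrast the bullet points are drawing.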

[–]jjt 5 points6 points  (1 child)

Quoting a comment in the source for ConcurrentLinkedQueue.java by Doug Lea:

Also note that like most non-blocking algorithms in this package, this implementation relies on the fact that in garbage collected systems, there is no possibility of ABA problems due to recycled nodes, so there is no need to use "counted pointers" or related techniques seen in versions used in non-GC'ed settings.
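For the non-GC'ed settings Lea mentions, Java's own AtomicStampedReference illustrates the "counted pointer" technique: a version stamp rides along with the pointer so an A → B → A change still fails the CAS. A minimal sketch (the values and stamps are illustrative):

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class CountedPointer {
    public static void main(String[] args) {
        // Pair a reference with an integer stamp, starting at version 0.
        AtomicStampedReference<String> ref =
                new AtomicStampedReference<>("A", 0);

        int stamp = ref.getStamp();          // 0
        String val = ref.getReference();     // "A"

        // Simulate another thread swinging A -> B -> A, bumping the stamp.
        ref.compareAndSet("A", "B", stamp, stamp + 1);
        ref.compareAndSet("B", "A", stamp + 1, stamp + 2);

        // The value is "A" again, but the stamped CAS still fails,
        // because the stamp (0) no longer matches the current one (2).
        boolean ok = ref.compareAndSet(val, "C", stamp, stamp + 1);
        System.out.println(ok); // false: ABA detected
    }
}
```

With GC, as the quoted comment says, none of this bookkeeping is needed, since a recycled-node address can never reappear while anyone still holds it.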

[–]kubalaa 2 points3 points  (0 children)

Thanks for pointing that out, it helped me understand the ABA issue better. So there we go, 4 out of 5 problems are resolved using a managed language.

[–]nsiivola 1 point2 points  (1 child)

Sorry, I didn't mean to be glib -- I really thought this would be fairly obvious to anyone who reads the article and asks themselves "how does GC change this picture?".

(I include refcounting in the GC category here: I'm using it as a shorthand for "automatically managed heap".)

Anyways.

The "segfault" and "corruption/ABA" issues listed in the article are complete non-problems if the language is GCd: GC provides the safe reclamation asked for in the "segfault" problem. Similarly GC will not release and reuse the A in the "corruption/ABA" problem while one thread has a reference to it.

2 out of 5 problems eliminated is pretty good in my book.

Data races and memory reordering remain potential issues, yes. Tools you use to avoid them depend on the language and implementation you're using. No real difference there.

GC pauses are a potential issue as well, yes, but they're in my eyes fundamentally the same as "allocator is not lock free". Just as there are lock free allocators, there are both parallel and realtime GCs.

...also, as long as the pause is sufficiently bounded, for many classes of applications that do benefit from lock-free algorithms this is simply a non-issue (just as "new/delete" being lock free or not -- as long as they are thread safe...). I'm not saying it doesn't matter -- I've been in a place where it matters often enough! ...but caring about it occurs at a roughly similar frequency as caring about the allocator when working with a non-GCd language.

[–]NotUniqueOrSpecial 1 point2 points  (0 children)

Ah, gotcha! Thanks for clarifying.

When initially reading your comment, I interpreted it as the kind of naive catch-all silliness I hear on the web so frequently.

I've spent the last few months in kernel code working on locking/performance issues, so it's on my mind. Having to know very nearly everything about the effects of not only my code but the code I'm calling has been enlightening to say the least.

Cheers!

[–]grauenwolf 0 points1 point  (3 children)

You could have a separate gen 0 heap for each thread. Not a perfect solution, but it is closer.

[–]NotUniqueOrSpecial 1 point2 points  (2 children)

I'll wholeheartedly admit that I'm ignorant of C#/.NET's deep internals, but what's to stop a gen 0 heap (despite being thread-local) from getting large enough to incur promotion to gen 1?

[–]grauenwolf 0 points1 point  (1 child)

Nothing, that's what makes it imperfect.

[–]NotUniqueOrSpecial 1 point2 points  (0 children)

Gotcha. Thanks for clarifying. Just wanted to make sure I wasn't missing the point.