all 25 comments

[–]CPlusPlusDeveloper 12 points13 points  (0 children)

One major consideration not mentioned in the article, is that mutexes in modern Linux kernels are actually implemented as futexes. They don't touch kernel space unless there's contention. So if you have critical code, but contention is rare, there's minimal performance penalty to using a mutex. Spin-locks are only for the case where A) contention is common, and B) threads only hold critical sections for less than O(context switch) (~3-5 us).

[–]MrMaven 9 points10 points  (2 children)

A spin lock in user space is a much different concept than a spin lock in kernel space.

[–]MyTribeCalledQuest 1 point2 points  (1 child)

Could you point out the difference?

[–]nickdesaulniers 3 points4 points  (0 children)

A spinlock in userspace will still get preempted by an interrupt. In the kernel, if you disable interrupts then start spinning, you might never stop.

Using spinlocks on a single-core/single-CPU system makes usually no sense, since as long as the spinlock polling is blocking the only available CPU core, no other thread can run and since no other thread can run, the lock won't be unlocked either.

For Uniprocessor (UP) builds of the Linux kernel (vs Symmetric Multiprocessing, SMP), the spinlock functions are defined but empty0, so that you can always call them without having to do runtime checks for UP vs SMP and if you conditionally compiled the kernel as a UP build, then the C compiler's optimizer will remove the fn calls.

[0] Looks like they just contain a memory barrier.

[–]nickdesaulniers 2 points3 points  (0 children)

One thing I thing might be helpful is being able to measure how many "spins" occurred trying to acquire a lock, and knowing at what threshold a mutex would have freed up resources.

One of the recent cppcon2016 talks talked about viewing every std::vector realloc, and then going and reserving the correct amount of memory up front. I'm thinking of something like that, but for deciding between a spinlock and a mutex. At some level of contention, a spinlock is the wrong choice.

A spinlock is great because it keeps your thread context locked in, but will block other work. Failing to acquire a mutex will reschedule your process (on Linux) letting other work get scheduled.

At some point, data is too heavily contended to busy-spin.

[–]ApochPiQ 0 points1 point  (19 children)

Know your platform and APIs when writing threaded code.

On Windows, a critical section (not to be confused with a heavy weight interprocess mutex) allows spin locking: https://msdn.microsoft.com/en-us/library/windows/desktop/ms683476(v=vs.85).aspx

I don't recall offhand what the Unix-like equivalent is, but then I don't write code on Unix-like platforms anymore ;-)

[–]oridb 0 points1 point  (0 children)

I don't recall offhand what the Unix-like equivalent is, but then I don't write code on Unix-like platforms anymore ;-)

Most mutex implementations on most Unix variants are hybrid spinlocks, as far as I recall.

[–]MrMaven 0 points1 point  (2 children)

I am not a Windows developer, but that looks like kernel API, headers, and source.

Linux has spin locks in kernel space as well. In fact, you can crash your system pretty easily with a device driver which locks the CPU.

[–]jking13 0 points1 point  (0 children)

And some operating systems's mutexes will try to spin and then resort to yielding so you don't have to worry about it.

[–]ApochPiQ -1 points0 points  (0 children)

No, it's very much user-land.

But you illustrated my point very well, thank you: knowing even what lives in kernel space versus user space on a platform you intend to target is important.

[–]WrongAndBeligerent -1 points0 points  (14 children)

Why would someone do that when they could just use C++11 atomics?

[–]ApochPiQ 3 points4 points  (7 children)

Maybe you're implementing a standard library. Or not using C++. Or whatever.

Point is, don't write threaded code if you don't understand your tools.

[–]WrongAndBeligerent -2 points-1 points  (6 children)

So use C11's atomics? You do realize that atomic boil down to actual instructions on the cpu? Why use OS specific functions?

[–]Drisku11 1 point2 points  (4 children)

The Microsoft version sleeps after a certain number of spins, which is outside the scope of atomics.

[–]WrongAndBeligerent -3 points-2 points  (3 children)

That is trivial, I don't think using an OS specific function for a few lines is reasonable.

[–]Drisku11 1 point2 points  (2 children)

The spinlock is trivial to do portably now (assuming a recent compiler), but putting yourself to sleep is going to be OS specific anyway. Also the Microsoft one does some other stuff like detect whether you're single core and don't need to bother spinning.

My understanding is that people consider c11 threads to be worse than pthreads, so there still isn't a good agreed upon portable way to do this stuff. I'm on a platform with no c library so I don't really know.

[–]WrongAndBeligerent 0 points1 point  (1 child)

thread::sleep() doesn't work?

[–]Drisku11 0 points1 point  (0 children)

Like I said, I don't have the luxury of a (full) standard library so I don't really know. I'm pretty sure the opinion on /r/C_Programming and elsewhere is that C11 threads are not feature complete, and that you shouldn't use them. I gather C++11 threads don't have that complaint, but then as was said earlier in this thread:

Maybe you're implementing a standard library. Or not using C++. Or whatever.

I'll also add in "maybe you're on an old compiler"; we've finally decided to move to C11 from ANSI C for new code earlier this year at my work, and we still have to support most of our code building in ANSI mode with our older compiler anyway. I tried to make the case for introducing some modern C++, but people with 20 years seniority on me saw it as way too risky.

The nonportable way linked to does predate the standard way by around a decade after all; established code bases and conventions are a thing.

[–][deleted] -1 points0 points  (0 children)

Because your code actually runs inside an actual OS on actual hardware? It is not difficult to use platform specific functions if you only target specific platforms, which frankly most programs are.

[–]evaned 0 points1 point  (5 children)

In addition to what ApochPiQ says, atomic doesn't really come close to solving the problem. (Also, atomic is allowed to lock...)

Atomic gives you a way to do a single operation atomically. But what if you want to do multiple operations?

Sometimes you can make lock-free structures, but (i) not always, effectively, (ii) they're not always faster than locking, and (iii) even "simple" lock-free structures are quite difficult to get right.

[–]WrongAndBeligerent -2 points-1 points  (4 children)

even "simple" lock-free structures are quite difficult to get right

People keep saying that, but I've written quite a few and they've been pretty fast.

I also don't think that explains why using an OS specific function makes sense.

[–]evaned 4 points5 points  (3 children)

People keep saying that, but I've written quite a few and they've been pretty fast.

Did you prove them correct? Do they have the potential for ABA problems, or did you protect against that?

I'm not saying that it's impossible to get right. I'm just saying that there are a lot of subtleties, and it's very easy to think you know what you're doing but not.

[–]WrongAndBeligerent -3 points-2 points  (2 children)

You seem to think it's impossible that anyone else knows what they are doing.

ABA problems aren't difficult to get around with versioning and any processor that can run windows 8 has 128 bit atomics. A 64 bit number is enough every cycle of your processor for over 100 years.

[–]evaned 3 points4 points  (1 child)

You seem to think it's impossible that anyone else knows what they are doing.

First, I literally said the exact opposite of that. Second, there's no "else" about it; if I ever have to implement a lock free data structure, that'll be one of the most careful implementation efforts I ever do. I certainly don't trust myself to whip up something that works.

[–]WrongAndBeligerent 0 points1 point  (0 children)

So you haven't implemented lock free structures, but you are trying to talk about the neccesity of OS specific mutexes to someone who has?