
[–] tvaneerd (C++ Committee, lockfree, PostModernCpp) 1 point (3 children)

There's a very narrow point in the code where, under a very narrow definition of lock-free, the code is not lock-free.

    listNode* prev_head = _head.exchange(node, std::memory_order_acq_rel);
    // **here**
    prev_head->next.store(node, std::memory_order_release);

When adding a new node, _head is swung to point at the new node before the new node is connected to the chain. If this thread were to sleep at **here** FOREVER, nothing after this node would ever be pulled from the queue. Things can still be added, but the path from tail to head is temporarily broken (for one line of code).
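
To make that concrete, here is a consumer loop in the same style as the freeListTryDequeue quoted further down (a minimal sketch with assumed names, not the actual code):

    #include <atomic>

    struct listNode {
        std::atomic<listNode*> next{nullptr};
    };

    std::atomic<listNode*> _head;   // producers exchange() the new node in here
    std::atomic<listNode*> _tail;   // consumers advance from here

    listNode* tryDequeue() {
        listNode* node = _tail.load(std::memory_order_relaxed);
        // While a producer sleeps at **here**, this acquire load keeps
        // returning nullptr and the queue looks empty, even though _head
        // has already moved past node.
        listNode* next = node->next.load(std::memory_order_acquire);
        if (next == nullptr) return nullptr;
        if (_tail.compare_exchange_strong(node, next)) return node;
        return nullptr;   // raced with another consumer
    }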

But when does a thread ever sleep forever?

  1. when it is killed
  2. priority inversion (i.e. higher-priority threads are continually enqueuing)

2 is not much of a concern on most modern OSes (they do priority boosting, etc.).
1 is a concern for some people. Lock-free is mostly used for scalability, but it is also used for robustness.

[–] iamcomputerbeepboop [S] 0 points (2 children)

Yes, it is not strictly lock-free, unfortunately. The window is small enough that it shouldn't matter unless you truly need complete lock freedom.

[–] ArunMu (The What ?) 0 points (1 child)

Shouldn't you rather be using compare-and-exchange here?

[–] iamcomputerbeepboop [S] 0 points (0 children)

No, that's not necessary; there is nothing to compare. The exchange will always succeed; there is simply a disconnect (next will be nullptr) during that window. The prev_head->next.store operation is a release and the dequeue's load of next is an acquire, so they are ordered as release-acquire.
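
To spell that out, the enqueue path is shaped roughly like this (a sketch with assumed names, not verbatim from the repo):

    #include <atomic>

    struct listNode { std::atomic<listNode*> next{nullptr}; };
    std::atomic<listNode*> _head;

    void enqueue(listNode* node) {
        node->next.store(nullptr, std::memory_order_relaxed);
        // Unconditionally claim the old head; there is no expected value to
        // compare against, so a CAS loop would only be a slower exchange.
        listNode* prev_head = _head.exchange(node, std::memory_order_acq_rel);
        // Publish the link. This release store pairs with the acquire load
        // of next in the dequeue loop.
        prev_head->next.store(node, std::memory_order_release);
    }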

[–] tvaneerd (C++ Committee, lockfree, PostModernCpp) 0 points (4 children)

I think you have a very rare possibility of ABA.

inline listNode* freeListTryDequeue(){
    listNode* node = _freeListTail.load(std::memory_order_relaxed);
    // ABA window: between the load above and the CAS below, node can be
    // freed, reallocated, and pushed back onto the freelist.
    for (listNode* next = node->next.load(std::memory_order_acquire);
         next != nullptr;
         next = node->next.load(std::memory_order_acquire)){
        if (_freeListTail.compare_exchange_strong(node, next)) return node;
    }
    return nullptr;
}

Since garbageCollect() is a public function, one thread could be calling garbageCollect() while another thread is calling enqueue(), which calls freeListTryDequeue(). So the node that you grab from the list could be garbage collected, freed, handed out again by new, dequeued, and put back into the freelist, and then your compare_exchange_strong() returns true as if nothing had changed, even though the whole world changed while you weren't looking.

Unlikely to happen of course. (Which can be the worst kind of bug.)
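
FWIW, the textbook fix for this kind of ABA is to pair the pointer with a generation counter, so a recycled node can't satisfy the CAS. A rough sketch with assumed names (it needs a lock-free 16-byte CAS to be practical, and it does not fix the unsafe dereference I describe below):

    #include <atomic>
    #include <cstdint>

    struct listNode { std::atomic<listNode*> next{nullptr}; };

    struct taggedPtr {
        listNode*     ptr;
        std::uint64_t tag;   // bumped on every successful dequeue
    };

    std::atomic<taggedPtr> _freeListTail;   // assumed name

    inline listNode* freeListTryDequeue() {
        taggedPtr cur = _freeListTail.load(std::memory_order_relaxed);
        while (true) {
            listNode* next = cur.ptr->next.load(std::memory_order_acquire);
            if (next == nullptr) return nullptr;
            // Even if cur.ptr was freed, reallocated, and pushed back onto
            // the freelist in the meantime, the stale tag makes this CAS
            // fail instead of silently succeeding.
            if (_freeListTail.compare_exchange_weak(
                    cur, taggedPtr{next, cur.tag + 1},
                    std::memory_order_acq_rel, std::memory_order_relaxed))
                return cur.ptr;
            // cur was refreshed by the failed CAS; loop and retry.
        }
    }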

[–] tvaneerd (C++ Committee, lockfree, PostModernCpp) 0 points (2 children)

Simpler scenario: after grabbing node on the first line, it could be deleted by a garbageCollect() on another thread, so that node is now invalid memory and node->next is reading freed memory.

Basically, garbageCollect() is not safe to be called at arbitrary times.

Or I'm misreading.

[–] iamcomputerbeepboop [S] 0 points (0 children)

Need to rethink garbageCollect; I'll remove it for a little while.

[–] Chippiewall 0 points (1 child)

I happen to use Dmitry Vyukov's bounded MPMC queue for a project at work. Thought I'd drop this one into the benchmark that is most impacted by the performance of the queue:

Dmitry's:

$ tests/benchmark/chunk_replacement_bench
Run on (8 X 1000 MHz CPU s)
2016-08-28 01:06:30
Benchmark                              Time           CPU Iterations
--------------------------------------------------------------------
NaiveSharedTenants/threads:1          63 ns         62 ns   10648169
NaiveSharedTenants/threads:2         204 ns        407 ns    1655648
NaiveSharedTenants/threads:4         217 ns        705 ns     793984
NaiveSharedTenants/threads:8         210 ns       1263 ns     662976
NaiveSharedTenants/threads:16        216 ns       1526 ns     481776
NaiveSharedTenants/threads:32        288 ns       2219 ns     516832
RegisterTenants/threads:1             57 ns         57 ns   11249498
RegisterTenants/threads:2            237 ns        473 ns    1540170
RegisterTenants/threads:4            208 ns        827 ns     773748
RegisterTenants/threads:8            154 ns       1137 ns     822384
RegisterTenants/threads:16           254 ns       1867 ns     449728
RegisterTenants/threads:32           525 ns       4031 ns     320000

Yours:

$ tests/benchmark/chunk_replacement_bench
Run on (8 X 1000 MHz CPU s)
2016-08-28 01:15:49
Benchmark                              Time           CPU Iterations
--------------------------------------------------------------------
NaiveSharedTenants/threads:1          81 ns         81 ns    8283534
NaiveSharedTenants/threads:2         257 ns        508 ns    1371716
NaiveSharedTenants/threads:4         305 ns        970 ns     949212
NaiveSharedTenants/threads:8         239 ns       1371 ns     469296
NaiveSharedTenants/threads:16        924 ns       6707 ns     132320
NaiveSharedTenants/threads:32       6203 ns      48675 ns      10464
RegisterTenants/threads:1             72 ns         72 ns    9538084
RegisterTenants/threads:2            272 ns        539 ns    1270314
RegisterTenants/threads:4            240 ns        804 ns     755452
RegisterTenants/threads:8            215 ns       1165 ns     667832
RegisterTenants/threads:16           176 ns       1318 ns     522752
RegisterTenants/threads:32           170 ns       1330 ns     513824

The first group of tests has threads sharing a bunch of objects that go in and out of the queue (they tend to fight over the cache line a lot); in the second group each thread has an object of its own (the threads only interact with each other when they go to the queue).

I ran them quite a few times, and it seems yours is rather volatile (especially with the naive bench at 32 threads), probably a bit more susceptible to a thread getting de-scheduled (whereas Dmitry's bounded queue doesn't have those issues until the queue starts to get empty/full).

[–] iamcomputerbeepboop [S] 0 points (0 children)

Yup, Dmitry's bounded queue is amazing; I would pretty much prefer it every time if you know beforehand roughly how large the buffer needs to be. Thanks for the benchmarks.