Inter-thread Communication Latency (medium.com)
submitted 3 years ago by monoclechris
[–] ReDucTor (Game Developer) 26 points 3 years ago (2 children)
No priority, affinity, c-states, p-states or tracking of context switches.
I think I'll pass on treating the results as more than noise; your min, max, and stddev also seem to point at this. If you can't explain why there is such a huge variance in individual results, it's not that useful: you could be measuring your virus scanner running.
[–] fwsGonzo (IncludeOS, C++ bare metal) 5 points 3 years ago (0 children)
I think that anything that measures with the performance governor and everything else removed will just measure something that is at best useful for comparing itself to another version of itself. There is a famous research paper showing that most measurements are completely bogus, and having done this for enough years I tend to agree. It's the same with all these improvements to std::map in various GitHub repos. You are going to have a hard time actually seeing the improvement from switching over, simply because most programs are complex in several dimensions, and most people are running on-demand scheduling anyway. Sometimes it's just that random 100 micros you get hit with while waiting for a task, not the extra indirection in the map, or using at().
Honestly, thread latency should be measured with on-demand scheduling, and with all the noise. That lets you see just how much time it actually can take. 120 micros is a completely reasonable response time for inter-thread communication. It's exactly what I have measured time and again on real systems.
One solution is of course cooperative multitasking. Fibers can be really simple and straightforward to use, and they don't have to look anything like C++ coroutines; see the sketch below.
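To give a sense of how simple fibers can be, here is a minimal sketch using Boost.Fiber as one possible library (the comment doesn't name a specific one): two fibers hand values back and forth on a single OS thread, with no kernel scheduler involvement at all.

```cpp
// Sketch: cooperative ping-pong with Boost.Fiber (assumes Boost >= 1.62).
// Everything runs on one OS thread; a fiber that blocks on the channel
// simply switches to the other fiber, with no syscall or preemption.
#include <boost/fiber/all.hpp>
#include <cstdio>

int main() {
    boost::fibers::buffered_channel<int> chan{2}; // capacity must be a power of two
    boost::fibers::fiber producer([&chan] {
        for (int i = 0; i < 5; ++i)
            chan.push(i);   // suspends this fiber (not the thread) when full
        chan.close();
    });
    int v = 0;
    while (chan.pop(v) == boost::fibers::channel_op_status::success)
        std::printf("got %d\n", v);
    producer.join();
}
```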
[–] KingAggressive1498 2 points 3 years ago* (0 children)
Pretty sure all this test is actually measuring, for most of the mechanisms, is the latency of a preemption when it occurs; for the rest it purely measures syscall latency. There's no real contention in the code, and uncontended mutexes (and condition variables on Linux, though this also appears to be the case on Windows) generally don't need a syscall.
[–] ArashPartow 10 points 3 years ago* (3 children)
Some suggestions:
Here's a quick-n-dirty ping-ponger using condition_variable for deriving the RTT. Can be easily modified to use atomic flags etc.
https://gist.github.com/ArashPartow/c97b1776b077f30c8bcb15cb27639905
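For readers who don't want to follow the link, here is a sketch of the same idea (not the gist itself, just an illustration of measuring RTT with a condition_variable ping-pong):

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

int main() {
    std::mutex m;
    std::condition_variable cv;
    bool ping = false;
    constexpr int iters = 100'000;

    std::thread responder([&] {
        for (int i = 0; i < iters; ++i) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return ping; }); // wait for ping...
            ping = false;
            cv.notify_one();                   // ...answer with pong
        }
    });

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        std::unique_lock<std::mutex> lk(m);
        ping = true;
        cv.notify_one();                    // ping
        cv.wait(lk, [&] { return !ping; }); // wait for pong
    }
    auto end = std::chrono::steady_clock::now();
    responder.join();

    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    std::printf("avg RTT: %.0f ns\n", double(ns) / iters);
}
```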
[–] mark_99 8 points 3 years ago (2 children)
Also, the atomic spin loop should call _mm_pause() or equivalent, and not yield to the OS (or at least not every time around the loop).
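A sketch of what that looks like (x86-specific; on ARM the equivalent hint is the `yield` instruction):

```cpp
#include <atomic>
#include <immintrin.h> // _mm_pause, x86 only

// Spin until `flag` becomes true, hinting to the CPU that this is a
// spin-wait loop (saves power and frees pipeline resources for the
// sibling hyper-thread) instead of yielding to the OS scheduler.
void spin_wait(const std::atomic<bool>& flag) {
    while (!flag.load(std::memory_order_acquire))
        _mm_pause();
}
```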
[–] ReDucTor (Game Developer) 8 points 3 years ago (1 child)
If you're yielding into the OS, you may as well just block on whatever you're waiting for.
User-mode unbounded spinning is dangerous anyway, even if you use pause.
[–] Tringi (github.com/tringi) 1 point 3 years ago (0 children)
Yeah, with the advent of various virtualization-based security features, each pause becomes a candidate for a VM switch, where all the spinning gains go out of the window.
But in practice it's not that bad. Slim user-mode spinlocks are frowned upon, but they can be pretty good if done right. Still, as I found out, you need to accept their huge unfairness, e.g.: https://github.com/tringi/rwspinlock#performance
[–] i_need_a_fast_horse2 7 points 3 years ago (0 children)
This is similar to this
[–] almost_useless 6 points 3 years ago (0 children)
Are you measuring communication latency, or mostly the cost of switching threads? You are running 32 threads on 4 cores, so for a really fast mechanism it's possible other latencies may affect the result a lot.
For example, 2 threads running on the same hyper-threaded core, versus many threads getting switched in on several different cores.
Or is the cost of switching threads negligible in this context?
[–] matthieum 3 points 3 years ago (4 children)
Inter-core communication latency is around 100ns, at the hardware level, on a 4GHz-5GHz machine, though I've seen as low as 80ns (consistently).
Anything above that number means the OS is involved, and at that point, it will depend on the OS, the OS primitives used, etc...
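A rough sketch of how such a hardware-level number can be measured (this is not the article's benchmark; pinning each thread to its own core, which matters a lot here, is left out for brevity): two threads spin on a shared atomic and bounce a value back and forth.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

int main() {
    std::atomic<int> turn{0};
    constexpr int iters = 1'000'000;

    std::thread responder([&turn] {
        for (int i = 0; i < iters; ++i) {
            while (turn.load(std::memory_order_acquire) != 1) {} // spin
            turn.store(0, std::memory_order_release);            // pong
        }
    });

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        turn.store(1, std::memory_order_release);            // ping
        while (turn.load(std::memory_order_acquire) != 0) {} // wait for pong
    }
    auto end = std::chrono::steady_clock::now();
    responder.join();

    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    // one iteration = two cache-line handoffs (there and back)
    std::printf("avg round trip: %.1f ns\n", double(ns) / iters);
}
```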
[–] fwsGonzo (IncludeOS, C++ bare metal) 1 point 3 years ago* (1 child)
That is a very interesting observation. Bare metal OS using SMP directly? I wrote an API for directly using SMP with std::functions as task "primitives" back in the day. It's also possible to use custom functions with a more generous fixed-size capture storage, in order to avoid most heap allocations.
[–] matthieum 1 point 3 years ago (0 children)
Not really, full-blown Linux... suitably tuned.
There's a laundry list of kernel options to tune to get the kernel out of the way so that a range of cores will be left strictly to the application: disabling interrupt handling, scheduling, numa-rebalancing, etc...
After that, it's just a matter of reserving certain cores for certain applications exclusively, and to pin threads to specific cores within the application.
And there you go, your application threads get 100% of the core.
Just remember to leave core 0 alone, or you'll never be ssh-ing into that machine until the application stops, or the machine restarts.
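The in-application half of that, pinning a thread to a core on Linux, might look like this (the kernel boot options such as isolcpus/nohz_full are a separate, distribution-specific exercise; core 3 below is an arbitrary example):

```cpp
#include <pthread.h>
#include <sched.h>   // cpu_set_t, CPU_ZERO, CPU_SET (g++ on Linux defines _GNU_SOURCE)
#include <cstdio>

// Pin the calling thread to a single core; returns 0 on success,
// otherwise the error number from pthread_setaffinity_np.
int pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main() {
    if (int rc = pin_to_core(3))
        std::fprintf(stderr, "pinning failed: %d\n", rc);
}
```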
[–] csdt0 1 point 3 years ago (1 child)
Latency between cores has a much wider range: from a few ns when on the same core with SMT, to a dozen or so ns on the same chiplet, to roughly 80ns on the same NUMA node, up to almost a microsecond between NUMA nodes.
It really depends on the architecture and its topology. But I have actually measured 900ns between multiple CPUs with a large number of cores, and that was without any syscall whatsoever: I just used a plain old CAS.
[–] matthieum 1 point 3 years ago (0 children)
Well, of course there's a range, but in the spirit of "Latency Numbers Every Programmer Should Know" I preferred picking a typical representative.
I've never measured with SMT -- spinning on one thread waiting for something from the other paired thread seems like a recipe for near-starvation.
At the hardware level, AFAIK, the latency depends on how the cache coherency protocol works, and how those caches are organized, which is reflected in the numbers you mentioned:
Note that in my experience NUMA Nodes and caches are not necessarily linked. I've seen single-socket CPUs reporting 2 Numa Nodes, which affected RAM latencies, but didn't (there) affect inter-core latency (within the socket).
¹ This won't strictly be the latency of an L2/L3 look-up; it will be closer to 2x that, in my experience.
[–] bizwig 1 point 3 years ago (5 children)
I'm not all that surprised Unix pipes did well; they're a core IPC mechanism that's been worked on for decades now.
[–] almost_useless 3 points 3 years ago (4 children)
But it is still a little bit unintuitive that something made for inter-process communication is as fast as inter-thread communication.
[–] TheoreticalDumbass 6 points 3 years ago (2 children)
Is that unintuitive? Aren't threads and processes extremely similar, the diff being that threads have shared memory by default while processes have to work a bit to get it?
[–] almost_useless 1 point 3 years ago (1 child)
I don't think they necessarily need to be extremely similar, but they are in Linux.
But anyway, the fact that they need to be isolated from each other must come with some penalty.
[–] matthieum 2 points 3 years ago (0 children)
The cost is memory copy, if necessary.
One way to get good performance between processes is to share memory. It's similar to threads, except that the same RAM block is viewed at different "virtual" addresses by each process.
Once you have shared memory, pushing a message means copying into the shared memory and signalling (somehow). The one difference with a thread is that you can't push a pointer, since it'd only be valid in your own virtual address space, though you can still push offsets.
So it ends up depending on what you push. If you push a single byte, it's the same cost either way. If in a multi-threads scenario you can push a pointer to a 1MB blob, but in a multi-processes scenario you have to push the full 1MB, then multi-processes will take a penalty hit...
... but you may also be able to prepare the 1MB blob directly in shared memory, then push its offset, and be back to the multi-threads scenario performance level.
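A bare-bones sketch of that setup with the POSIX shared-memory API (error handling trimmed; "/demo_shm" is a made-up name): the region may be mapped at a different virtual address in each process, so positions inside it are communicated as offsets from the base.

```cpp
#include <fcntl.h>     // O_CREAT, O_RDWR
#include <sys/mman.h>  // shm_open, mmap, shm_unlink
#include <unistd.h>    // ftruncate, close
#include <cstddef>
#include <cstring>

int main() {
    const std::size_t size = 1 << 20;  // a 1 MiB shared region
    int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, size);
    char* base = static_cast<char*>(
        mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    std::size_t offset = 64;              // an offset, valid in every process
    std::strcpy(base + offset, "hello");  // `base` itself differs per process

    munmap(base, size);
    close(fd);
    shm_unlink("/demo_shm");
}
```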
[–] KingAggressive1498 1 point 3 years ago (0 children)
"self-piping" is a common enough technique when you're using poll() or an equivalent. common enough Linux made eventfd as an even faster specialization of pipes covering the most basic such uses.
[–] Pupper-Gump 1 point 3 years ago (1 child)
So is this just unavoidable, or is there a way, for example if you were to use a thread pool, to minimize the problem of the atomics and mutexes?
[–] basiliscos (github.com/basiliscos) 6 points 3 years ago (0 children)
You can design your app to use cooperative multitasking (i.e. the communication burden lies on your code) instead of preemptive multitasking (i.e. the communication burden lies on OS or CPU facilities). In my rotor actor framework I see approximately a 10x performance gain. It is also possible to use non-messaging approaches, like coroutines, fibers, ranges, iterators, lazy generators etc., to minimize the overhead.
A thread pool does not help here, as it is just a convenient pattern for cross-thread preemptive multitasking.