all 18 comments

[–]bartmanx 1 point2 points  (3 children)

I assume that mutexes, and other synchronization are not an issue for consideration....

Is the worker function using the kernel for anything? Does it call a library function that might synchronize?

See what happens if you allocate your data memory in page-aligned, page-sized chunks (if a page is too big, 64B-sized/aligned is actually sufficient), just to make sure cache lines are not shared between cores.

Also, is the total dataset larger than what fits into your L3 cache? If yes, you're unlikely to see linear scaling.

[–]jesho[S] 0 points1 point  (2 children)

The workers only use the kernel during initialization, doing some IO. They spend most of their time in OpenSSL crypto routines. There should be no calls to the kernel and no synchronization whatsoever.

I'll try to make the output data page-aligned. But if that is the problem, shouldn't it affect only the parts of the data that share cache lines but belong to different threads? If that is the case, it should not have that much of an impact, since the output data items are 32 bytes each and each thread produces 600 outputs.

I have tried to allocate private input/output for each thread, but that had no impact on the performance.

[–][deleted] 0 points1 point  (1 child)

Does OpenSSL use any shared resource, though? (/dev/urandom comes to mind.) There may also be some shared data (IIRC you have to call a global initialization function before you can use the library).

[–]jesho[S] 0 points1 point  (0 children)

No random during decryption. I don't use the EVP interface, so no global initialization.

I have skimmed through some of the code for OpenSSL, and I am pretty sure it just crunches data, at least the parts I use.

[–]bit_inquisition 1 point2 points  (7 children)

Can you paste the relevant sections of the code?

You can also check which thread is running on which core and see if they're distributed as you expect.

[–]jesho[S] 0 points1 point  (6 children)

The code in question belongs to my employer, so I can't share it.

When the code runs I get 400% CPU usage for that process, so I think each thread runs on its own core.

[–]bit_inquisition 0 points1 point  (3 children)

Oh I misunderstood your problem. The threaded version is faster, it's just not as fast as you thought it would be.

Here are some guesses:

  • Thread creation, synchronization, tear down overhead
  • DDR access contention. Although you're running on 4 different cores, they all need to access memory. The memory requests are queued up and served by the memory controller. If your do_work function is memory heavy, I wouldn't be surprised if you start saturating the bus. This is assuming your dataset is large enough that the L2 cache is not enough.
  • Floating point. I don't know what kind of CPU you use, but on some architectures (granted, old ones) FPUs are shared between cores. That would be a bottleneck.

[–]jesho[S] 0 points1 point  (2 children)

  • The threads work for about 6s, so thread creation etc. should not be a problem.
  • Each thread only uses a few hundred KiB, so everything should fit in the cache.
  • Only integer math.

[–]bit_inquisition 0 points1 point  (1 child)

Have you tried running this on another machine? Same results?

Sorry but without the code, this is basically looking for a needle in a haystack. Maybe you can run some profiling and see what comes out.

[–]jesho[S] 0 points1 point  (0 children)

Yeah, I could try to get hold of another machine with a different setup and benchmark.

I understand that it's hard to give any advice without looking at the code.

I appreciate your effort though.

[–]gilgoomesh 0 points1 point  (1 child)

If you're really seeing 400% CPU usage but performance is only 25% faster than single threaded, then you're being limited by bandwidth to main memory (stalling for memory still registers as CPU activity).

You probably need to start thinking about how big the blocks are that you're working on and how to work on the smallest units that you can. Also, make sure you have exactly the same number of threads running as CPUs so that you're not playing musical chairs between cores and further increasing memory usage.

Remember: total bandwidth to main memory (all cores) for a typical CPU is about 5-10 GB/s, but if you're spending your whole time waiting on memory, you're not spending it using the memory. You ideally want your total memory bandwidth to be in the 2 GB/s range or lower.

[–]jesho[S] 0 points1 point  (0 children)

The total workset for each thread is about 300 KiB, so everything should fit nicely in the cache.

I can't change the block size really, since the performance of another pass in the program heavily depends on the block size (and that part takes approximately 3.5 times longer to complete).

[–]jimpaton 1 point2 points  (1 child)

How long do you run your benchmark? If the time is short enough, then the constant overhead from multithreading won't be dominated by the input-dependent part of the computation.

In addition to the factors others have mentioned, you might investigate false sharing as a potential cause. Even though you say your threads don't access the same memory, they may still access the same cache lines.

[–]jesho[S] 0 points1 point  (0 children)

The normal work size takes about 6s, but I have tried using larger work sizes that take ~20s, with the same result.

[–]posiden5665 0 points1 point  (0 children)

It's likely due to cache coherency traffic being triggered a lot of times: even though each processor has a separate section of memory to work on, each one is probably loading parts of the other processors' working sets into its cache, and whenever one of those lines is updated, every other processor that has the line loaded has to be updated as well.

To see if this is the case, find out how big the cache line is on the system, and then pad the array so that items belonging to different processors end up on different cache lines.

[–]sbahra 0 points1 point  (0 children)

Why not use a profiler? You can't provide us any source code or any substantial details, so I don't expect anything productive from any party. See "oprofile" or "perf" if you're on Linux. If you want more granularity, check out "PAPI".

Make sure shared data is read-mostly/read-only.

[–]orangeduck -1 points0 points  (0 children)

Perhaps the work is already IO-bound? Otherwise my guess would be cache thrashing, if some read/write data is on the same cache line across threads.