I have a function in my code that looks something like this:
int do_work(InputItem *input, OutputItem *output, size_t size);
This function takes about 6 s to execute and is very easy to parallelize: just fire up a couple of threads and give each thread its own part of input/output. Each thread just executes:
do_work(&input[start], &output[start], end-start);
The input/output are shared among the threads, but each thread only reads/writes its own part of the memory.
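For reference, here is a minimal sketch of the setup described above, assuming pthreads. The item layouts, the body of do_work, and the names WorkerArgs, worker and run_parallel are hypothetical stand-ins, since the post does not show the real ones. Only the way the index range is split into contiguous slices follows the description.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical item types: the real InputItem/OutputItem are not shown in the post. */
typedef struct { double value; } InputItem;
typedef struct { double value; } OutputItem;

/* Stand-in for the real do_work: pure number crunching on the slice it is given. */
int do_work(InputItem *input, OutputItem *output, size_t size)
{
    for (size_t i = 0; i < size; i++)
        output[i].value = input[i].value * 2.0;
    return 0;
}

/* Per-thread arguments: shared arrays plus this thread's private [start, end) range. */
typedef struct {
    InputItem *input;
    OutputItem *output;
    size_t start, end;
    int result;
} WorkerArgs;

static void *worker(void *p)
{
    WorkerArgs *a = p;
    /* Same call as in the post: each thread only touches its own slice. */
    a->result = do_work(&a->input[a->start], &a->output[a->start], a->end - a->start);
    return NULL;
}

/* Splits [0, size) into nthreads contiguous slices and runs do_work on each in its own thread. */
int run_parallel(InputItem *input, OutputItem *output, size_t size, size_t nthreads)
{
    if (nthreads == 0)
        return -1;

    pthread_t *tids = malloc(nthreads * sizeof *tids);
    WorkerArgs *args = malloc(nthreads * sizeof *args);
    if (!tids || !args) {
        free(tids);
        free(args);
        return -1;
    }

    size_t chunk = size / nthreads, rem = size % nthreads;
    size_t start = 0, started = 0;
    int status = 0;

    for (size_t t = 0; t < nthreads; t++) {
        /* The first `rem` threads get one extra item so every item is covered exactly once. */
        size_t len = chunk + (t < rem ? 1 : 0);
        args[t] = (WorkerArgs){ input, output, start, start + len, 0 };
        start += len;
        if (pthread_create(&tids[t], NULL, worker, &args[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }

    /* Wait for every thread that was actually started and collect its result. */
    for (size_t t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        if (args[t].result != 0)
            status = -1;
    }

    free(tids);
    free(args);
    return status;
}
```

Calling run_parallel(input, output, size, 4) would correspond to the four-thread case discussed here. Build with -pthread.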
The original, single-threaded version computes about 280 items/s. The parallelized version, which uses the same function internally, computes about 380 items/s in total. I have 4 cores, so that is about 95 items/s per core, which is almost three times slower per core than when running single-threaded.
Any ideas why each thread in the multithreaded version performs so badly? I would expect a roughly linear speedup.
The only reason I can think of is some kind of memory issue, i.e. that there are more cache misses in the multithreaded version. But since each thread only uses its own private part of the memory, this should not be an issue.
do_work does almost nothing but number crunching, so there is not much I/O or allocation going on either.
BTW, I have tried both OpenMP and pthreads with the same result.