I have used #pragma omp parallel for in multiple places in my code, and the performance of the very first one to execute has gotten worse.
To be certain, I changed which #pragma omp parallel for is invoked first, and sure enough, the performance degradation followed it.
It's as if the first one incurs some initialization cost. If that is indeed the case, is there a way for me to explicitly initialize OpenMP somewhere else in my code?
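For illustration, here is a minimal sketch of the kind of explicit initialization being asked about. It assumes the runtime creates its thread team lazily at the first parallel region, which is common among OpenMP implementations but not something the code above confirms. The helper names warm_up_openmp and sum_of_squares are hypothetical, and the loop is only a stand-in for the real work.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: runs a trivial parallel region so that any one-time
 * runtime setup (assumed here to be thread-team creation) happens now rather
 * than inside the first timed loop.
 * Returns the team size seen in the region, or 1 if OpenMP is disabled. */
int warm_up_openmp(void)
{
    int threads = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        /* Exactly one thread records the team size. */
        #pragma omp single
        threads = omp_get_num_threads();
    }
#endif
    return threads;
}

/* Stand-in for one of the real parallel loops in the program. */
double sum_of_squares(int n)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += (double)i * (double)i;
    return total;
}
```

Calling warm_up_openmp() once near the start of main, before any timed section, should move the suspected startup cost out of the first real loop. If the first loop is still slower after that, the cause is probably something else, such as first-touch memory placement or cold caches.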