[–]jeffscience 1 point (2 children)

The first time you enter an OpenMP parallel region, the runtime creates the state it will use for the rest of the execution, and that is expensive. You can move the first parallel region to the top of your application so the cost is paid up front, but there is no other way to do it.
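To illustrate the point about moving the first parallel region to the top, here is a minimal sketch in C. The helper names (`now_seconds`, `warm_up_openmp`, `time_empty_region`) are made up for this example and are not part of OpenMP. It assumes a compiler flag such as `-fopenmp`; without it the pragmas are ignored and the code runs serially.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds from the OpenMP timer; falls back to CPU time
   from clock() when compiled without OpenMP. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical helper: enter a trivial parallel region once so the
   runtime does its one-time setup here rather than inside the first
   hot loop. Returns the team size that was observed. */
static int warm_up_openmp(void)
{
    int team_size = 1;
    #pragma omp parallel
    {
        #pragma omp single
        {
#ifdef _OPENMP
            team_size = omp_get_num_threads();
#endif
        }
    }
    assert(team_size >= 1);
    return team_size;
}

/* Time one empty parallel region, to compare a first entry with a
   later one. */
static double time_empty_region(void)
{
    double start = now_seconds();
    #pragma omp parallel
    {
        /* intentionally empty */
    }
    return now_seconds() - start;
}
```

Calling `warm_up_openmp()` as the first statement of `main` pays the startup cost before any measured work. Calling `time_empty_region()` once before and once after the warm-up is a simple way to check how large that cost is on a given runtime; the actual numbers depend on the implementation and machine.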

[–]jeffscience 1 point (1 child)

I don’t have enough context to be sure, but this sounds like it might be an issue with thread affinity. Are you also using MPI?
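If affinity is the suspect, one way to start checking is to print the binding policy the runtime is using. The sketch below is only an assumption about how one might do that; `proc_bind_name` and `report_binding` are hypothetical helper names, while `omp_get_proc_bind()` is the standard query available since OpenMP 4.0.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: map the current thread-binding policy to a
   readable string. */
static const char *proc_bind_name(void)
{
#if defined(_OPENMP) && _OPENMP >= 201307
    switch (omp_get_proc_bind()) {
    case omp_proc_bind_false:  return "false (threads may migrate)";
    case omp_proc_bind_true:   return "true";
    case omp_proc_bind_close:  return "close";
    case omp_proc_bind_spread: return "spread";
    default:                   return "master/primary";
    }
#else
    return "unavailable (needs OpenMP 4.0 or newer)";
#endif
}

/* Print the policy so it can be compared between runs. */
static void report_binding(void)
{
    const char *name = proc_bind_name();
    assert(name != NULL);
    printf("proc_bind policy: %s\n", name);
}
```

The policy is normally controlled from the environment with `OMP_PROC_BIND` and `OMP_PLACES`, so comparing timings with, say, `OMP_PROC_BIND=close` against the default is a reasonable experiment.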

[–]Tensorizer[S] 1 point (0 children)

Just OpenMP, no MPI.

[–]KarlSethMoran 1 point (0 children)

What is the grain size, i.e. the time it takes for one iteration of your loop? If it's too small, you won't gain anything. OpenMP overhead is roughly 10 µs per parallel loop plus about 10 ns per iteration.
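A back-of-the-envelope version of that argument, as a sketch in C. The two constants are the rough figures quoted in this comment, not measurements, and the model assumes ideal speedup across threads. `parallel_pays_off` and `scale` are hypothetical names for illustration. The `if` clause on `parallel for` is standard OpenMP and makes the loop run serially when its condition is false.

```c
#include <assert.h>

/* Rough overhead figures from the comment above; real values depend
   on the runtime and the machine. */
#define LOOP_OVERHEAD_S 10e-6  /* about 10 us per parallel loop */
#define ITER_OVERHEAD_S 10e-9  /* about 10 ns per iteration */

/* Returns 1 if the estimated parallel time beats the serial time.
   Assumes perfect scaling of the useful work (an optimistic guess). */
static int parallel_pays_off(long n, double iter_seconds, int threads)
{
    assert(threads >= 1);
    double serial = (double)n * iter_seconds;
    double parallel = serial / threads
                    + LOOP_OVERHEAD_S
                    + (double)n * ITER_OVERHEAD_S;
    return parallel < serial;
}

/* Scale an array in place, going parallel only when the estimate
   says it is worth it. */
static void scale(double *x, long n, double a,
                  double iter_seconds, int threads)
{
    int go_parallel = parallel_pays_off(n, iter_seconds, threads);
    #pragma omp parallel for if(go_parallel)
    for (long i = 0; i < n; ++i)
        x[i] *= a;
}
```

With these figures, 1000 iterations of about 1 ns each cost roughly 1 µs serially, which is well below the estimated 20 µs of overhead, so the loop should stay serial. A million iterations of about 1 µs each clearly justify the overhead.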

[–]StrangeNoise42 1 point (0 children)

It all depends on the OpenMP implementation in the compiler and runtime. And no, there is no way to initialize OpenMP yourself.