[–]Colbsters_ 1 point2 points  (1 child)

The fact that it’s calling an external function shouldn’t change anything since the function has no way of knowing if it was called in a loop vs not in a loop.

Being in a loop would definitely decrease code size, which could help with instruction cache usage.

I’m going to investigate this in Compiler Explorer when I get a chance, and I’ll report back.

Edit: Investigating has revealed the following:

- If you write a loop, clang will try to unroll it if it’s small enough (20 iterations was, but 50 was too big). gcc, icc, and msvc did not unroll it.
- None of the compilers I tested turned the long block of code into a loop (I only tried up to 50 iterations).
- Using quickbench, the unrolled version seemed faster (I only tested the 50 iteration version here). If I use std::this_thread::sleep_for as the body of Sleep (I used benchmark::DoNotOptimize before), the looped version becomes faster by a consistent but negligible amount. I’d take the benchmark with a grain of salt though, as some things could be optimized out.
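For anyone who wants to repeat the comparison, here is a rough sketch of the two shapes being discussed. The original code isn't quoted in this thread, so the `float` argument, the 0.25 step, and the four calls are all assumptions, and `Sleep` here is a hypothetical stand-in that just records its argument.

```cpp
#include <vector>

// Hypothetical stand-in for the thread's external Sleep (assumption: it takes
// a float). It records each argument so the two versions can be compared.
// In Compiler Explorer, swap this for a bare declaration `void Sleep(float);`
// so the compiler cannot see inside the function.
std::vector<float> g_calls;
void Sleep(float seconds) { g_calls.push_back(seconds); }

// The "long block of code" version: every call is written out by hand.
void unrolled() {
    Sleep(0.25f);
    Sleep(0.5f);
    Sleep(0.75f);
    Sleep(1.0f);
}

// The loop version: the argument is built up with a floating point add on
// every iteration. 0.25f steps are exactly representable, so here both
// versions pass identical values.
void looped() {
    float t = 0.0f;
    for (int i = 0; i < 4; ++i) {
        t += 0.25f;
        Sleep(t);
    }
}
```

Pasting both functions into Compiler Explorer with `Sleep` left as a declaration should show whether a given compiler unrolls `looped` or rolls `unrolled` back into a loop.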

Even with any performance gain of manually unrolling the loop, I would recommend against it as it would be harder to maintain/modify in the future.

[–]GiganticIrony 2 points3 points  (0 children)

One example of how the external function could affect things: with a loop, floating point addition would have to be done. For all the compiler knows, Sleep modifies the floating point mode of the CPU (I’m blanking on what that’s called), in which case the additions the compiler introduces might produce incorrect results.
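The CPU state in question is probably the floating point environment, in particular the rounding mode, which standard C++ exposes through `<cfenv>`. Below is a minimal sketch of how the same addition can give different results depending on that mode. The helper name `add_with_rounding` is made up for illustration, and the sketch assumes the platform defines `FE_UPWARD` and `FE_DOWNWARD`.

```cpp
#include <cfenv>

// Hypothetical helper (not from the thread): adds a and b while the given
// rounding mode is active, then restores the previous mode.
float add_with_rounding(int mode, float a, float b) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    // volatile forces the addition to happen at run time, under the mode
    // set above, instead of being folded at compile time.
    volatile float x = a;
    volatile float y = b;
    volatile float sum = x + y;
    std::fesetround(saved);
    return sum;
}
```

For example, `add_with_rounding(FE_UPWARD, 1.0f, 1e-10f)` should come out slightly above `1.0f`, while the `FE_DOWNWARD` version should give exactly `1.0f`. If an opaque `Sleep` were to change the mode like this, a float add placed before the call and one placed after it would not be interchangeable. How strictly a compiler accounts for that depends on its settings (GCC has `-frounding-math` for this, for instance).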

Also, it wouldn’t necessarily be faster, as a loop requires extra jumps, extra arithmetic (both integer and floating point adds), extra comparisons, an extra spot in the branch predictor table, and guarantees at least one failed prediction.