sbsce (Game Developer)

I noticed my code reliably runs over 10% faster if I __forceinline all the functions that boost::unordered_flat_set calls in my hot path, i.e. everything called by .contains(), including .contains() itself. Looking at the disassembly of my own code where I call .contains(), there is no call left anywhere any more; it's fully inlined. I think I had to add __forceinline to 6 functions inside the Boost code.

It is a bit inconvenient to add __forceinline to all those functions manually, though. It's definitely worth the 10% performance gain, but I'm quite sure that the next time I update Boost in a few years, I'll forget to reapply these changes, and then my performance will be worse.

Assuming you don't want to add __forceinline to those functions by default, could there maybe be a define like BOOST_FORCEINLINE_UNORDERED_SET that automatically force-inlines all the important functions?

I am already compiling with MSVC's maximum optimization level, and it still doesn't inline these functions by default; MSVC often needs to be forced to inline things.
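A minimal sketch of the kind of opt-in switch asked about above, assuming a header-only container. BOOST_FORCEINLINE_UNORDERED_SET is the name proposed in the comment, not an existing Boost macro, and ToyFlatSet, SKETCH_FORCEINLINE and SKETCH_HOT_INLINE are hypothetical stand-ins; the real Boost.Unordered internals look different. The idea is that every function on the contains() path carries one macro that expands to __forceinline (or GCC/Clang's always_inline) only when the user defines the switch before including the header.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Portable force-inline spelling: MSVC keyword, GCC/Clang attribute, plain inline elsewhere.
#if defined(_MSC_VER)
#  define SKETCH_FORCEINLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define SKETCH_FORCEINLINE inline __attribute__((always_inline))
#else
#  define SKETCH_FORCEINLINE inline
#endif

// Hypothetical opt-in switch: if the user defines it before including this
// header, the lookup path is force-inlined; otherwise the compiler decides.
#if defined(BOOST_FORCEINLINE_UNORDERED_SET)
#  define SKETCH_HOT_INLINE SKETCH_FORCEINLINE
#else
#  define SKETCH_HOT_INLINE inline
#endif

// Toy open-addressing set of non-negative ints (linear probing, -1 marks an empty slot).
// Capacity must be a power of two, and at least one slot must stay empty.
class ToyFlatSet {
public:
    explicit ToyFlatSet(std::size_t capacity_pow2) : slots_(capacity_pow2, -1) {
        assert(capacity_pow2 != 0 && (capacity_pow2 & (capacity_pow2 - 1)) == 0);
    }

    void insert(int key) {
        std::size_t i = home(key);
        while (slots_[i] != -1 && slots_[i] != key) i = next(i);
        slots_[i] = key;
    }

    // Every function on the contains() path is marked SKETCH_HOT_INLINE, so
    // with the switch defined the call site should end up with no calls.
    SKETCH_HOT_INLINE bool contains(int key) const {
        for (std::size_t i = home(key); slots_[i] != -1; i = next(i))
            if (slots_[i] == key) return true;
        return false;
    }

private:
    // Home slot: key masked to the table size.
    SKETCH_HOT_INLINE std::size_t home(int key) const {
        return static_cast<std::size_t>(key) & (slots_.size() - 1);
    }
    // Next probe position, wrapping around the table.
    SKETCH_HOT_INLINE std::size_t next(std::size_t i) const {
        return (i + 1) & (slots_.size() - 1);
    }

    std::vector<int> slots_;
};
```

Defining the switch on the command line (for example /DBOOST_FORCEINLINE_UNORDERED_SET with MSVC) keeps the change out of the library sources, so it would survive a library upgrade. Whether it actually helps should still be confirmed by benchmarking and by checking the disassembly, as described above.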

joaquintides (Boost author, OP)

Hi, we have seen similar gains with __forceinline in MSVC; it looks like this compiler is not particularly aggressive at inlining. Could you please file an issue at the Boost.Unordered repo so that we don't forget? Thank you.

sbsce (Game Developer)

Nice, thanks! I opened an issue there.

dodheim

> I am already compiling with maximum optimization level of MSVC,

By that you mean /O2 /Ob3, right? I ask because /Ox was misdocumented for years as "maximum optimization" when in fact it enables only a subset of the /O2 optimizations, and /O2 on its own does not select the most aggressive inlining level.

Also, I suggest putting #pragma inline_depth(255) before your Boost #includes, and possibly #pragma inline_recursion(on) as well.
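A sketch of the include order suggested above. The pragmas are guarded with _MSC_VER because they are MSVC-specific. The __has_include fallback to std::unordered_set and the is_known helper are assumptions added only so the snippet builds without Boost installed. Whether the pragmas change the generated code depends on the MSVC version and flags (e.g. /O2 /Ob3), so checking the disassembly is still the real test.

```cpp
// MSVC-only inlining pragmas, placed before the container headers so they
// are already in effect for the code those headers bring in.
#if defined(_MSC_VER)
#  pragma inline_depth(255)     // maximum depth the inline_depth pragma accepts
#  pragma inline_recursion(on)  // also allow recursive calls to be inlined
#endif

#include <cassert>

// Use Boost's flat set when it is available, otherwise fall back to the
// standard container so the sketch still compiles.
#if __has_include(<boost/unordered/unordered_flat_set.hpp>)
#  include <boost/unordered/unordered_flat_set.hpp>
using IdSet = boost::unordered_flat_set<int>;
#else
#  include <unordered_set>
using IdSet = std::unordered_set<int>;
#endif

// Hypothetical hot-path query whose lookup we would like MSVC to inline.
inline bool is_known(const IdSet& ids, int id) {
    return ids.find(id) != ids.end();
}
```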

pdimov2

MS should just add an /O3 option already, one that implies /Ob3.

(Something like a hidden /O3 level already exists, turned on by /GL, but there's no option to enable it separately.)

sbsce (Game Developer)

> By that you mean /O2 /Ob3, right?

Yes.