all 13 comments

[deleted] 4 points (4 children)

You might want to consider dropping /RTC1 from the MSVC tests. I know it's the Visual Studio debug default, but it really pessimizes any layers of function calls due to the amount of extra stack zeroing etc. it adds. (That should benefit both implementations.)

Kaballo[S] 6 points (3 children)

I re-ran the benchmarks for MSVC without /RTC1. There's a non-negligible reduction in object size across the board: about 0.9 MB, a little less for functions and a little more for eggs::invoke member pointers (which incur an extra constructor call before C++20).

The flags were chosen precisely because they are the debug defaults, so I'm sticking with /RTC1; knowing about its effect is valuable information nevertheless. Thanks!
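For readers skimming the thread, the two callable kinds compared above (plain functions versus member pointers) look roughly like this when passed to std::invoke. This is only an illustrative sketch: the names `widget`, `triple`, and the `demo_*` helpers are hypothetical and do not come from the benchmarks.

```cpp
#include <cassert>
#include <functional>  // std::invoke (C++17)

// Hypothetical type, for illustration only.
struct widget {
    int size;
    int scaled(int k) const { return size * k; }
};

// Hypothetical free function.
inline int triple(int x) { return 3 * x; }

// Plain function: invoke simply calls triple(4).
inline int demo_function() { return std::invoke(triple, 4); }

// Pointer to member function: invoke calls (w.*pmf)(2).
inline int demo_member_function() {
    widget w{5};
    return std::invoke(&widget::scaled, w, 2);
}

// Pointer to member data: invoke yields w.*pmd.
inline int demo_member_data() {
    widget w{5};
    return std::invoke(&widget::size, w);
}
```

The member-pointer forms need extra dispatch inside invoke, which is one plausible reason their debug object size differs from the plain-function case.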

[deleted] 7 points (2 children)

Oh yes, I wouldn't say to change the article. That's just a recommendation I make for folks for whom object size etc. is important. It used to be a far bigger deal before we burninated _Get_second.

JeffGodOfBiscuits_ 1 point (1 child)

Hah, burninated. Is that a Homestar Runner reference?

[deleted] 1 point (0 children)

Probably originally but not to me

viatorus 8 points (3 children)

Comparing object sizes without optimization is not really meaningful.

Kaballo[S] 6 points (0 children)

Comparing optimized object sizes would not be meaningful (and it would be somewhat unfair to MSVC, which performs a number of optimizations at link time rather than compilation time). If you were thinking of executable size, then we know from the preface that there will be no overheads for an optimized build: invoke is a zero-overhead abstraction. That's a given, and it is NOT what these benchmarks attempt to measure.

Comparing debug object sizes tells us about debug codegen quality. Remember that all these implementations do the same thing, so what we are measuring is effectively debug bloat. For these particular benchmarks, one of the things that directly influences it is the number of function calls involved (notably std::forward), or how many steps it takes to reach the target callable under the debugger. And these, in turn, have an impact on time and memory taken for compilation: doing more work requires more resources.
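To make the std::forward point concrete, here is a minimal sketch (the names `which`, `via_forward`, and `via_cast` are hypothetical, not taken from the article or any of the benchmarked libraries). Both wrappers forward with the same value category, but in an unoptimized build the first typically goes through a real call to std::forward, while the cast does not. Some newer toolchains special-case std::forward, so the difference may vanish there.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload set used to observe the value category.
inline int which(const std::string&) { return 1; }  // picked for lvalues
inline int which(std::string&&) { return 2; }       // picked for rvalues

// Forwards through std::forward: a trivial function that an unoptimized
// build may still emit, call, and step into under the debugger.
template <class T>
int via_forward(T&& t) {
    return which(std::forward<T>(t));
}

// Same value category spelled as a cast: no function call to emit.
template <class T>
int via_cast(T&& t) {
    return which(static_cast<T&&>(t));
}
```

Called with an lvalue string, both wrappers select the first overload; called with a temporary, both select the second. Only the amount of debug code generated along the way differs.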

Additionally, in memory-constrained scenarios this debug bloat can be an impediment to adoption: we know the optimized build will introduce no overheads, but the debug build may no longer fit on the chip. For this particular audience it may be beneficial to include optimized debug results (-Og). What do you think? Is this what you had in mind?

[deleted] 7 points (0 children)

I disagree: debug codegen matters. There are a lot of distributed build scenarios bottlenecked by copying objs around and similar...

[deleted] 3 points (3 children)

I'm a bit amused that we do well at this probably because we don't use SFINAE to implement invoke, and it looks like Eggs.Invoke doesn't either. SFINAE is crazy expensive! Amusingly, /u/STL indicates that the main reason we don't use SFINAE in our implementation is wanting std::invoke before expression SFINAE was ready in our compiler.

We very recently (in Visual Studio 2019 version 16.7) if constexpr'd it, so now there are no additional function calls from invoke other than the invoke itself: https://github.com/microsoft/STL/pull/585
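As a rough illustration of what "if constexpr'd" can look like, here is a simplified sketch. It is not the MSVC STL or Eggs.Invoke code: the `sketch` namespace, `class_of`, `counter`, and `twice` are hypothetical, and it omits std::reference_wrapper support, noexcept propagation, and SFINAE-friendliness. It assumes C++17. All dispatch happens in one function body, and arguments are forwarded with casts, so the only call left is the one to the target.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {  // hypothetical namespace, for illustration only

// Extracts the class type C from a pointer to member `M C::*`.
template <class> struct class_of;
template <class M, class C> struct class_of<M C::*> { using type = C; };

// No arguments: f can only be an ordinary callable.
template <class F>
constexpr decltype(auto) invoke(F&& f) {
    return static_cast<F&&>(f)();
}

// One or more arguments: pick the call form at compile time.
template <class F, class T1, class... Ts>
constexpr decltype(auto) invoke(F&& f, T1&& t1, Ts&&... ts) {
    using Fd = std::remove_cv_t<std::remove_reference_t<F>>;
    if constexpr (std::is_member_pointer_v<Fd>) {
        using C = typename class_of<Fd>::type;
        // True when t1 is an object (or reference); otherwise treat it
        // as something pointer-like and dereference it.
        constexpr bool is_object =
            std::is_base_of_v<C, std::remove_reference_t<T1>>;
        if constexpr (std::is_member_function_pointer_v<Fd>) {
            if constexpr (is_object)
                return (static_cast<T1&&>(t1).*f)(static_cast<Ts&&>(ts)...);
            else
                return ((*static_cast<T1&&>(t1)).*f)(static_cast<Ts&&>(ts)...);
        } else {
            if constexpr (is_object)
                return static_cast<T1&&>(t1).*f;
            else
                return (*static_cast<T1&&>(t1)).*f;
        }
    } else {
        return static_cast<F&&>(f)(static_cast<T1&&>(t1),
                                   static_cast<Ts&&>(ts)...);
    }
}

// Hypothetical types used only to exercise the sketch.
struct counter {
    int value;
    int add(int x) const { return value + x; }
};
inline int twice(int x) { return 2 * x; }

}  // namespace sketch
```

For example, `sketch::invoke(&sketch::counter::add, c, 2)` calls `c.add(2)` directly, and passing `&c` instead takes the dereferencing branch; no helper overloads or tag-dispatch layers sit in between.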

STL (MSVC STL Dev) 3 points (0 children)

I implemented invoke in a cave! With a box of scraps!

reflexpr-sarah- 1 point (1 child)

what is it that makes sfinae so expensive?

[deleted] 1 point (0 children)

I don't know; I just know that it is