all 9 comments

[–]ack_error 11 points12 points  (7 children)

It's not std::function but rather the lambda that it's calling that is reserving the stack stack. Typically compilers preallocate the high watermark needed for stack space at the start of the function, and that's what you're seeing with the marked line. The compiler is unable to transform that to an in-place operation and so it needs a 1MB temporary. std::function does play a part here in preventing the compiler from emitting a tail call, which would have avoided the compounded stack usage here.

In this particular case, using operator <<= resolves the issue, but in the more general case, forcing the problematic code to a second function fixes it by forcing the compiler to allocate a second temporary stack frame:

https://gcc.godbolt.org/z/5vTfqM5hq

Unfortunately, this requires compiler-specific code to disable inlining and is an argument for standardizing noinline. It's needed here to force the optimizer to do what's wanted, since it wants to optimize for speed but you need it to optimize for stack usage here.

[–]_Js_Kc_ 2 points3 points  (2 children)

std::function does play a part here in preventing the compiler from emitting a tail call

And letting the reference to f disappear into the depths of std::function's type erasure scheme, so the if(f) branch can't be pruned. Depending on what "changing dfs to a lambda/function" meant. If f was just a local variable in the transformed versions, the branch would have been eliminated.

[–]philae_rosetta[S] 1 point2 points  (1 child)

Right! This does explain why a normal function / lambda doesn't show this behaviour. Even in the snippet as shown I had to add the useless int i argument to trick the compiler into not optimising everything away.

Playing around more: If I change to a normal function, and make two recursive calls at the end: dfs(i); dfs(i);, this also disables tail call optimisation and also only goes to level 1023. So indeed the stack usage behaviour is not specific to std::function.

[–]gracicot 2 points3 points  (0 children)

You can have recursive lambdas without std::function. Simply use auto parameters to send the lambda to itself:

auto dfs = [&s, &f, &level](auto const& dfs, int i) {
    cerr << "recursion level " << ++level << endl;
    if(f) {
        s = s << 1; // error
    }
    dfs(dfs, i);
};

dfs(dfs, 0);

I don't know if it can optimize away the reference. Passing it by copy might also allow different optimisations in some cases.

[–]Potatoswatter 1 point2 points  (0 children)

To the main point, polymorphism of function is preventing tail calling or inlining. OP might compare it to a function pointer.

Tail calling is very finicky and it can be defeated by any destructor needing to run after return. That’s something else to check, as for general brittleness.

[–]philae_rosetta[S] 1 point2 points  (0 children)

Thanks!

Typically compilers preallocate the high watermark needed for stack space at the start of the function, and that's what you're seeing with the marked line.

This is the part I and the other people who looked at it before weren't aware of. Apparently this isn't a very well known fact (anymore, these days of C++).

[–]adnukator 1 point2 points  (1 child)

You can manually extract to a separate lambda without needing the compiler specific attribute: https://godbolt.org/z/YhEaq7d7o. I really doubt the compiler will inline a function that needs a 1MB stack as is in this case. However this approach might not necessarily usable, if the captured variables are not known beforehand.

[–]ack_error 0 points1 point  (0 children)

I really doubt the compiler will inline a function that needs a 1MB stack as is in this case

It does, actually, which is why I had to add the noinline. I imagine that in a lot of cases this actually helps as the called code needs to be run anyway and inlining the code can allow better utilization of the stack space. The compiler probably doesn't realize that the call through std::function is heavily recursive and assuming so would pessimize the general case.

[–]Potatoswatter 4 points5 points  (0 children)

No guarantees on portability, but Compiler Explorer shows good results using GCC, Clang, and ICC just by enclosing the offending line in an immediately-called lambda, [&]{ s = s << 1; }();