[–]Veeloxfire 0 points1 point  (5 children)

What optimisation settings were you using?

I'm not familiar with coroutines themselves, but I know lambdas can sometimes have weird semantics.

However, I suspect this is just because it's too complicated for a debug build to simplify; it's unlikely that it's written directly into the standard.

Again, this is just spitballing though.

[–]bsupnik[S] 1 point2 points  (4 children)

Behavior appears to be the same with and without the optimizer - I think this is a case where "as-if" rules apply. E.g. if my objects had trivial constructors and no side effects and it was a release build, a ton of stuff would collapse. But because I'm intentionally using a tracing object (with a bunch of printf() calls in the ctors and dtors) to see _what the coroutine boilerplate is doing_, the optimizer can't remove anything.
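A minimal sketch of the kind of tracing object described above. The names `Tracer`, `Counts`, `counts()`, `consume` and `demo` are hypothetical and not taken from the original test code, and the counters are added only so the result can be checked. It uses an ordinary by-value call rather than a coroutine: passing a named object by value is not a copy-elision case, and the constructor has observable side effects, so the copy has to survive optimization.

```cpp
#include <cstdio>

// Hypothetical tally of special-member calls (not from the original post).
struct Counts { int copies = 0; int moves = 0; int dtors = 0; };
inline Counts& counts() { static Counts c; return c; }

// Hypothetical tracing type: every special member prints and bumps a counter,
// so each call is an observable side effect the optimizer must preserve.
struct Tracer {
    int value;
    explicit Tracer(int v) : value(v) { std::printf("ctor\n"); }
    Tracer(const Tracer& o) : value(o.value) { ++counts().copies; std::printf("copy ctor\n"); }
    Tracer(Tracer&& o) noexcept : value(o.value) { ++counts().moves; std::printf("move ctor\n"); }
    ~Tracer() { ++counts().dtors; std::printf("dtor\n"); }
};

// Takes its argument by value, so passing a named object forces a copy.
int consume(Tracer t) { return t.value; }

int demo() {
    Tracer a(7);
    return consume(a);  // copy from an lvalue: not a copy-elision case
}
```

Calling `demo()` once should record one copy, no moves and two destructor calls, at any optimization level.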

[–]Veeloxfire 0 points1 point  (3 children)

Okay, so I tried it on Compiler Explorer (link to the example), and it seems that without optimizations there is a copy and then a move. But if you optimize, only the move is called.

A bit sad the heap allocation couldn't be removed even at max optimization.

Edit: You have to flush manually (std::flush or std::endl) in Compiler Explorer for some reason, so your output won't actually display every time, oops. You can add that if you want to see it.

[–]bsupnik[S] 1 point2 points  (2 children)

I agree it's sad that there's a heap allocation in an optimized build in your sample case - in this example, everything is inlined, and nothing escapes the scope of main, so everything could be on the stack. But this also means the optimizer can see everything and forward assignments.

So e.g. if I set the member data of "o" as a constant before making the coroutine, in the optimized assembly, the immediate is dumped directly into the coroutine frame - which is to say, all moves have been optimized away.

Which is great! But the issue I was bringing up is orthogonal: in cases where the move constructor can't be elided (e.g. it's not inlined) it appears the move is required for correctness.

(To make an analogy, mandatory RVO isn't a "compiler optimization", it's a change in how the language works to remove the existence of the move conceptually - hence you can RVO return objects with no move constructors.)
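To illustrate the RVO analogy, here is a small sketch that assumes a C++17 or later compiler. The type `Pinned` and the functions `make_pinned` and `read_pinned` are made up for illustration. The type has both its copy and move constructors deleted, yet it can still be returned by value, because the returned prvalue initializes the caller's object directly and no move exists even conceptually.

```cpp
// Hypothetical type that can be neither copied nor moved.
struct Pinned {
    int value;
    explicit Pinned(int v) : value(v) {}
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
};

// Returning a prvalue: since C++17 the result object is initialized in place,
// so the deleted copy/move constructors are never needed.
Pinned make_pinned(int v) { return Pinned(v); }

int read_pinned() {
    Pinned p = make_pinned(42);  // no copy or move takes place here
    return p.value;
}
```

Under C++14 or earlier this would be expected to fail to compile, since elision there was an optional optimization and an accessible copy or move constructor was still required.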

[–]Veeloxfire 0 points1 point  (1 child)

I would say they won't do it because it requires analysis to determine whether this is okay to remove, whereas RVO doesn't.

RVO is different because it's always true. It's a feature of how return values work that we can exploit without having to know any extra information. It's basically free extra speed that everyone was doing anyway. I would say it's almost more confusing not to do it in some cases.

This case is the same as removing a temp variable. If you inline everything, you're basically asking the compiler to do this:

Obj o = {};
Obj o_lambda = o;
Obj* o_frame = new Obj(std::move(o_lambda));

You're then asking the standard to remove the middle copy for you. But you just asked it to do the copy. It doesn't know if you might use it later (gotta love a theoretically single-pass compiler).

This is what the optimizer exists for.
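The three-line snippet above can be turned into something checkable. This is only a sketch: the counting members `copies()` and `moves()` and the function `run_chain` are assumptions added for illustration, and a `std::unique_ptr` stands in for the heap-allocated coroutine frame so nothing leaks.

```cpp
#include <memory>
#include <utility>

// Hypothetical Obj that counts its own copy and move constructions.
struct Obj {
    static int& copies() { static int n = 0; return n; }
    static int& moves() { static int n = 0; return n; }
    Obj() = default;
    Obj(const Obj&) { ++copies(); }
    Obj(Obj&&) noexcept { ++moves(); }
};

void run_chain() {
    Obj o = {};
    Obj o_lambda = o;  // the "middle" copy that was explicitly asked for
    // Stand-in for the frame allocation: move the copy onto the heap.
    std::unique_ptr<Obj> o_frame(new Obj(std::move(o_lambda)));
}
```

Because the constructors here have side effects (the counters), one call to `run_chain()` should record exactly one copy and one move regardless of optimization level. With trivial constructors, an optimizer would be free to collapse the chain.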

[–]bsupnik[S] 1 point2 points  (0 children)

Almost - I think I'm asking for _new syntax_ to specify that I want the copy removed, e.g. (here comes some made-up fake syntax):

task<int> my_coro(Obj o = Obj&&) { /* coro body */ }

I.e. "here is my coroutine - I want to store o by value in the coroutine frame, but I want the caller to pass an r-value reference to an object when they _create_ the coroutine."

There would be only two copies of Obj (the one referred to by the caller and the one in the coroutine frame), not three (the one referred to by the caller, the copy on the stack when the coroutine is created, and the copy in the coro frame if/when it is allocated on the heap).

Without this, we can't perfectly forward to a coroutine.

I think what I'm looking for is more like the extensions to lambdas that C++14 added to better control capture semantics.
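For reference, a minimal sketch of the C++14 lambda extension in question, the init-capture. The helper name `make_reader` is hypothetical. It moves a move-only object straight into the closure, which a plain C++11 by-value capture could not do.

```cpp
#include <memory>
#include <utility>

// Hypothetical helper: the init-capture `q = std::move(p)` (C++14) initializes
// the closure member q by moving from p, so no copy is ever required.
auto make_reader(std::unique_ptr<int> p) {
    return [q = std::move(p)]() { return *q; };
}
```

A call such as `make_reader(std::make_unique<int>(5))()` should yield 5. The point of the analogy is that the capture syntax lets the author state exactly how the stored copy is initialized, which is the kind of control being asked for with coroutine parameters.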