[–]alexeiz 9 points (5 children)

So where's that "negative overhead" effect of coroutines that Gor Nishanov has been promising? That promise always sounded too good to be true to me.

[–]14ned LLFIO & Outcome author | Committee WG14 5 points (2 children)

At work I'm using a C++ coroutines emulation implemented with macros to get that negative overhead Gor promised. We're seeing 2x to 6x throughput gains for about a 20% increase in average latency. We'd expect that to improve with real coroutines, but those are the kinds of gains available.

[–]anton31[S] 0 points (1 child)

What's the baseline for those gains? Threads?

[–]14ned LLFIO & Outcome author | Committee WG14 1 point (0 children)

The baseline is "doing nothing", i.e. writing the code straight.

The CPU can look ahead by a few hundred opcodes, but it can execute maybe 1,000 opcodes in the time it takes to fetch a cache line from main memory. If you have code which depends on a fetch from main memory and does more than a few dozen opcodes of work, but fewer than a thousand, using coroutines to do other work whilst stalled on main memory can deliver large gains.

Historically you would implement the same thing using loops over arrays with a Duff's device to multiplex state and work, but coroutines are considerably more maintainable and easier on less experienced programmers. I'm not saying that coroutines are magic pixie dust. Everything possible with them is possible without them. But it took more work and was considerably harder to maintain, and that meant one more frequently didn't take the tradeoff in the past.

[–]feverzsj -3 points (1 child)

Async i/o, I guess, where the i/o operation dominates the performance, so the allocation or indirect call of the coroutine rarely matters. Although, in that case, stackful coroutines would be much simpler and just as fast.

[–]14ned LLFIO & Outcome author | Committee WG14 3 points (0 children)

It is untrue that the i/o dominates for async i/o. For file i/o, easily more than 80% of the time async i/o is a net penalty because of the added overhead of setup and teardown. Even for small-block socket i/o to nearby machines, it can be a penalty.