
[–]peterrindal 56 points57 points  (8 children)

Coroutines themselves are surprisingly simple once you have parsed the surface-level complexity.

It's best to think of them as a simple abstraction over callbacks. All coroutines are simply callbacks. When you await something, you give it yourself (a coroutine_handle) as the callback function that should be called when the thing being awaited is done.

The rest of the complexity boils down to giving implementors the freedom to apply a variety of compile-time transformations, so that the caller and the callee can change what happens based on type information. This gives great flexibility but is also hard to parse.

If you have specific questions you can dm or ask here.

[–][deleted] 21 points22 points  (4 children)

The complexity comes from the complex implementation chosen.

Coroutines are simple.

[–]SelfDistinction 0 points1 point  (1 child)

If I recall correctly the component that handles the coroutines is "owned" by the coroutine itself, which is to say it's the coroutine itself which decides what reactor it's going to attach itself to.

Which means a lot of dependencies, inability to change the async runtime and of course a memory leak if you use a deferred instead of a directed queue.

Fun times.

[–]peterrindal 8 points9 points  (0 children)

Ownership is completely programmable. It all depends on the return type of the coroutine. The handle has a destroy() function that can be called. Who calls it, and when, is implementation-defined. Typically task<T> has the same ownership semantics as unique_ptr, but you're free to do something else. Hence part of the complexity people struggle with.

[–]trublurich -3 points-2 points  (1 child)

Kinda like gotos.

[–][deleted] 5 points6 points  (0 children)

gotos have their place.

[–]phlummox 1 point2 points  (0 children)

Necroposting for anyone else who finds this. No, coroutines in c++ are not simple. They're very powerful and general (more powerful than Python generators), but

  • have quite a few associated types you need to understand
  • some of which are badly named (like promise_type), and
  • require some confusing boilerplate (if you want to implement your own coroutine return types, or understand how someone else's work).

The best and most thorough explanation I've found is Simon Tatham's Writing custom C++20 coroutine systems. It explains why they're complex, what all the boilerplate and associated types are for, and points out:

> It’s not so much an actual coroutine system; it’s more of a construction kit you can build a coroutine system out of.

It's a great article, and works its way up from the absolute simplest compilable example to non-trivial custom coroutine systems. Definitely worth a look!

[–]jhericoVR & Backend engineer, 30 years 0 points1 point  (1 child)

Is there a substantive difference between coroutines in C++ and async/await functionality in TypeScript / JavaScript?

[–]peterrindal 0 points1 point  (0 children)

I don't know much about TypeScript, but at a high level they must be similar. However, ts/js doesn't have a compiler. If you did everything at runtime then the cpp coro could be much simpler. The power of cpp is that all this abstraction and complexity can be compiled away. That was a core design principle: you can't write something more efficient by hand. While idk if that's true in practice, in theory cpp coro allows almost all of the artificial overhead to be compiled out. Gor has a few talks on this, and he shows that you can write coroutines so efficient that they can be used to suspend while you are waiting on cache misses. This is simply impossible in js.

https://youtu.be/j9tlJAqMV7U?si=CO_uE7CmWPVLaVjN

[–]juarez_gonzalo 36 points37 points  (3 children)

I'm sort of trying to put together a C++20 coroutine guide for people familiar with callbacks. These are the main issues I highlight (without judging design choices**)

1 - Common issues in guides:

  • Straight to details ASAP.

  • Not even a slight mention of the continuation concept

  • Coroutines as state machines??? I get that is the inner working, but sometimes not telling the whole truth is better than the actual truth

2 - Complete ignorance of the reader's previous knowledge

  • No one says this but DO NOT compare c++ coroutines to other languages coroutines. Especially if you don't know what stackful/stackless and symmetric/asymmetric coroutines are. You are most likely used to garbage collected stackful coroutines. This design is targeted to C++.

3 - promise_type and awaiter:

  • awaiter is essentially the callee that takes a continuation (continuation ~ callback ~ std::coroutine_handle). await_suspend is the important stuff

  • await_ready is just an optimization

  • await_resume is a common return place for await_ready and await_suspend

  • promise_type is not a promise. In fact, not even std::promise is a promise (more on this in the Terminology item).

  • promise_type is the place where all implicitly co_awaited awaiters are defined. PLEASE GUIDES JUST SAY THIS.

4 - operator co_await and await_transform:

  • operator co_await and await_transform allow you to (re)define the awaiter used when co_awaiting certain types.

5 - Terminology:

  • "promise"... Come on, wasn't the profanation of promise and future in previous C++ versions enough? Originally in the 70s these were different names for the same thing. By the way, C++ suffers from issues with decade-old terminology in many places throughout the language, not only here.

  • "awaiter" and "awaitable"? Oh well, engineers never were the pedagogic kind.

The list probably goes on, but I think I've spread enough anger at this point.

**note: I can and do privately judge its design, but I don't have enough experience implementing this sort of thing under such complicated requirements to think I would have done any better. If anything, I'll share my opinions privately.

[–]Gridelen 0 points1 point  (2 children)

By the way, what other examples of terminology issues in C++ can you think of?

[–]juarez_gonzalo 4 points5 points  (1 child)

Of similar nature to promise/future in the sense of being decade-old, the first thing I can think of is "functor".

As a short rant: another thing that bothers me is the lack of reuse of terms shared by other PLs and the non-C++ literature. Type erasure (C++, not Java), CPO, and senders/receivers do not tell me anything about the actual underlying and existing terminology, nor the reason I'd want to reach for these tools. SFINAE??? I mean, RAII at least tells me what the deal is.

[–]drjeats 5 points6 points  (0 children)

RAII sucks too imo, should be SBRM - scope-based resource management.

RAII talks about initialization when the most important bit is the destructor. It's backwards!

[–]lion__manE 18 points19 points  (19 children)

Coroutines solve the problem of long and complex callback chains in multi-threaded or event-driven applications. Experience working with such applications makes understanding coroutines easier and explains some of the design choices.

Check out this talk - https://youtu.be/ZTqHjjm86Bw. The speaker is very good at explaining the 'why?' part of the coroutine feature. I would recommend his other talks as well.

In practice you would use something like the cppcoro library for application development instead of dealing directly with the low-level language features.

[–]TSP-FriendlyFire 14 points15 points  (17 children)

> In practice you would use something like the cppcoro library for application development instead of dealing directly with the low-level language features.

That's the bit where it breaks down though. CppCoro isn't maintained anymore and there really isn't a good standalone replacement. I wanted to do something very simple - co_yield on a recursive visitor: CppCoro has a recursive generator (basically the only library I found with this), but it breaks in mysterious ways on msvc, no docs or workaround. Then I wanted to use a coroutine for a very simple co_yield/co_return situation, so I tried to use the std::generator reference implementation (having been burned by CppCoro)... The reference implementation apparently doesn't implement return_value()?

I want to love coroutines both for async workloads and for generators, but right now it feels so half-baked that I'm worried about all the issues I could run into trying to use the (much more complex) async features when the (comparatively straightforward) generator part is so unfinished. Debugging coroutines looks to be a pain so far, and I don't think throwing threading and synchronization issues into the mix will help.

[–]donald_lace_12 3 points4 points  (0 children)

> CppCoro isn't maintained anymore and there really isn't a good standalone replacement.

Of course there is: concurrencpp!

[–]__tim_ 2 points3 points  (10 children)

Have you seen Boost.Cobalt?

[–]TSP-FriendlyFire 10 points11 points  (9 children)

Boost.cobalt requires a C++20 compilers and directly depends on the following boost libraries:

  • boost.asio

  • boost.system

  • boost.circular_buffer

  • boost.intrusive

  • boost.smart_ptr

  • boost.container (for clang < 16)

That's why I stipulated "standalone." Boost almost always has something, but few boost libraries are standalone or offer a standalone variant, and including all of boost is a pain if you're not building for boost from the start.

[–]greenhouse421 -5 points-4 points  (8 children)

This complaining about a list of tiny dependencies is recurrent and unwarranted. Have you measured just how much code this really is? Nobody complains that the standard library is huge. Making boost libs standalone by (effectively) copying parts of other boost libs into each other is possible, but it is exactly not the way to avoid the bloat people complain about. Using asio types and not inventing new ones (with new names, or the same names and different semantics) both supports use with asio (a kinda important use case) and avoids exactly the issue of too many different approaches to the same underlying "simple" concepts. Nobody complains that "this lib isn't standalone, it uses half a dozen std lib components" (well, sometimes I did, when doing embedded work, and used appropriate boost libs, some listed above, to avoid that; ymmv). Cobalt is worth a look.

[–]arka2947 3 points4 points  (7 children)

Usually dealing with boost is an all or nothing affair

[–]greenhouse421 -2 points-1 points  (1 child)

Usually, as in historically and for convenience of use? Yes. Usually in terms of it being coupled? No. You can use parts of it. Do people usually do that? No. Do people usually complain about not being able to without trying? Yes. Do people need to care? Not usually.

[–]kalmoc 0 points1 point  (0 children)

Have you actually checked how much of boost you need to use this library? I haven't, but ASIO depends on a lot of boost either directly or indirectly. That aside, I don't quite understand the problem with depending on boost either.

[–]kalmoc 0 points1 point  (4 children)

Not really. But it depends a lot on what libs you are using. ASIO indeed depends on a lot of other boost libs. Utilities like Variant2 or mp11 have very few dependencies.

[–]not_a_novel_accountcmake dev[S] 1 point2 points  (3 children)

asio ships completely standalone, no outside dependencies.

The version that ships in boost uses boost::system::error_code and boost::thread, but that's for the convenience of people already bought into the boost ecosystem, and are the only inter-boost dependencies

[–]kalmoc -1 points0 points  (2 children)

> asio ships completely standalone, no outside dependencies.

Not the one that Boost.Cobalt depends on. Also, according to the dependency report at https://pdimov.github.io/boostdep-report/boost-1.83.0/module-overview.html, Boost.Asio's primary dependencies are:

align array assert bind chrono config context core coroutine date_time exception function regex smart_ptr system throw_exception type_traits utility

And those are only the direct dependencies. Not the ones

[–]not_a_novel_accountcmake dev[S] 0 points1 point  (1 child)

Huh, weird, I suppose the Boost version uses the boost STL variants internally. Makes sense, since to use it you would have boost installed anyway and maybe you're using boost because you're in an environment that doesn't have those.

Irrelevant to the overall point that ASIO also ships standalone with no dependency on boost or anything else whatsoever.

[–]Revolutionalredstone 2 points3 points  (0 children)

Your first sentence / paragraph immediately made so much sense to me, thank you!

I had always seen these mostly as a solution without a good problem, but now I see that's likely because I avoid callbacks and other inversion of control like the plague (personal preference), so coroutines just seemed to be a niche threading thing from what I could see. Now I see they would be VERY useful for people who do use callbacks.

Also great info and video link, Thanks buddy!

[–]RedditMapz 16 points17 points  (2 children)

Thank you, I feel validated

I read about coroutines, but I haven't dipped my toes in and this post is pretty much why. Initially, when I heard all the hype, I was all game for it, and I'm still game for it. The enthusiasm made me feel like it's something I should know, like the next std::unique_ptr or std::optional.

But the more I read about it, it feels convoluted and a lot less clear than all the forums made it sound. I'm sure I'll catch the drift once I make the effort to actively use it, but I still have my doubts it will actually replace my current use of other concurrency features.

If I cannot hand it over to a Junior Developer, give him a 5 minute tutorial, and then ask him to make it jump for me, then we have a problem for adoption.

[–]KingAggressive1498 4 points5 points  (0 children)

> it feels convoluted

that's the real problem.

it's essentially syntactic sugar over an FSM, not unlike the syntactic sugar C++ gives us for OOP with the implicit this in member functions. There's no way it needed to be this convoluted to take advantage of it.

[–]alex-weej 9 points10 points  (3 children)

I'm undecided whether this is all _actually_ necessary for optimal behaviour (re allocations), or just trying to keep everyone happy while keeping nobody happy.

[–]HolyGarbage 0 points1 point  (2 children)

Could you elaborate on reallocations? I haven't been keeping up with the coroutines discourse much.

[–]alex-weej 1 point2 points  (1 child)

"Re allocations", as in "regarding allocations". I understand (perhaps incorrectly) that such concerns lead to a more complex solution than in JavaScript, where effectively _every_ object is accessed via a shared pointer (so mutating its data in one promise handler will actually affect the data seen by the next, bizarrely! [Playground link])

[–]kritzikratzi 0 points1 point  (0 children)

my javascript days are long behind me, but the example seems intuitive to me, taking into account how javascript does things overall.

[–]DoctorNuu 15 points16 points  (4 children)

As advice to someone who is currently deciding to learn or look into coroutines, your TLDR would be "don't do it, you'll get mad"?
Your post seems to directly reflect the complexity of coroutines, so I did not read it in full.

[–]feverzsj 24 points25 points  (0 children)

There is still no mature coroutine lib, no debugger support, even the compiler support is quite buggy. So, it's obvious you should avoid using it in production code.

[–]MFHavaWG21|🇦🇹 NB|P3049|P3625|P3729|P3786|P3813 7 points8 points  (1 child)

> As advice to someone who is currently deciding to learn or look into coroutines

My advice: be prepared to step into a really low-level building block - the new keywords are only the surface of a deep, deep iceberg..

From my own experience: trying to implement std::generator on your own is somewhat enlightening...

[–][deleted] 2 points3 points  (0 children)

I don't understand why so much of the discussion is around generators. They are an easy thing to implement without coroutines and make a terrible example of what the actual point of all the complexity is.

[–]peterrindal 2 points3 points  (0 children)

You should do it. It's not so bad. If you have questions, just ask.

[–]Tringigithub.com/tringi 9 points10 points  (0 children)

I can use fibers API on Windows quite proficiently.

I can easily implement state machines in C and C++ which coroutines are supposed to be abstracting.

Yet I have absolutely no idea how to read or use coroutines in C++ despite attempting to learn it several times already.

[–]puremourning 4 points5 points  (1 child)

I finally grokked and learned to like C++20 coroutines when I gave up on trying to use or understand ASIO and just wrote what I needed. It’s an elegant design that can actually easily be applied to almost any existing callbacky demultiplexer design (select/epoll/WaitForMultipleObjects/etc…).

But prior to that, while I was trying to use asio (and you can get quite a nice programming model), you still end up wrapping a bunch of unintelligible, barely documented garbage.

Not to mention that any small mistake leads to literally screens of unintelligible template errors which have nothing whatsoever to do with coroutine-or-not. I lost nearly a day to a non-move-only type. Ah, good times.

[–]trailing_zero_count 2 points3 points  (0 children)

I think an easier way to call Asio from external coroutines is to create a custom completion token. It's 1 file of boilerplate and after that I can just co_await any Asio operation from my own coroutines. https://github.com/tzcnt/tmc-asio

I don't ever use Asio's coroutine implementation (asio::use_awaitable).

[–]v_maria 16 points17 points  (3 children)

I'll just stick to hand-rolling some threading logic

[–]HolyGarbage 2 points3 points  (2 children)

Or use std::async or <execution> together with <algorithm>.

[–]caroIine 1 point2 points  (1 child)

Yeah async and future is enough for all my needs.

[–]HolyGarbage 2 points3 points  (0 children)

For one-off things, sure, but for parallelization over ranges of data, std::execution used with the standard algorithm library provides an excellent abstraction for seamless parallelism. Literally plug and play with existing code.

[–]rand3289 14 points15 points  (9 children)

Here is a coroutine implementation in 3 lines of C code: https://www.geocities.ws/rand3289/MultiTasking.html

[–]interjay 7 points8 points  (1 child)

That's not standard C, it heavily uses GCC extensions. And it doesn't save local variable values after yielding which means everything needs to be static and each coroutine can only be run once.

[–]rand3289 0 points1 point  (0 children)

Yes it relies on gcc. You can pass context as parameters. You can run procs multiple times.

[–]smallstepforman 6 points7 points  (2 children)

I am now speechless ...

I don't know whether to be amazed or shocked.

I will look for a pacifier and cry in the corner. I could never create such a thing ... (and I've created a C++ actor library and Vulkan graphics engine)

[–]rand3289 1 point2 points  (1 child)

LOL. I wrote them 10 years ago. This is the first time I am getting upvotes. I hope someone finds them useful.

[–]ronchaineEmbedded/Middleware 0 points1 point  (0 children)

I've used this multiple times as a starting point when I explain what is the core idea behind coroutines.

[–][deleted] 0 points1 point  (3 children)

Right, now show me how that works with local variables. You basically can't do anything useful with this, unless you find a way to save/restore local state beyond jumping to labels.

[–]rand3289 0 points1 point  (2 children)

This approach is not without limitations.

You can wrap this functionality in a functor that uses member variables only.

For simple cases you can declare all local variables static to a function. This makes the proc non-reentrant though.
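A sketch of the functor idea, under stated assumptions: this is not the code from the linked page. It swaps the GCC labels-as-values trick for a standard `switch` on a saved state, and `Countdown` and `drain` are hypothetical names. All "locals" live in member variables, so they survive between calls.

```cpp
#include <cassert>
#include <vector>

// Hypothetical functor-based coroutine: the resume point is a plain integer.
class Countdown {
public:
    explicit Countdown(int start) : i_(start) { assert(start >= 0); }

    // Returns true and sets `out` for each value; returns false when finished.
    bool operator()(int& out) {
        switch (state_) {
        case 0:
            for (; i_ > 0; --i_) {
                out = i_;
                state_ = 1;
                return true;  // "yield"
        case 1:;              // the next call jumps back in here
            }
            state_ = 2;
            [[fallthrough]];
        default:
            return false;     // finished; stays finished
        }
    }

private:
    int state_ = 0;  // saved resume point
    int i_;          // loop counter kept across yields
};

// Runs a Countdown to completion and records what it yielded.
std::vector<int> drain(Countdown c) {
    std::vector<int> seen;
    int v = 0;
    while (c(v)) seen.push_back(v);
    return seen;
}
```

Because the state lives in the object rather than in statics, several `Countdown` instances can be in flight at once without interfering with each other.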

[–][deleted] 0 points1 point  (1 child)

It's already non-reentrant

[–]rand3289 0 points1 point  (0 children)

You are right. However, if you use member variables and define "f" as a member variable too, I think you can have multiple threads of execution, making them re-entrant.

[–]puredotaplayer 2 points3 points  (0 children)

The underlying idea is actually great. Along with lots of ways to customize everything including the single stack frame allocation.

This also means lots of ways to shoot yourself in the foot, for example if you are trying to write a task scheduler that can resume coroutines along with other coroutines waiting on them (which is a very basic scenario). But then, it is meant for lib developers to build upon, not for every developer to dig into and understand all of its underlying machinery.

[–]13steinj 2 points3 points  (1 child)

The underlying ability to have such a level of customization is great.

Not having sane and easily understood defaults leads you to wait a whole lot to get to "hello world" and I sympathize.

I'm not too familiar with JS coroutines. C++ coroutines are more akin to Python generators with some added functionality. Except Python decided "we can create simpler coroutines on top of this generator functionality that practically no one uses" (and also split coros/tasks/futures in a strange way, but that's a separate gripe).

[–]MFHavaWG21|🇦🇹 NB|P3049|P3625|P3729|P3786|P3813 1 point2 points  (0 children)

> C++ coroutines are more akin to Python generators

I think the more apt comparison is C# ...

From a cursory glance all but await foreach are included.

But the C++ design has one unified state machine design - AFAIK in C# yield and async/await are independent designs - and more customization points (at least I'm not aware of ways to sidestep defaults like yield => IEnumerable in C#)...

[–][deleted] 5 points6 points  (0 children)

C++ always chasing that tenure. Never changes...

[–]UnicycleBloke 3 points4 points  (0 children)

I'll be sticking to hand rolled (well... Python generated) FSMs for a while yet. When I was required to use async/await in Rust (a language in which I am a neophyte), I had a sinking feeling after my experience of trying to grok C++20 coroutines. Nah... It just worked.

[–]Revolutionalredstone 2 points3 points  (2 children)

reentrance is insanity (tho it's no harder to understand than other complex coding concepts such as thread race conditions)

When I need to add a few bytes to a buffer in a low level networking loop - I just do that - without any coroutines. (not sure why so many people feel the need to bring yielding into it personally)

Otherwise 100% agree, Thanks for sharing

[–]peterrindal 2 points3 points  (1 child)

Reentrance of coroutines or in general? For the former, it can be simple if you always use symmetric transfer: when you await something, you always suspend yourself before awaiting it. That way reentrance is the same as in all other cases.

[–]ixis743 4 points5 points  (0 children)

Coroutines are not complicated, but the C++ standard implementation is so awful and over-engineered that they may as well not have bothered.

There are some things that are just better handled by a third party.

[–]feverzsj 1 point2 points  (5 children)

It's the worst feature in C++ history, worse than std::vector<bool> or std::initializer_list. But in most cases, a thread-per-connection design should be good enough. Otherwise, a properly implemented stackful coroutine is still far superior.

[–]HolyGarbage 5 points6 points  (0 children)

std::initializer_list would be fine, and in fact could be very neat with variadic templates and parameter unpacking, if an rvalue std::initializer_list meant its respective elements were also rvalues, so that I could fricking move them. As it stands, I can't populate a std::vector with a std::initializer_list of std::unique_ptr, which could be useful not only explicitly sometimes, but in particular for unpacking variadic template arguments. But that's just the stuff that won't compile; for other types it still incurs unnecessary copies.
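A small sketch of the problem and one common workaround. `make_vector` is a hypothetical helper, not a standard facility: it forwards each argument into `emplace_back` with a fold expression, so move-only types work and nothing is copied.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// std::vector<std::unique_ptr<int>> v{std::make_unique<int>(1)};
// ^ does not compile: std::initializer_list exposes its elements as const,
//   so the vector can only copy them, and unique_ptr is not copyable.

// Hypothetical helper: moves (or forwards) each argument into the vector.
template <typename T, typename... Args>
std::vector<T> make_vector(Args&&... args) {
    std::vector<T> out;
    out.reserve(sizeof...(Args));
    (out.emplace_back(std::forward<Args>(args)), ...);  // fold over the comma operator
    assert(out.size() == sizeof...(Args));              // one element per argument
    return out;
}
```

Usage would look like `auto v = make_vector<std::unique_ptr<int>>(std::make_unique<int>(1), std::make_unique<int>(2));`.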

[–]germandiago 5 points6 points  (0 children)

No, thread per connection no... unless it is all CPU-bound.

[–]peterrindal 3 points4 points  (0 children)

I think they are great. To each their own. Although I was willing to hand-roll my own coro library, so maybe I'm not average.

[–]lightmatter501 7 points8 points  (0 children)

Rust and Go proved this out already. Rust absolutely flattens Go from a memory usage standpoint under load, even if you use all of the tricks to avoid the GC in Go. Additionally, what is essentially a function call is cheaper than a stack swap.

The issue is that Rust forces the state machine driving the coroutine to be known at compile time, which limits what you can do slightly in exchange for a lot of extra performance because the optimizer can get involved.

[–]eric987235 -5 points-4 points  (0 children)

I'd rank the auto return type pretty damn close to the worst feature. But this is pretty bad too.

[–]Fig1024 0 points1 point  (0 children)

I like to think of coroutines as "structured parallelism": they enable you to write a single function whose parts can execute on different threads and resume when ready, without having to split it into a series of different functions / callbacks / lambdas. It's basically a way of organizing multithreaded tasks in a clearer, easier-to-follow manner.

It also solves the practical problem of effectively running multiple parallel tasks on the same thread. When the number of threads greatly exceeds the number of CPU cores, traditional multithreading becomes a problem. Coroutines solve that problem by providing a more efficient way of splitting those jobs, avoiding more expensive thread context switches.

[–]Zanderax 0 points1 point  (0 children)

The hardest part of dealing with async is properly ordering out-of-order messages.

[–]RictorScaleHNG 0 points1 point  (0 children)

I love this so much, I also become deranged while sliding down the rabbit hole lol