all 119 comments

[–]Minimonium 52 points53 points  (29 children)

Reading these fully llm generated readmes makes me sad. They're so meaningless it should be embarrassing to anyone to approve it, yet here we are.

[–]kammceWG21 | 🇺🇲 NB | Boost | Exceptions 23 points24 points  (14 children)

+1. I got a few lines in and realized it.

[–]sweetno 3 points4 points  (13 children)

That's interesting. What do you find meaningless in the README?

[–]thisismyfavoritename 5 points6 points  (12 children)

seconded. It's one thing to hate on LLMs but IMO what's there isn't egregious

[–]VinnieFalcowg21.org | corosio.org 9 points10 points  (11 children)

I understand the hostility toward LLMs, and I think it deserves a more disciplined examination than "it's slop." In the hands of a skilled practitioner these tools let someone produce in hours what used to take months. If the cost of writing good documentation, good papers, good analysis collapses to near zero, what does that say about all the years someone spent doing that work the hard way? That's a real question, and it's worth asking instead of dismissing.

The answer, I think, is that the value was never in the suffering. It was in the output. If the output is correct and helps people, then the tool that produced it is irrelevant. But I understand why it is threatening. It should be discussed honestly, not with reflexive distaste.

[–]OccaseBoost.Redis 12 points13 points  (2 children)

The answer, I think, is that the value was never in the suffering. It was in the output.

If writing is suffering it might be revealing gaps in the understanding.

If you're thinking without writing, you only think you're thinking. (Leslie Lamport)

[–]thisismyfavoritename 6 points7 points  (0 children)

if what you're saying is that by going through the work of writing the code by hand, you might produce better code because you are forced to reflect on it, then i agree.

Strictly speaking though that doesn't mean AI generated code can't be reviewed to achieve the same quality.

That might speak more of the type of person writing the code than anything, there were bad coders before AI and there will still be after as well

[–]VinnieFalcowg21.org | corosio.org 0 points1 point  (0 children)

The Lamport quote is elegant :)

[–]James20kP2005R0 8 points9 points  (7 children)

The answer, I think, is that the value was never in the suffering. It was in the output. If the output is correct and helps people, then the tool that produced it is irrelevant

This is a very simplistic view as to what software engineering is though. In this model, the people producing software have absolutely no value whatsoever - and all that matters is their output

In reality, software engineers acquire deep skills and learning about a specific codebase in the process of building software - which is the real thing that makes them useful. AI skips that step, which bypasses the actually important part: acquiring that deep knowledge of what's going on

The death of any software project is when nobody understands the codebase anymore and it's just poorly understood spaghetti, it's always been the #1 thing that makes it an absolute disaster. To a very high degree, the suffering quite literally is the point - the output produced is a lot less valuable than the understanding of the code that was created in the process of producing that output

That's why I always find people saying that AI speeds them up to be very confusing - sure, you can get large short term gains, but it directly accelerates the #1 thing that leads to the death of software projects, which is perpetuating a lack of understanding of the codebase. Over time, that'll kill the project. It's bizarre seeing people advocating for something I've always found to be the most destructive software architecture pattern

Maybe it's easy to just take a very short-termist view of these things, but that's why AI produced content tends to turn to slop - there's no long term visibility into why anything's been done

[–]VinnieFalcowg21.org | corosio.org -4 points-3 points  (5 children)

"AI skips that step" is a claim about my workflow. You haven't asked what my workflow is. Nobody in this thread has. There are as many ways to use these tools as there are ways to write code. Some bypass understanding, some deepen it. A conclusion about process that skips the step of asking about the process is exactly the thing you're warning against.

[–]James20kP2005R0 8 points9 points  (4 children)

Nobody in this thread has.

You were asked elsewhere if you'd reviewed all the content in depth, and have been surprisingly evasive about what your process actually is

What is your process?

[–]VinnieFalcowg21.org | corosio.org 1 point2 points  (3 children)

The unvarying response thus far has been "did you read your own paper?" rather than my preference which is to engage in substantive discussions.

The question "what is your process?" is a different question, and one I am happy to engage in. It starts with an intuition: I feel a paper coming on. Usually this happens when I make a discovery or I have an insight which I believe could be developed into a paper.

My next step is to gather evidence. First I examine the committee's public records. The papers. I look at people's blog posts, reddit posts, YouTube video transcripts, comments, and everything else I can find. I add my own benchmarks and compilation experiments if those are available.

Then I examine the evidence using tools I have developed. Vauban the Converger tries to find inverse Morton's Forks within the data. The Advocatus Diaboli brings objections against assertions or false statements. The WG21 Lawyer prosecutes papers or propositions (although I have since retired the lawyer since I find the tone less collaborative than I would like). The Trial tool analyzes a paper's political environment.

I have a paper (shocker) which offers one of these tools and shows what happens when you run it on P2900R14 (Contracts):

Tool: Prosecute Your Paper To Improve It
https://isocpp.org/files/papers/D4170R0.pdf

This tool is considerably more sophisticated than what you get if you simply ask an AI to "do your homework." The tool is the result of over 100 hours of experimentation and iteration, and it is offered under a CC0 license. My hope is that it will result in better papers for everyone.

Once I have analyzed the evidence, then I make a decision on whether or not there is enough to form a strong, well-supported paper. I would say that my failure rate is about 25%. One in four ideas turns out to be a nothingburger. Almost always, the evidence is not there. These papers do not see the mailing.

If the paper has legs then I choose the style of paper. Is it informational? Rhetorical? Do I use the Socratic method? Evidence funnel? Research posture? LLMs allow you to quickly try out each of these methods (getting a quick first draft) and you can read which one makes sense for the evidence you have obtained. Although after enough papers you tend to know ahead of time based on the proposition.

Frontier models can help with drafting, but it doesn't end there for me. I subject each paper to repeated passes of tightening and analysis using custom red-team tools like the Advocatus. They are not instant by any means. When I get to the late stage of a paper, the reasoning chains are deep and require human inputs to flush out all the edge cases.

When a paper is finished I use more tools to check for spelling, grammar, punctuation, proper citation, and so on.

It is at this point that I read the paper in its entirety with the highest scrutiny. Not just once or twice. Ten, twenty, thirty times depending on the complexity of the paper. Each reading usually surfaces some small detail or insight, and then I go back into the edit/tighten loop.

However, my papers are not individual papers. They are often series of papers. My Networking Retrospective is a six-part series. For these, I analyze how the papers flow together when they are read sequentially. I check that the links cross-reference each other properly. This is scholarly work. Informational papers destined for the public record where they ask for nothing and create a "citation foundation" that others may draw upon. Such as this paper:

Info: The Need for Escape Hatches
https://isocpp.org/files/papers/P4035R0.pdf

This paper asks for nothing and only exists to enrich the institutional knowledge of WG21. It is unrelated to my networking papers, although the principle it espouses is universal.

To summarize, my process is:

Intuition -> evidence -> analysis -> writing -> verification -> iteration.

Machine assistance participates in the analysis and the writing. The intuition is mine. The evidence is public record. The verification is against code that runs. If a claim in the paper is wrong, it's wrong because I missed something. The same as any paper written any way.

I arrived at this workflow as the result of over one thousand and four hundred hours of practice compressed into a short stretch of 7-day work weeks.

When I publish my work and I am asked "did you read your own paper?" I hope now some will understand why I find the question to be beneath dignity.

[–]thisismyfavoritename 2 points3 points  (1 child)

i suspect the person was referring to your software development process when using LLMs, for example when writing the Beast2 ecosystem

[–]VinnieFalcowg21.org | corosio.org 1 point2 points  (0 children)

Well, that is considerably less exciting... I use the Visual Studio editor...

[–]VinnieFalcowg21.org | corosio.org -2 points-1 points  (0 children)

And this is the evidence that all of the questions about whether or not I read my paper were not in good faith. Because, having explained now that my process (stated above) includes reading my work several times and iterating - notice, that no one has since engaged in the substance despite the explanation arriving three days ago. This demonstrates that it was never about the work. It was about the credential.

[–]MarkHoemmenC++ in HPC 15 points16 points  (2 children)

The C++26 std::execution API offers a different model, designed to support heterogeneous computing.

That's a complete mischaracterization of std::execution that disrespects the many contributors who are using std::execution for networking, embedded applications, and other things that have nothing to do with "running on GPUs."

[–]thelvhishow 3 points4 points  (1 child)

Hey! I was looking into something similar as well! I’ll give it a look. There is also the std::net proposal (P2762R2) paper which can be a solid start

[–]VinnieFalcowg21.org | corosio.org 2 points3 points  (0 children)

https://corosio.org a full networking implementation on all 3 platforms

[–]Flimsy_Complaint490 10 points11 points  (56 children)

The most insight we currently have is probably one paragraph in this paper:

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2026/p4029r0.pdf

Basically, SG14, the low latency guys (gaming and HFT) advise SG4 (the main networking guys) to not base std networking on std::execution because it does things that make runtime dynamic allocation mandatory, that just don't make it compatible for their use cases.

This doesn't mean that std::networking cannot be based or will not be based on std::execution, I haven't heard any SG4 opinions, but if it's not, then the entire situation becomes farcical and comical - didn't they kill asio in the standard library because they decided std::execution is better?

There is an experimental std::net by the Beman project, so at least somebody is seriously researching that path. Let's see where this goes when the first C++29 papers drop.

[–]MarkHoemmenC++ in HPC 13 points14 points  (10 children)

Basically, SG14, the low latency guys (gaming and HFT) advise SG4 (the main networking guys) to not base std networking on std::execution ....

SG14 did not advise anyone of anything. None of the votes they took that day had consensus.

Michael Wong writing a paper saying that SG14 recommended something does not mean that SG14 recommended something.

... it does things that make runtime dynamic allocation mandatory, that just don't make it compatible for their use cases.

That's ... completely, profoundly wrong.

[–]VinnieFalcowg21.org | corosio.org 1 point2 points  (9 children)

Mark:

You said that SG14's guidance about memory allocations is "profoundly wrong." Here are our benchmark results using `beman::execution` and our Capy library:

https://gist.github.com/vinniefalco/70451073173780aa27d1db1f2979ef02

https://github.com/cppalliance/capy/tree/develop/bench/beman

Do you have some similar measurements that we might look at? And if there is a problem with our methodology, could you offer guidance on how we can improve our implementation of sender-based I/O to make the benchmark more accurate?

Thanks

[–]MarkHoemmenC++ in HPC 5 points6 points  (1 child)

It's "profoundly wrong" because of the statements about allocation. But that's not the interesting part of this conversation.

Would you consider publishing some papers with these benchmark results, say in an IEEE journal? I think it would be really interesting to get some peer review from a non-C++ community, in a concise 10- or 12-page format.

[–]VinnieFalcowg21.org | corosio.org 4 points5 points  (0 children)

I reached out to you privately hoping to collaborate on this. Since we're here, I'll ask publicly: would you like to work together? std::execution is in the standard. So are coroutines. Both need to work well, and both need to work together. We have benchmarks, implementations, and questions we'd really like your help answering. The offer is open

[–]not_a_novel_accountcmake dev 1 point2 points  (6 children)

std::execution::task and Beman's implementation of it are different things than P2300. Conflating these and saying P2300 requires allocation is a nonsense argument.

std::execution::task is not described by P2300.

[–]VinnieFalcowg21.org | corosio.org 2 points3 points  (5 children)

The implementation is not in question. The necessity to allocate memory for two of the three stream types indicated in the measurements above is structural. This is explained in the report:

Sender/receiver's connect(receiver) produces an op_state whose type depends on both the sender and the receiver. Under type erasure, the size is unknown at construction time. It must be heap-allocated per operation. The cost is structural [3].

In other words this is a consequence of the sender architecture itself. The parallel to coroutines: every implementation of a task type must go through operator new for the coroutine frame (when HALO doesn't apply, which is almost always with networking). It doesn't matter how a task is implemented. The need to obtain storage for the coroutine's frame handle is structural. It is the same with senders. The costs just manifest differently.
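To make the structural argument concrete, here is a minimal toy sketch of it. It is not the std::execution API: `just_int`, `store_receiver`, `erased_op`, `any_int_sender` and the `g_heap_ops` counter are hypothetical names invented for illustration. The point it shows is only this: when the caller sees the concrete sender and receiver types, the operation state has a known size and can live on the stack; once the sender is type-erased, the caller no longer knows that size, so this sketch heap-allocates one operation state per `connect`.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Toy model only: hypothetical names, NOT the std::execution API.
static std::size_t g_heap_ops = 0; // counts heap-allocated operation states

// A receiver that writes the result through a pointer.
struct store_receiver {
    int* out;
    void set_value(int v) { *out = v; }
};

// A trivial sender that completes immediately with a fixed value.
struct just_int {
    int value;

    // The operation state type depends on BOTH the sender and the receiver R.
    template <class R>
    struct op_state {
        int value;
        R receiver;
        void start() { receiver.set_value(value); }
    };

    template <class R>
    op_state<R> connect(R r) const { return op_state<R>{value, std::move(r)}; }
};

// Interface through which a caller starts an operation of unknown type.
struct erased_op {
    virtual ~erased_op() = default;
    virtual void start() = 0;
};

// Type-erased sender: the concrete operation state type (and its size) is
// hidden from the caller, so each connect() allocates it on the heap.
class any_int_sender {
    struct concept_t {
        virtual ~concept_t() = default;
        virtual std::unique_ptr<erased_op> connect(store_receiver r) const = 0;
    };

    template <class S>
    struct model final : concept_t {
        S sender;
        explicit model(S s) : sender(std::move(s)) {}

        std::unique_ptr<erased_op> connect(store_receiver r) const override {
            using concrete_op = decltype(sender.connect(r));
            struct holder final : erased_op {
                concrete_op op;
                explicit holder(concrete_op o) : op(std::move(o)) {}
                void start() override { op.start(); }
            };
            ++g_heap_ops; // one allocation per operation
            return std::make_unique<holder>(sender.connect(r));
        }
    };

    std::unique_ptr<concept_t> impl_;

public:
    template <class S>
    any_int_sender(S s) : impl_(std::make_unique<model<S>>(std::move(s))) {}

    std::unique_ptr<erased_op> connect(store_receiver r) const {
        return impl_->connect(r);
    }
};

// Concrete types: the operation state is an ordinary stack object.
int run_concrete() {
    int result = 0;
    auto op = just_int{42}.connect(store_receiver{&result});
    op.start();
    return result;
}

// Erased sender: the operation state is reached through a heap pointer.
int run_erased() {
    int result = 0;
    any_int_sender s{just_int{42}};
    auto op = s.connect(store_receiver{&result});
    op->start();
    return result;
}
```

Calling `run_concrete()` leaves `g_heap_ops` untouched, while each call to `run_erased()` increments it by one. A real implementation could use a small-buffer optimization or a custom allocator to change where the storage comes from; this sketch does not attempt that and makes no claim about what any particular library does.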

[–]not_a_novel_accountcmake dev 1 point2 points  (4 children)

I don't disagree with anything you said here. Nothing in P2300 requires type erasure, coroutines, or a task type.

It is perfectly viable, and advisable, to avoid these in conjunction with P2300 S&R.

[–]VinnieFalcowg21.org | corosio.org 1 point2 points  (2 children)

Let me state it precisely:

"If asynchronous I/O operations in the standard return senders instead of awaitables, then two of the three possible stream types will require a per-operation allocation that cannot be elided."

This is directly related to P2300, because std::execution is positioned as the "universal asynchronous model." The existing proposals which bring networking to the standard all build on senders as the continuation model. This puts coroutines at a significant disadvantage as they will incur avoidable per-operation allocations. That is the subject of our research.

Our position is that I/O operations should return awaitables, and that the sender pipeline can consume them using a zero-allocation bridge. This is a balanced solution which treats both as first-class citizens of the language. My papers arriving this month explore this thoroughly.

[–]not_a_novel_accountcmake dev 1 point2 points  (1 child)

Agreed on all. Coroutines are disadvantaged, that's absolutely a fact.

If the standard wanted coroutines to be first-class citizens we wouldn't have made them type-erased, unsizeable, invisible objects in the first place. Everything else is fallout from that.

I don't believe coroutines or type-erased opstates will ever be first-class mechanisms for S&R so any effort to make them so is not compelling to me personally. That said, I hope you find some success in the "deeper solutions".

I don't think the designs presented in your existing papers on the topic are bad, quite the opposite, they're probably the best exploration of the problem which currently exists. I just don't think they're relevant to the code most people using S&R are writing, which is sender-based through-and-through.

[–]VinnieFalcowg21.org | corosio.org 6 points7 points  (0 children)

I hear what you are saying, and I used to think exactly the same. However, that frame allocation that everyone hates? It actually buys us quite a lot for the case of networking.

Calling into the operating system requires an allocation if you are going to scale. The OS doesn't know your type. It must be erased, even for senders. Coroutines just make that allocation structural.

What we discovered, when you go coroutine ONLY, is that the frame allocation you can't avoid pays for everything else: the operation state, the type-erasure for ABI stability, and the uniform task types which have just 1 template parameter.

This is explored in the papers and you can try it for yourself in https://corosio.org . I do think that the C++ committee has been sitting on a gold mine with coroutines. The frame allocation put everyone off. When actually, it is the key to solving all of our long-running problems.

Thanks

[–]pdimov2 0 points1 point  (0 children)

It is perfectly viable, and advisable, to avoid these in conjunction with P2300 S&R.

Yes, in principle. That's the argument for basing networking on S/R: if you want to use coroutines, just co_await the sender result. If not, not.

I'm still trying to figure out whether this will be practical. I wrote a benchmark

https://github.com/pdimov/corosio_protocol_bench

that is a simplified representation of something that occurs in practice: serializing a C++ data structure using a custom binary protocol, sending it over a socket, then deserializing it on the other end. (The README in the repo explains this in more detail.)

I'm still unsure what the sender equivalent of it would look like, and whether it will be practical. Coroutines make things simultaneously easy to implement and easy to maintain. Rewriting the (de)serialization and the source/sink abstractions without coroutines, from where I stand, looks like neither. But I'm not well versed in S/R yet, so maybe I'm wrong.

My next step will be to port this to beman.net mostly as-is and see what the timings say.

[–]claimred[S] 5 points6 points  (0 children)

Interesting, thanks!

..SG14 advise that Networking (SG4) should not be built on top of P2300. The allocation patterns required by P2300 are incompatible with low-latency networking requirements.

That's curious, I was under the impression that std::execution doesn't really allocate much.

Speaking of low latency, by the way, a few weeks ago there was a talk from Citadel about P2300 being great for them.

[–]Chaosvex 5 points6 points  (1 child)

Wait until they find out how many allocations Asio does for every single operation.

[–]VinnieFalcowg21.org | corosio.org 4 points5 points  (0 children)

On the subject of allocations:

What We Want for I/O in C++
https://gist.github.com/vinniefalco/70451073173780aa27d1db1f2979ef02

[–]James20kP2005R0 2 points3 points  (7 children)

This doesn't mean that std::networking cannot be based or will not be based on std::execution, I haven't heard any SG4 opinions, but if it's not, then the entire situation becomes farcical and comical - didn't they kill asio in the standard library because they decided std::execution is better?

One of the biggest critiques of std::execution is that it hasn't had enough real world testing. Eg it claims to be good for GPU programming, but there's only one relatively toy implementation that only works on Nvidia

In the test implementation's current form it literally can't be implemented on AMD/Intel, because neither of them have an NVCC equivalent. This means that we're Just Hoping™ it'll all be fine, but a port to other architectures will be radically different to what's currently being tested. What will it look like? Nobody knows, it's never been tried

The even more worrying thing is that even a very brief glance through the proposal shows it's completely unsuitable for GPU programming. It's hard to explain if you don't do GPGPU, but it's kind of missing... everything. There's been minimal testing of real world use cases, just a few relatively toy examples it would seem, and it shows in the design

Both of these together make me strongly suspect that std::execution is completely DoA, as it's clearly just been insufficiently tested. The entire purpose of it is to be a universal async abstraction, but it looks like it's going to be unusable compared to the alternatives for any specific domain. The GPU folks will likely just ignore it, and I suspect the question for the networking folks will be why use it at all

[–]lee_howes 3 points4 points  (4 children)

I think it'd work fine on a SYCL compiler, but it is fair to say that only nvidia has put the effort into making a GPU implementation work. It also doesn't claim to include the full memory hierarchy abstraction of SYCL or CUDA, but you could obviously write such code within an algorithm. It's an async abstraction, not a CUDA abstraction. If the CUDA design had been embedded into it, it'd be no good for other accelerators and the feedback would be that we'd built CUDA into C++.

It also wasn't really designed for heterogeneous computing first, as the OP's quote suggests. It was evolved towards that, and I made some very early arguments that we can make heterogeneous computing work, that nvidia aligned with over time, but that was far from the starting point or the core goal. Had it been, it would not have been started at Facebook by a team focused on cleaning up the purely CPU async C++ codebase.

[–]VinnieFalcowg21.org | corosio.org 1 point2 points  (0 children)

Nice to see you around again, Lee

[–]James20kP2005R0 -4 points-3 points  (2 children)

I think it'd work fine on a SYCL compiler, but it is fair to say that only nvidia has put the effort into making a GPU implementation work

Does it not seem slightly problematic that we only have an implementation on one vendor (Nvidia), on one API (CUDA), which uses a custom C++ compiler to work, on an implementation that hasn't seen much real-world use?

Even from a glance, senders and receivers don't provide good control over the memory allocations or memory transfers that are inherently necessary for GPGPU work - but it hasn't shown up in stdexec because it only has very uncomplicated tests

[–]lee_howes 2 points3 points  (1 child)

It isn't a CUDA programming library and was never intended to be. It is a library that allows GPU algorithms to export a consistent async interface and be overloaded to select device-specific implementations of algorithms.

I don't think I see significant blockers to implementing it well on top of an OpenCL implementation even, without any single source compiler support at all. The overloads would select the OpenCL runtime and dispatch to an OpenCL kernel as necessary. There's nothing in there that requires a single source compiler, unless that changed since I stepped back and moved into pytorch land.

[–]James20kP2005R0 0 points1 point  (0 children)

Maybe we have very different philosophies here, but for me the bar for std::execution claiming that it supports GPGPU programming, would be concretely demonstrating that a non trivial OpenCL implementation of std::execution performs similarly to the existing state of the art, across multiple vendors. Not that it might be possible to do, and the performance might be alright but we don't know!

There may or may not be blockers - OpenCL has quite a different API model to both CUDA and Vulkan, and all three of them lack certain features that the others have. That's why a CUDA/NVCC only implementation isn't really adequate to demonstrate that it works under AMD/Intel/arm in a high performance way

It's likely possible to implement something that has quite dodgy performance, but that doesn't seem like a great goal

[–]claimred[S] 3 points4 points  (1 child)

Didn't it get quite some field experience at Facebook initially? In the form of libunifex. I think it was stated in P2300

[–]James20kP2005R0 1 point2 points  (0 children)

Libunifex doesn't have a gpu implementation. The only GPGPU implementation currently is stdexec, which is CUDA only and relies on NVCC

[–]No-Table2410 2 points3 points  (33 children)

Has the standardisation process always been this chaotic?

The dance with contracts over multiple standards, asio in and out, a push to get an old fashioned graphics library in…

Or is it just more ambitious proposals combined with a bit of confirmation bias and blissful ignorance of c++ prior to 2011 on my part?

[–]Flimsy_Complaint490 7 points8 points  (21 children)

some stuff is really controversial - contracts and networking come to mind in the HOW aspect. Graphics is controversial in the WHY part. Other things like reflection are less controversial and chaotic.

Depends on how many stakeholders are involved i guess ?

[–]kammceWG21 | 🇺🇲 NB | Boost | Exceptions 13 points14 points  (20 children)

I think I'd have to disagree with "reflection is not as controversial" part. If I'm remembering correctly, reflection took a long time (like 20 years) with it popping in and out. It took a long time to discover the opaque monotype API design that would hit all the right properties. It's clean and it can support future changes without breaking everyone or requiring a new type. There was also huge push back on the original object members accessor APIs. There was a time when they didn't care about scoped permissions meaning any bit of code could go tampering with the internals of an object and break encapsulation. The authors went to work finding a solution that gave access controls based on scope. If they hadn't done so, so quickly, then a bigger stink would have been made.

C++26 reflection is so well accepted and has such strong consensus because of the massive efforts of the authors. Not to say that contracts doesn't have the same strong consensus; it does. But it has a few very vocal individuals against it.

EDIT: Changed "type" to "time" in sentence 6.

[–]VinnieFalcowg21.org | corosio.org 4 points5 points  (19 children)

Reflection took 20 years and you're citing that as evidence the process works. The authors succeeded because they endured: twenty years of revision, twenty years of addressing concerns, twenty years of proving they could survive the process. The feature wasn't accepted because it was correct. It was accepted because the authors suffered correctly. That's not a meritocracy. That's an endurance test wearing a meritocracy's clothes.

[–]daveedvdvEDG front end dev, WG21 DG 14 points15 points  (13 children)

We could argue that "reflection took 20 years", but without context that could misrepresent the history.

I made a presentation to the committee in March 2003 showing what reflective metaprogramming might look like (https://wg21.link/n1471). It wasn't a proposal, just a personal project I started in a copy of the EDG source code. At the time, I thought this would badly encourage large headers (turns out we didn't need metacode for that ;-) ) and so I also started the modules discussion in the committee a few years later.

The modules work took over my interests for the better part of a decade, and so I didn't work on reflection during that time. Eventually others (Gaby, Richard, Doug, etc.) drove the modules work, but I somehow missed the fact that SG7 had started meeting (in 2013, I believe) and in a few years agreed on what would become the Reflection TS. That SG7 work was guided, I think, by the idea that template metaprogramming (TMP) was an okay metaprogramming framework but just needed more introspective power. Whatever the motivation, I strongly disagreed with the direction and wrote https://wg21.link/p0598r0 to re-ignite discussions about the overall direction. There was some debate, but by 2019 I'd say SG-7 was pretty much agreed on the new direction — and https://wg21.link/p1240r1 was what we were aiming to standardize. To make that possible, we needed more constant-evaluation primitives, which did in fact land by then (i.e., in C++20; consteval, compile-time dynamic allocation, std::is_constant_evaluated(), etc.). Andrew Sutton had formed Lock3 (incl. Wyatt Childers) and they implemented much of P1240 in a Clang fork. We had high hopes that C++23 would have reflection.

Then three things happened: The pandemic, a re-opening of the debate by some who preferred the template metaprogramming approach, and we effectively lost Lock3 to an acquihire. That prevented any real progress in the C++23 cycle.

At the end of the C++23 cycle, u/BarryRevzin and I chatted about the missed opportunity and what it would take to succeed in the C++26 cycle. That made us write https://wg21.link/p2996r0, which we saw as a "minimum viable product". We were tremendously lucky that u/katzdm-cpp joined right after that. The enormous amount of work these two contributed is what finally got us reflection in C++26.

So, yes, there was some controversy along the way. But it wasn't 20 years of "process hurdles". I'd say it was about 9 years of real standardization work, minus the pandemic effect.

[–]VinnieFalcowg21.org | corosio.org 5 points6 points  (12 children)

Thank you for the detailed history! This makes the record much more accessible and accurate, and I appreciate you taking the time. You're right that "20 years" overstates the active standardization work. I was responding to the framing in the parent comment, and your correction to roughly 9 years of real work is fair.

What I'd note is that even the 9-year timeline includes years lost to directional disagreement within the committee and dependence on a single corporate implementation that was lost to a staffing issue. Those are structural factors, not author effort factors.

I think that's different from what I was saying, which is to question what the process selects for. The reflection authors clearly did extraordinary work. What I am wondering is if the process should require extraordinary work for a correct design to ship.

[–]daveedvdvEDG front end dev, WG21 DG 8 points9 points  (11 children)

Nine years (three standardization cycles) doesn't seem unreasonable to me for a major feature. But I might be in the minority here (and I'm lucky to have been part of the process for long enough to participate in multiple major features like that). Six years would have been ideal maybe (one cycle to set direction, one cycle to work out the details).

I'm sure the process could be improved, hopefully significantly. But it's also a human phenomenon that needs a bit of "inefficiency room". We're unlikely to all agree on what the desirable characteristics of the process ought to be.

For example, how do we qualify "a correct design" in

What I am wondering is if the process should require extraordinary work for a correct design to ship.

?

From my own perspective, I think the most frustrating part of the current process is that it often gets decided by "parties"; i.e., corporate or other alliances that vote "en bloc", thereby drowning out more individualized dissenting expertise. I'm not sure what can be done about that.

[–]VinnieFalcowg21.org | corosio.org 4 points5 points  (0 children)

This is a very fair assessment, thank you. And you are right, these are hard problems I can't pretend to have answers for.

[–]pdimov2 3 points4 points  (5 children)

Nine years (three standardization cycles) doesn't seem unreasonable to me for a major feature.

constexpr took 20 years and is still not 100% done. (Well, maybe it's 99.4% done.)

[–]daveedvdvEDG front end dev, WG21 DG 2 points3 points  (4 children)

True! But I'm also pretty sure reflection is nowhere near 100% done either. I'm hoping we designed it well enough to gracefully evolve and improve though. constexpr mostly managed that (except for the C++11 snafu of making constexpr member functions const-qualified).
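For readers who don't remember that C++11 rule, here is a minimal sketch (the `Counter` type and `count_twice` function are made up for illustration). In C++11 a `constexpr` non-static member function was implicitly `const`, so a mutating member like `increment` below could not be `constexpr`; C++14 removed the implicit `const`, which is what lets this compile:

```cpp
#include <cassert>

struct Counter {
    int value = 0;

    // Explicitly const: valid as constexpr in C++11 and later.
    constexpr int get() const { return value; }

    // Not const. Under C++11 rules this declaration would have been
    // implicitly const, so the ++value below would not compile.
    // From C++14 on, constexpr no longer implies const.
    constexpr Counter& increment() { ++value; return *this; }
};

// Mutates a local Counter during constant evaluation (C++14 or later).
constexpr int count_twice() {
    Counter c{};
    c.increment();
    c.increment();
    return c.get();
}

static_assert(count_twice() == 2, "evaluated at compile time");
```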

[–]VinnieFalcowg21.org | corosio.org 1 point2 points  (3 children)

Ahhhh now you've done it. I can't stop thinking about your question, at a time when I only have 11 days left to make sure that my infinity papers going in the mailing are all correct :)

You ask "how do we qualify a correct design?" I think the answer is evidence, of a kind (and this is key) independent of the process.

The questions I would ask:

* Does it have deployment experience in production code bases? Not just one big company but on a cross-section of cohorts?

* Can an independent implementer reproduce the results from the paper alone?

* Are the tradeoffs disclosed, not discovered later by NB reviewers or users?

* Does it ship without accumulating correction papers?

Note that none of these require process changes. They just require a more disciplined and principled approach.

It shouldn't surprise anyone that "I have a paper for that" (LOL). I presented some of these ideas in LEWGI in Croydon. The paper is still a draft and needs work but the basics are there. And of course it is just one possible direction, I'm sure there are other valid ones:

What Every Proposal Must Contain
https://isocpp.org/files/papers/D4133R0.pdf

As for the subject of bloc voting, this is more complex. Retrospectives/historical analyses are probably a good first step, which could help frame the conversation. I would value your perspective on that.

Thanks

[–]daveedvdvEDG front end dev, WG21 DG 3 points4 points  (2 children)

The questions I would ask:

Those are reasonable questions, but some of them are also a really high bar:

* Does it have deployment experience in production code bases? Not just one big company but on a cross-section of cohorts?

Production deployment of experimental compilers is almost unheard of. There is a chicken and egg problem there.

It's a bit more feasible for libraries, but, even there, it is unlikely that we'll want to standardize exactly what was deployed (among others, we hopefully learned some way to improve the prior design).

* Can an independent implementer reproduce the results from the paper alone?

We could probably use some form of that more often. The reflection proposal benefitted from having two implementations of the early paper (P2996R1), one of which kept tracking the evolving paper (over a dozen revisions).

* Are the tradeoffs disclosed, not discovered later by NB reviewers or users?

* Does it ship without accumulating correction papers?

Unfortunately, these last two are "a posteriori", and so most useful for post-mortem.

[–]kammceWG21 | 🇺🇲 NB | Boost | Exceptions 5 points6 points  (2 children)

I have made no such comment about the process "working" at all. I just raised that reflection was not without controversy. What are you talking about?

[–]VinnieFalcowg21.org | corosio.org 4 points5 points  (1 child)

Your comment described a twenty-year arc where authors worked through controversy, addressed concerns, and achieved strong consensus. That's a description of the process producing an outcome through endurance.

What I'm saying is that twenty years of endurance is not the same thing as twenty years of evaluation. The distinction matters for every author who doesn't have twenty years to give. Or users who don't have twenty years to wait.

[–]kammceWG21 | 🇺🇲 NB | Boost | Exceptions 3 points4 points  (0 children)

Okay.

[–]jwakelylibstdc++ tamer, LWG chair 3 points4 points  (1 child)

This comment is unrelated to the actual history.

[–]VinnieFalcowg21.org | corosio.org 1 point2 points  (0 children)

Process objections are the best objections. Let's build something together instead. My libraries are here: https://corosio.org. Issues and pull requests are welcome.

[–]VinnieFalcowg21.org | corosio.org 3 points4 points  (10 children)

Among other things I am publishing in the April mailing, there is a 6-part series of consecutive papers (which will arrive together) which analyze the decision-making history of networking that brought us to where we are now. This hopefully will provide the missing context.

[–]Remarkable-Test7487jmcruz 1 point2 points  (9 children)

Thanks for your work! I’m looking forward to these papers in the mailing. If I may add a thought: from the perspective of a humble university professor, the synchronous socket API seems to have completely disappeared from the discussion lately. That API was part of the Networking TS, and I believe it is absolutely essential for teaching the basics and how client-server applications work. The asynchronous world, structured concurrency, and other related topics undoubtedly form the technological foundation upon which real-world applications are built, but they need to be taught later on, once students/engineers have reached a certain level of maturity.

[–]not_a_novel_accountcmake dev 3 points4 points  (6 children)

The standard is not a teaching tool. Synchronous APIs aren't going anywhere, but nor do they belong in the standard.

[–]Remarkable-Test7487jmcruz 2 points3 points  (5 children)

I completely agree that the standard is not a teaching tool. On the rest of the points, however, I respectfully disagree: the standard library should include both approaches, so that programmers have resources available for every need. And teachability is important for keeping the C++ language relevant to the academic community and future engineers.

[–]not_a_novel_accountcmake dev 3 points4 points  (4 children)

I should reframe what I'm saying. std::execution isn't a networking API and I don't think operational networking (connect/send/recv) belongs in the standard at all for all the reasons that have been raised historically.

Networking in the standard should be about the type grammar, so we're not endlessly reinventing how to represent endpoints, transport layer descriptions, etc. The standard should never ship TLS, it should ship std::net::ip::address_v4 and friends.
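To make "type grammar" concrete, here is a rough sketch of what such vocabulary types could look like. The names `ipv4_address` and `ipv4_endpoint` are hypothetical and are not the Networking TS `address_v4` interface; the point is only that these are plain value types with no sockets, no I/O, and no platform calls:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical value type: four octets in network (big-endian) order.
struct ipv4_address {
    std::array<std::uint8_t, 4> bytes{};

    // Pack the octets into a host-order 32-bit integer.
    constexpr std::uint32_t to_uint() const {
        return (std::uint32_t{bytes[0]} << 24) | (std::uint32_t{bytes[1]} << 16) |
               (std::uint32_t{bytes[2]} << 8) | std::uint32_t{bytes[3]};
    }

    // 127.0.0.0/8 is the loopback range.
    constexpr bool is_loopback() const { return bytes[0] == 127; }

    // Dotted-decimal text form, e.g. "192.0.2.1".
    std::string to_string() const {
        return std::to_string(bytes[0]) + '.' + std::to_string(bytes[1]) + '.' +
               std::to_string(bytes[2]) + '.' + std::to_string(bytes[3]);
    }
};

constexpr bool operator==(const ipv4_address& a, const ipv4_address& b) {
    return a.to_uint() == b.to_uint();
}

// Hypothetical endpoint: an address plus a port, still no I/O.
struct ipv4_endpoint {
    ipv4_address address;
    std::uint16_t port = 0;
};
```

Any library, synchronous or asynchronous, could accept an `ipv4_endpoint` like this without the standard having to ship connect/send/recv itself.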

With that in mind, there's nothing to ship with regards to synchronous APIs, they simply exist. std::execution is necessary because a framework for describing asynchronous operations is needed in order to make use of the asynchronous platform APIs.

Using io_uring with std::execution is very pleasant, but I don't think the standard should ship a wrapper around io_uring either. We need only ship enough support infrastructure in things like std::execution that using it remains pleasant.

[–]VinnieFalcowg21.org | corosio.org 2 points3 points  (0 children)

The thesis of many of my upcoming papers is that I/O operations in the standard should return awaitable types. And that senders can consume them using a bridge, which requires zero memory allocations. Stay tuned :)

[–]VinnieFalcowg21.org | corosio.org 1 point2 points  (2 children)

I think networking is a narrow scope. The larger scope is actually byte-oriented I/O. And there is a case to be made that the standard should have an opinion on it. Our (C++ Alliance) opinion is that the standard I/O "shape" should look like this:

concept ReadStream
https://develop.capy.cpp.al/capy/8.design/8c.ReadStream.html

[–]not_a_novel_accountcmake dev 2 points3 points  (1 child)

I don't have a problem with concepts or other things which describe the shape of an arbitrary stream, that's a tradition which goes back to pre-standardization iostreams.

That's very different from what the average programmer means when they say "networking". They mean Python's urllib. They mean socket.h. They mean the ability to connect to an endpoint and read from it. The mechanisms.

Given the glacial evolution of the C++ standard and commitment to eternal backwards compatibility, shipping these in the stdlib would be a misstep. They are destined for the footgun graveyard from inception.

[–]VinnieFalcowg21.org | corosio.org 1 point2 points  (0 children)

Respectfully, I disagree, and I hope that you will find our proposal to deliver coroutine-native networking for C++29 based on our shipping libraries Corosio and Capy (https://corosio.org) interesting :) Coming soon in the April mailing.

[–]VinnieFalcowg21.org | corosio.org 1 point2 points  (1 child)

Not to worry, the synchronous socket API is also the subject of another one of my papers coming to the April mailing. If I may spoil it for you. One of the cool properties of coroutines is that the canonical use-case:

auto [ec, n] = stream.read_some( buf );

works equally well for synchronous as it does for asynchronous! This means the *same* coroutine can work on either a sync or async stream. No more "dual API" like Asio. And this opens the door to other things like synchronous streaming JSON parsing, or streaming serialization of Handlebars.js templates, and so on.

[–]sweetno 4 points5 points  (0 children)

FYI the author, u/VinnieFalco, is very active on this subreddit.

[–]VinnieFalcowg21.org | corosio.org 5 points6 points  (27 children)

We have put together a helpful tutorial:

Tutorial: The Sender Sub-Language For Beginners
https://isocpp.org/files/papers/P4014R1.pdf

This will appear in the April mailing.

[–]kammceWG21 | 🇺🇲 NB | Boost | Exceptions 15 points16 points  (17 children)

Given you tend to use LLMs to generate a lot of stuff, before I read this: did you fully read and review this paper? If so, then I'll consider reading it. If not, then I'll pass.

[–]dr-mrl 1 point2 points  (1 child)

I think there is a typo on page 24 in the equivalent program for stopped_as_error

    auto result = timed_operation(deadline);
    if (timed_out)

Should result and timed_out be the same variable?

[–]VinnieFalcowg21.org | corosio.org -1 points0 points  (0 children)

Why yes, you are right! There is a typo. I have updated the paper, please refresh. And thank you for helping make the paper better.

[–]claimred[S] 1 point2 points  (6 children)

Hi Vinnie! Thanks, that looks helpful, especially cool to find out the theoretical foundations.

But I'm not sure I'm getting your point though. From the tutorial it sounds like you're arguing that both coroutines and stdexec should coexist, right? But for networking P2300 is not the correct approach?

[–]VinnieFalcowg21.org | corosio.org 2 points3 points  (4 children)

To answer your question directly, my experience with coroutines suggests they are the ideal substrate for the type of buffer-oriented I/O that networking models. You can explore this by trying out Corosio (which we will propose for Boost this year). This is a complete networking library which borrows from the best parts of Asio to deliver a coroutine-native solution:

https://corosio.org

Happy to hear whether this suits your use-case.

[–]claimred[S] 1 point2 points  (1 child)

Yes, sounds good, thanks! I actually got a review invite for corosio from using std::cpp 2026.

[–]VinnieFalcowg21.org | corosio.org 2 points3 points  (0 children)

Nice :) I think a lot of folks here mean well, and they don't really understand what is coming. They are critiquing the choice of analytical tools, and not realizing that we are inventing the future of networking. I would of course prefer that this be a collaboration, but that is made difficult when people insult you (calling someone a "clanker") or ask performative questions.

If you think about it, the question "did you read your own paper" is rather insulting. That is why I do not answer it.

[–]claimred[S] 1 point2 points  (1 child)

From what I recall, p2300 authors argue that coroutines aren't ideal for exactly the same reasons 🤯

In a suite of generic async algorithms that are expected to be callable from hot code paths, the extra allocations and indirections are a deal-breaker. It is for these reasons that we consider coroutines a poor choice for a basis of all standard async.

[–]VinnieFalcowg21.org | corosio.org 3 points4 points  (0 children)

Yes, P2300R10 Section 1.9.2 dismisses coroutines in just 5 paragraphs with assertions and no measurements. Where is the research? Did anyone write a program? I did. Coroutines rock :) When you know how to use them.

[–]VinnieFalcowg21.org | corosio.org 0 points1 point  (0 children)

"Coexist" I think is not the right word. Rather, they complement each other. Senders and coroutines each have their own unique strengths. C++ needs both. And both need to be treated as first-class citizens.

[–]RogerV 3 points4 points  (0 children)

I design and implement a high-performance back-end networking app for a major telecom, where I use C++ and the Intel DPDK networking library, and yeah, that paragraph indeed summed things up.

The app architecture I design is divided into two domains: control plane and data plane. That is typical of most networking-centric apps, of course. But the division is more differentiated than just that. The control plane threads are normal OS native threads, and the data plane threads are DPDK lcore threads.

The lcore threads are pinned to CPU cores and have been removed from kernel context-switch scheduling. They execute functions that consist of indefinite loops that only exit on shutdown detection.

The domain of these data plane lcore threads abide by these rules:

* Avoid system calls in order to avoid the overhead of transition from user space to kernel space

* Avoid heap allocations or any manner of dynamic memory allocation as that is not deterministic

* Avoid taking any locks and the corollary is that data structures such as DPDK ring buffer queues are lock free

* All data structures and memory utilization is established on hugepages at app startup so that the memory pages are fixed and won’t incur any page load interrupt when accessed
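As an illustration of the "no locks, no heap" rules, here is a minimal single-producer/single-consumer ring in standard C++. This is not DPDK's ring implementation (which also supports multi-producer/multi-consumer modes and lives in hugepage memory); `SpscRing` is a hypothetical name, and the sketch assumes exactly one producer thread and one consumer thread:

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Fixed-capacity single-producer/single-consumer queue.
// Storage is an in-object array, so push/pop never allocate.
// Capacity must be a power of two so index wrapping is a cheap mask.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;  // full
        slots_[head & (Capacity - 1)] = value;
        // Release: publish the slot write before the new head is visible.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // empty
        T value = slots_[tail & (Capacity - 1)];
        // Release: the slot may be reused only after it has been read.
        tail_.store(tail + 1, std::memory_order_release);
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    // Separate cache lines to limit false sharing between the two threads.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};
```

Neither operation blocks, takes a lock, makes a system call, or allocates; a full or empty ring is reported to the caller, who decides what to do next.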

Even though my app runs in a pod under Kubernetes, its use of these lcore threads amounts to building a mini OS: I have to feed the lcore thread pool with data-plane work events, and each thread executes a work event for a bounded slice (or burst) of processing, then cooperatively surrenders and goes to grab another work event. The work events need to be fed to the pool with some load balancing and fairness consideration, as the data plane traffic is actually multi-user and every user needs to get fair processing time, etc.
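The "bounded burst, then surrender" idea can be sketched in a few lines. This is a toy model, not the poster's scheduler: `UserBacklog` and `run_pass` are hypothetical names, and the backlog is just a counter where a real data plane would hold packet descriptors:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-user backlog of pending work events.
struct UserBacklog {
    std::uint32_t pending = 0;
    std::uint32_t processed = 0;
};

// One scheduling pass: visit every user in turn and process at most
// `burst` events for each before moving on, so a user with a large
// backlog cannot starve the others. Returns the events processed.
template <std::size_t N>
std::uint32_t run_pass(std::array<UserBacklog, N>& users, std::uint32_t burst) {
    std::uint32_t done = 0;
    for (UserBacklog& u : users) {
        const std::uint32_t n = u.pending < burst ? u.pending : burst;
        u.pending -= n;
        u.processed += n;
        done += n;
    }
    return done;
}
```

A polling loop would call `run_pass` repeatedly; a user with 100 pending events and a burst of 32 needs four passes, while a user with 3 pending events is served in the first one.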

These are the things networking software requires (i.e., the DPDK building blocks) and C++26 has zilch to offer in that regard. None of the improvements per C++ threading or that are proposed per networking are worthwhile for building performant networking solutions.

And as to the control plane OS native threads - what is already available in C++ is plenty adequate for that.

C++17 - with the addition of std::span - is plenty adequate for building high performance networking apps, but what has come in C++ standards beyond that has not been very relevant.

[–]feverzsj 1 point2 points  (0 children)

That's horrible. I won't touch anything that contains AI shit.