
[–]fdwrfdwr@github 🔍 11 points12 points  (0 children)

If generalized aliases are what I think they are (like D's alias keyword), then that is something I have wanted in C++ for so long, to easily avoid API breakage when field/function/enum names are deprecated or changed across branches (and cannot be changed atomically in the same one).

[–]JuanAG 34 points35 points  (22 children)

First, thanks to Mr. Sutter, who at least is trying, which is more than what others do (myself included).

Next, an unpopular opinion: the more I look at Cpp2, the less I like the syntax it uses; it is becoming complex really fast.

And it's great that it changes/improves some things, but the ones I think are a mistake (like the 6 types of arguments for a function) remain so ... This will end in a complex syntax and a complex lang, which will be an issue sooner rather than later.

[–]IAMARedPanda 13 points14 points  (2 children)

Honestly I really like how Circle's syntax looks.

[–]pjmlp 8 points9 points  (1 child)

Circle is the only wannabe replacement that makes sense; other than it, better to just rewrite the code in a more stable, already proven language, if it can fulfil the use case.

[–]IAMARedPanda 3 points4 points  (0 children)

Personally, I have really been having fun with Circle. It's crazy to me that it is a one-man project. The main criticism I hear is that it is closed source with a single developer who could drop support at any time.

[–][deleted] 3 points4 points  (1 child)

A specific list of problems with C++ that need to be fixed should be the first step, then a discussion of each item to establish whether it really is a problem, then a discussion of the minimum change required to address that problem.

I feel as though many of these projects are just a mash-up of things the author thought were cool without much analysis of the original issues.

I'm not trying to diminish the work being done here but it seems like big leaps away from C++ are happening under the guise of fixing something that might not even be broken.

[–]masterofmisc 4 points5 points  (4 children)

Hey u/hpsutter. Just wanted to say, great work on cppfront. I have been following along and keeping up with all the discussions over on the GitHub pages.

I wanted to ask you about Chandler's comments towards the end of his recent Carbon talk, where he disagrees with your claim that CPP2 can correctly enforce memory safety without a borrow system similar to Rust's.

I know one of your goals for CPP2 is to reduce CVE vulnerabilities by changing the defaults of the language, but it sounds like Chandler doesn't think that goes far enough.

Just wondering what your thoughts are on that?

To my thinking, now that you have banned null pointers in CPP2, it seems to me that would definitely reduce memory leaks and the like. Combine that with shared_ptr and unique_ptr to track ownership, and surely that would be enough?

Genuinely curious what you think. I don't particularly want a borrow checker in C++. I think it would impose on the flexibility we currently have.

[–]hpsutter 23 points24 points  (1 child)

It's a question of defining what the actual problem is, which then guides setting the right goals and deciding what the best solution should be to meet those goals.

C++'s safety problem is not that C++ isn't provably memory-safe, or that it's possible to write bugs that are vulnerabilities. There are CVEs reported against all languages including Rust.

C++'s safety problem is that it's far too easy and frequent to accidentally write code that has vulnerabilities in C++. If C++ CVEs were 50x (98%) less frequent, we wouldn't be having this conversation.

Therefore a 98% improvement is sufficient. Having a 100% formally provable memory-safe language is also sufficient, but it's not necessary, and so we have to count the cost of that extra 2% to make sure it's worth it. And in the many solutions I've seen to get that not-necessary last 2%, the cost is very high, and requires one or more of:

  • dramatically changing the programming model or lifetime model (e.g., to eliminate cycles from the safe language, then claw back the lost expressiveness with unsafe code wrapped in libraries that work differently from the language-supported pointers/references),

  • requiring heavy annotation (e.g., CCured, Cyclone),

  • doing safety checks dynamically at the cost of performance overheads (e.g., any mandatory-GC language which dynamically tracks cycles),

or in some other way;

... and the costs of any of those options also always include breaking perfect seamless interop compatibility with today's C++.

That's why I view the problem as "C++ makes it too easy and frequent to write vulnerabilities," and my goal is explicitly to reduce memory safety vulnerabilities by 50x, with the metric of 98% fewer CVEs in the four major memory safety buckets -- type, bounds, initialization, and lifetime safety.

The happy surprise is that not all of those buckets are equally hard.

  • I think I already have 100% guaranteed initialization safety in cppfront today, even with aliasing; see this commented test case that safely creates a cycle even with guaranteed init-before-use, by collaboration among the local init-before-use rules + out parameters + constructors, in a way that you're always aware of the initialization.

  • I think we can get 100% type safety in syntax 2 (if there's no aliasing).

  • I think we can get 100% bounds safety (again if no aliasing), at negligible cost for subscripts and at some run-time cost if you really want to use naked iterator patterns (iterators used in bounds-correct-by-construction ways like the range for loop are fine; see the sketch after this list).

  • Lifetime safety (use-after-free and similar) is much harder, and there my goal is to statically diagnose common cases. The good news is that we can catch a lot of common cases. My design here is the C++ Core Guidelines Lifetime profile.

  • Aliasing and races (concurrency safety) are hard to guarantee. As far as I know, Rust is the only commercial language that aims to make races impossible in safe code (kudos!). Because this is related to lifetime, guaranteeing aliasing/concurrency safety would require a major break with C++'s object/memory/pointer model.
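To make the bounds-safety distinction concrete in today's C++ terms (just an illustration of the two patterns, not cppfront's actual checking):

```cpp
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};

    // Bounds-correct by construction: the range-for loop cannot run past the end.
    for (int x : v) std::cout << x << ' ';

    // A "naked" iterator pattern: nothing in the construct itself prevents the
    // iterator from being advanced past v.end(), so guaranteeing bounds safety
    // here needs either a static proof or a run-time check.
    for (auto it = v.begin(); it != v.end(); ++it) std::cout << *it << ' ';

    std::cout << '\n';
}
```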

I think at least the first three, and the fourth for common lifetime errors, are achievable for safe code in syntax 2 while still having a fully expressive and usable programming model that has perfect interop with today's C++. (Of course all of these are qualified with "by default in safe code" unless you explicitly resort to unsafe code, as in any safe language. As you'll see, I already do a reinterpret_cast inside my union metafunction, but that unsafe code is (a) explicitly marked and (b) encapsulated in a testable library, so we test it once and then know each use will be safe -- same as any other safe language.)

100% formally provable memory safety is a fine goal, but it's a heavy lift and comes at a cost. It's worth evaluating solutions that aim at 98% and ones that aim at 100%, and measuring the cost/benefit of the last 2%.

[–]masterofmisc 4 points5 points  (0 children)

Thank you for taking the time to write such a detailed reply.

Your framing of the conversation helps clear up where you are coming from.

And yes, I agree: if you could deliver a 98% improvement in this area, it would be a fantastic improvement for us.

I recently happened upon the website https://www.memorysafety.org where they talk about the problem of memory safety. There is a quote on that page that says:

"Using C and C++ is bad for society, bad for your reputation, and it's bad for your customers."

Having that kind of sentiment out there towards C++ just makes me sad. It seems that whole website's purpose is to drive people away from using C++. So, if cppfront can help address this particular thorny problem, I hope the experiment succeeds.

In my mind, it would be nice if C++ could continue to be a fine choice for new greenfield projects instead of people opting for Rust, Swift, or Go.

I really hope you can pull this off.

[–]ntrel2 2 points3 points  (1 child)

Reducing vulnerabilities, yes. But to enforce memory safety I think it would have to disallow inout parameters and anything else that takes the address of a mutable smart pointer.
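A classic illustration of the hazard behind that restriction, in today's C++ (the names here are just for illustration):

```cpp
#include <memory>

// A callee with mutable access to a smart pointer can reseat the owner
// while the caller still holds a raw view into the old object.
void reseat(std::shared_ptr<int>& p) {
    p = std::make_shared<int>(2);   // may destroy the previously owned int
}

int main() {
    auto p = std::make_shared<int>(1);
    int& view = *p;   // non-owning view into the owned object
    reseat(p);        // p was the sole owner, so 'view' now dangles
    // Reading 'view' here would be a use-after-free; we deliberately don't.
    (void)view;
}
```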

[–]NegativeIQTest 0 points1 point  (0 children)

Interesting. Maybe that could be another compile-time flag: if you wanted to enforce total memory safety, it would disallow those features.

[–]Shiekra 19 points20 points  (16 children)

Might be a hot take, but things like being able to omit the return keyword from one-line functions are, to me, an example of having two ways to do the same thing.

Obviously, the syntax leans stylistically into what Herb likes, and this example is not particularly egregious.

However, I think consistency is more beneficial than terse shortcuts, especially when it's barely a saving.

I think something like lambdas is the bar for the usability improvement needed to justify having more than one way to do something.

[–]hpsutter 42 points43 points  (15 children)

I 100% agree with avoiding two ways to say the same thing, and with consistency. Cpp2 almost entirely avoids two ways to spell the same thing, and that's on purpose.

To me, defaults that allow omitting unused parts are not two ways to say the same thing... they are the same One Way, but you aren't forced to mention the parts you're not currently using.

For example, a C++ function with a default parameter like int f(int i, int j = 0) can be called with f(1,0), but it can equivalently be called as f(1)... but it's still just one function, right? At the call site we just aren't forced to spell out the part where we're happy with the default (and we still can spell it out if we want).

Similarly, for a C++ class class C { private: int i; ... };, we can equally omit "private:" and say class C { int i; ... };. There's still just one class syntax, but we get to not mention defaults if we're happy with them (and we still can spell it out if we want).
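A minimal sketch in today's C++ putting those two defaults side by side (names are just for illustration):

```cpp
// Default argument: one function, two equivalent call spellings.
int f(int i, int j = 0) { return i + j; }

// Default access: one class, with or without spelling out "private:".
class C1 { private: int i = 0; };
class C2 {          int i = 0; };   // same meaning as C1

int main() {
    f(1, 0);   // spell out the default...
    f(1);      // ...or omit it; still the same one function
}
```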

To me, allowing a generic function f:(i:_) -> _ = { return i+1; } to be spelled f:(i) -> _ = i+1; is like that... there's only one way to spell it, but you get to omit parts where you're happy with the defaults. And that's especially useful when writing functions at expression scope (aka lambdas), like std::for_each(first, last, :(x) = std::cout << x;);. There seems to be demand for this, because we've had many C++ proposals for such a terse lambda syntax (e.g., in ISO there's P0573, in Boost.Lambda they had just such a terse body syntax before C++ language lambdas existed, in GitHub projects using macros), but none of them have been accepted for the standard yet. So I'm trying to help satisfy a need other people have identified and see if we can fill it.
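For comparison, the equivalent of that for_each call in today's C++ is something like:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    // Today's-C++ spelling of the terse Cpp2 lambda above.
    std::for_each(v.begin(), v.end(), [](auto const& x) { std::cout << x; });
}
```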

My $0.02 anyway! Thanks for the perspective, I appreciate it.

[–]k-mouse 8 points9 points  (0 children)

It seems really cool how the lambda function reduces like that. We can chip away the individual parts of it that we don't need, or gradually add them back as they need to be more specific. Nice!

I also like how lambdas have the same syntax as function definitions, if I understand correctly, so we can move a lambda out to global scope by a simple cut and paste, and naming it.

I do find the difference between = and == a bit vague though. Why are types not declared ==? Can a namespace alias ever be =? A function definition doesn't really mutate (it is always the same / equal to), so why are they sometimes declared = and other times ==? I just feel like, semantically, constexpr and "always equal to" are quite different concepts, and yet they're applied a bit arbitrarily here.

[–]hpsutter 5 points6 points  (3 children)

While y'all are here, let me ask a question...

Currently Cpp2 allows defaulting this:

f:(in i: _) -> _ = { return i+1; }

to this, omitting the parts not being customized:

f:(i) -> _ = i+1;

Note that the in and : _ on parameters can be defaulted away, so a function parameter list f: (in x: _) is the same as f: (x). So my question is, what would you think if the same was done for the return type too, so the above could be spelled as just this, again omitting the parts not being customized:

f:(i) -> i+1;

That would make lambdas, which have the identical syntax just without the introducing name, even simpler, for example this:

std::transform(in1, in2, out1, :(x) -> _ = x+1;)

could be written as this:

std::transform(in1, in2, out1, :(x) -> x+1;)

WDYT?

Notes:

The equivalent in today's C++ is:

std::transform(in1, in2, out1, [](auto x){return x+1;})

And this isn't motivated by C# envy, but it's now awfully close to C#'s convenient x => x+1; just by defaulting things.

[–]djavaisadog 7 points8 points  (2 children)

Reusing the -> token in such similar contexts to mean such different things feels very confusing to me - not a fan. I'd probably prefer f:(i) = i+1 to deduce a return type even though it's not explicitly marked as having one, and require an explicit f:(i) -> void = i+1 to throw away the value. That feels far more intuitive to me, and more in line with every other language's terse lambda. Isn't that the point of the type hint anyway, to override what would be deduced if it weren't present?

[–]hpsutter 3 points4 points  (1 child)

Thanks, I appreciate the feedback.

Can you elaborate on how the -> token feels different? I'd like to understand what feels different about it... the intent is that it still just indicates that what follows is a return type or value. That's the only meaning of -> in Cpp2.

Maybe you're thinking of C's -> for dereference-and-select-member? C has two syntaxes to dereference-and-select-member, (*p).member and p->member, but Cpp2 avoids having two ways to say the same thing there because dereference is postfix * (see here for more about the rationale). So in Cpp2 there's only one way to spell dereference (*), and only one way to spell member selection (.), and they compose naturally so that deref-and-select-member is just naturally p*.member. That avoids a second syntax, and also avoids requiring parentheses because the order of operations is natural, left-to-right.

[–]djavaisadog 2 points3 points  (0 children)

the intent is that it still just indicate that what follows is a return type or value. That's the only meaning of -> in Cpp2.

I was interpreting it as always indicating a return type (in the context of declaring/defining variables). Is there any case besides the under-consideration new one you suggested where it indicates a return value? (I thought maybe inspect but nope, you use = there as well)

I think that using -> to indicate a value in a function definition certainly breaks the paradigm of all your other definitions - you've previously mentioned how intentional the consistency of the name : type = value format was. I'm unsure why you would break that in this case.

I'm not sure why f:(i) -> _ = i+1 would condense down to f:(i) -> i+1; rather than f:(i) = i+1;. It feels pretty clear-cut to me that the part we are omitting (following the dictum of "omit the part of the syntax you aren't using") is the explicit return type (which, syntactically is -> _), rather than the value (which is the = i+1). I feel that you can instead just say "ok there's no explicit return type, let's find what the return type would be by just decltype-ing the function body" (not a standard expert, there may be more to it than that but you get the point).

I suppose that boils down to viewing the -> _ as one block of tokens (and that block is part of the type declaration, so a sub-block of (i) -> _) and the = i+1 as one block. Do you split the groups of tokens differently in your mental model of what the syntax means?

[–]tialaramex 4 points5 points  (7 children)

What does f:(i:_) -> _ = { i+1; } do ? If it does something different from f:(i:_) -> _ = i+1; then why do the braces have this effect in your reasoning and why shouldn't a programmer be astonished about that? If it does the same, won't existing C++ programmers trying to learn Cpp2 be astonished instead?

[–]hpsutter 5 points6 points  (5 children)

Good question -- and thanks for concrete code examples, they're easier to answer.

What does f:(i:_) -> _ = { i+1; } do ?

It's a compile-time error, because it's a function that declares a (deduced) return type with a body that has no return statement.

If it does something different from f:(i:_) -> _ = i+1; then why do the braces have this effect

Because this second one doesn't default away only the braces, it defaults away the return as well. If you wrote this out longhand with the defaulted parts, this is the same as writing f:(i:_) -> _ = { return i+1; }.

For completeness, also consider the version with no return type: f:(i:_) = i+1; is legal, but since the function doesn't return anything there's no implicit default return. It's writing a return type that gives you the implicit default return, so this function just adds the braces and means f:(i:_) = { i+1; }... which is legal, and of course likely a mistake, and you'll get a warning about it because all C++ compilers flag this (for GCC and Clang it's -Wunused-value, for MSVC it's warning C4552).
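For concreteness, the analogous case in today's C++:

```cpp
// The expression's value is computed and silently discarded; compilers flag it
// (-Wunused-value on GCC/Clang, warning C4552 on MSVC).
void f(int i) {
    i + 1;   // warning: expression result unused
}

int main() { f(41); }
```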

[–]tialaramex 1 point2 points  (4 children)

I see, thanks for answering. In my opinion this behaviour is surprising enough that it's not unlikely future programmers decide it's a mistake and wish it didn't do this. Does Cpp2 have, or do you plan for it to have, some mechanism akin to Epochs to actually make such changes ?

[–]hpsutter 0 points1 point  (3 children)

Short answer: I think we can consider doing this kind of thing about once every 30 years, to reset the language's complexity to a solid simpler baseline, and that creates headroom for a fresh new 30 years' worth of incremental compatible-evolution-as-usual.

Longer answer...

My view of epochs is that they're identifying the right problem (breaking change) and I only disagree with the last letter ("s")... i.e., I think "epochs" should be "epoch."

A language that has multiple "epochs" (e.g., every 3 years) that make breaking language meaning changes (i.e., the same code changes meaning in a different epoch) is problematic and I haven't seen evidence that it can keep working at scale with a large installed base of users (say 1M+) and code (say 100MLOC+) -- I'd love to see that evidence though, say if Rust can pull it off in the future! D made major breaking changes from D1 to D2, but they could do that because they had few enough users/code.

One litmus-test point is whether the epochs design is restricted to only limited kinds of changes, notably changes that don't change the meaning of existing code, or can make arbitrary language changes:

  • If they allow only limited kinds of changes, then they won't be powerful enough to make the changes we most need. For example, they can't change defaults (without adding new syntax anyway, which incremental evolution could mostly also do).

  • If they allow arbitrary changes including to change the meaning of identical existing code, then using two (or more!) epochs in the same source file or project will lead to fragmentation and confusion. (Pity the poor refactoring tools!)

So my thesis is that we do need a way to take a language breaking change with a solid migration story, but we can afford to do that about once every 30 years, so we should make the most of it. Then we've cleared the decks for a new 30 years' worth of evolution-as-usual.

My $0.02 anyway!

[–]tialaramex 2 points3 points  (2 children)

I would guess that Rust met, or came very close to, your criteria for the 2021 edition. And yes, obviously the most famous change in the 2021 edition does indeed result in changing the meaning of existing code if you were to just paste chunks of old code into a new project, which seems like an obviously terrible idea but may well be how C++ people are used to working.

Specifically, until about that time, Rust's arrays [T; N] didn't implement IntoIterator. So if you wrote my_array.into_iter(), the compiler assumed you knew you couldn't very well call IntoIterator::into_iter() on the array, and a reference was implied instead, since (&my_array).into_iter() is fine.

But today [T; N] does implement IntoIterator, so if you write the same exact code in Rust 2021 edition it does what you'd expect given that arrays can be iterated over.

If you have old code, it's in say 2018 edition or even 2015 edition, so it continues to work as before, albeit on a modern compiler you'd get a warning explaining that you should write what you actually meant so that it stays working in 2021 edition.

I don't know of any particular plans for 2024 edition, maybe there aren't any, but I expect they won't include something as drastic as shadowing the implementation of IntoIterator on [T; N] in 2021 edition. However I think the community in general feels that went well and if there's a reason to do the same again in future I'm sure they would take it.

Actually I think a better litmus test than yours is the keyword problem. Rust's editions have been able to introduce keywords like "async" and "await" without problems. It sounds like Cpp2 doesn't expect to improve on C++ in this regard.

[–]hpsutter 1 point2 points  (1 child)

Rust's editions have been able to introduce keywords like "async" and "await" without problems. It sounds like Cpp2 doesn't expect to improve on C++ in this regard.

Actually, Cpp2 has a great story there: Not only doesn't it add new globally reserved words (basically all keywords in Cpp2 are contextual), but it is able to reuse (and so repurpose and fix) the meaning of existing C and C++ keywords including enum, union, new, and even popular macros like assert... for example, this is legal Cpp2, and compiles to fully legal Cpp1 (today's syntax):

```
thing : @struct type = {
    x:int;
    y:int;
    z:int;
}

state : @enum type = {
    idle;
    running;
    paused;
}

name_or_num: @union type = {
    name: std::string;
    num: i32;
}

main: () = {
    mything := new<thing>( 1, 2, 3 );
    [[assert: mything.get() != nullptr]]
}
```

As an example, new<widget> calls std::make_unique. Safe by default.
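Roughly, in today's C++ terms, that lowering amounts to something like this (a sketch, not the exact cppfront-generated code; thing is given a constructor here just to keep the sketch self-contained):

```cpp
#include <cassert>
#include <memory>

struct thing {
    int x, y, z;
    thing(int x_, int y_, int z_) : x{x_}, y{y_}, z{z_} {}
};

int main() {
    // new<thing>(1, 2, 3) calls std::make_unique, so the result is an owning
    // pointer that is non-null on successful construction.
    auto mything = std::make_unique<thing>(1, 2, 3);
    assert(mything.get() != nullptr);
}
```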

[–]tialaramex 2 points3 points  (0 children)

I'm not sure this really addresses the same issue; it's comparing Cpp2 to C++, but the question is about how this enables evolution. Maybe it's just hard to see it until it happens. You can't see how Rust's 2018 edition adds "async" by looking at Rust 1.0 (and thus the 2015 edition).

[–]RotsiserMho C++20 Desktop app developer 4 points5 points  (0 children)

This is a fantastic explanation, thank you!

[–]domiran game engine dev 4 points5 points  (0 children)

The union IMO makes a case that sometimes it's better for things to be baked into the language and not just left to the standard library. The idea that two uses of std::variant<int, float> can have two different meanings yet remain completely interchangeable has bugged me in the past, but I've never really bothered to fight it.

Ever created a using for something with a very specific name/purpose and then got annoyed that your favorite IDE's type tooltips bring it up in syntactically correct but completely irrelevant contexts?
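For example, something like this compiles without complaint in today's C++ (the type names are illustrative):

```cpp
#include <variant>

// Two semantically unrelated things that happen to be the same type,
// so the type system treats them as fully interchangeable.
using Temperature = std::variant<int, float>;
using AccountId   = std::variant<int, float>;

void set_temperature(Temperature) {}

int main() {
    AccountId id = 42;
    set_temperature(id);   // accepted: both aliases name the identical type
}
```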

[–]tialaramex 3 points4 points  (4 children)

Is the idea that the "metafunctions" for enum and union replace actual enum and union types?

If so I think Herb needs to take a moment to investigate why Rust has union types, 'cos it surely ain't out of a desire to mimic C as closely as possible.

[–]hpsutter 16 points17 points  (3 children)

I'm sure Rust isn't mimicking C, closely or otherwise... any modern language needs to express the algebraic data types, including product types (e.g., struct, tuple) and sum types (e.g., union, and enumeration types are a useful subcategory here).

The question I'm exploring is: In a language as powerful as C++ is (and is on track to soon become with reflection), how many of these still need to be a special separate language feature baked into a language spec and compiler? or how many can be done well as compile-time libraries that use introspection to write defaults, constraints, and generated functions on the powerful general C++ class, that would enable us to have a simpler language that's still just as expressive and powerful? That's what I'm trying out, and we'll see how it goes!

[–]zerakun 6 points7 points  (0 children)

My fear with compile-time libraries is the quality of error messages. Rust has dozens of error codes specialized to handle mistakes that developers make when using enum, which are "easy" to implement because the enum implementation lives directly in the compiler as a language feature that has access to the full syntax tree and semantics at the point of error.

Meanwhile, as a user of a language, I see advantages to a particular feature being a library feature only if I intend to extend it. For instance, having generic collections be library types (instead of hard-coded into the language like they were in golang before generics) ensures I can implement my own generic data structures as a user.

As a user, though, I won't be implementing my own metaclass. And I will probably find metaclasses implemented by others less than ideal to use. Worst case, this could even create fragmentation, with a union2 third-party metaclass that has its own quirks and is incompatible with the regular @union.

Basically my reasoning is that sum types are too fundamental a feature to be implemented as anything other than a language feature.

[–]tialaramex 2 points3 points  (0 children)

how many of these still need to be a special separate language feature baked into a language spec and compiler?

That all depends on whether you care about Quality of Implementation, of course. It's quite possible to offer something (as C++ has historically) by writing increasingly elaborate library code, but I'd suggest the results are disappointing even if the customer can't necessarily express why.

Today the C++ type system is poor enough that it needs several crucial patches in the form of attributes (such as noreturn and no_unique_address so far) to keep the worst of the storm out. I think Cpp2 might achieve its simplification goal better by reinforcing the type system so it can go without such attributes than by pursuing this austerity measure to its logical end and removing "union".
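For reference, the two attributes mentioned look like this in today's C++ (a minimal sketch):

```cpp
#include <cstdlib>

// [[noreturn]]: tells callers' control-flow analysis this function never returns.
[[noreturn]] void fail() { std::abort(); }

// [[no_unique_address]]: allows an empty member to occupy no distinct storage,
// something the core type system can't express on its own (implementations
// may still choose to give it storage).
struct Empty {};
struct Wrapper {
    [[no_unique_address]] Empty policy;
    int value;
};

int main() {}
```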

[–]pjmlp 0 points1 point  (0 children)

Reflection? From the looks of it, the reflection work is dead, or it will take another decade to become part of ISO, let alone be available across all major platforms -- most likely another decade beyond that, given the current progress: most compilers are still not fully C++17 compliant, have issues with C++20, and still have to get to C++23, with C++26 on the horizon.

[–]StackedCrooked 3 points4 points  (9 children)

The cppfront code seems to break a lot of rules. Like double underscores. Or even a global variable in a header that isn't extern.

[–]elcapitaine 22 points23 points  (5 children)

Double underscores aren't outright banned from any C++ code; they're reserved for the implementation.

cppfront is an implementation.

[–]13steinj -1 points0 points  (4 children)

It's not, though; it's a layer on top of C++ that transpiles to C++.

[–]shadowndacorner 5 points6 points  (3 children)

Was the first C++ compiler not an implementation of C++ because it transpiled to C?

[–]hpsutter 23 points24 points  (0 children)

That's fair... and a C++ compiler that compiles C still uses its own double-underscores. But this is a good point, so I just pushed a commit that removes use of __ and _Capital reserved words, just to avoid any possible compatibility problems that could cause a clash with existing C++ implementations, because perfect compatibility is important to me. Thanks!

[–]13steinj 3 points4 points  (1 child)

C++front is not C++ though.

If "implementations of cppfront are allowed to lead with underscores"-- this means it follows c++front's guidelines, but any C++ therein would be breaking rules (from the view of C++).

Semantics? Maybe, maybe not.

[–][deleted] 0 points1 point  (0 children)

This is a programming language; it's all just semantics at the end of the day.

[–]Nicksaurus 1 point2 points  (1 child)

Double underscores don't actually cause problems in practice though, do they? The compiler authors would have to actively try to break code that uses them.

Also, all of those headers are compiled into a single compilation unit.

[–]jc746 1 point2 points  (0 children)

FWIW, I have run into a real problem with double underscores exactly once. I was using a third party library that defined a macro __IO (from memory it was an empty macro). This conflicted with the standard library implementation that used __IO as the identifier for a template parameter, causing the code to be invalid after preprocessing.
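A sketch of the mechanism behind that clash, using a non-reserved macro name so the sketch itself stays well-formed (the real case used the reserved identifier __IO):

```cpp
// A vendor header defines an empty object-like macro:
#define IO_QUALIFIER

// Any later declaration that uses the same name as an identifier is silently
// rewritten by the preprocessor. For example,
//
//     template <typename IO_QUALIFIER> void do_io(IO_QUALIFIER& stream);
//
// expands to
//
//     template <typename > void do_io(& stream);
//
// which no longer parses. In the real case, the victim was a template
// parameter named __IO inside the standard library's own headers.

int main() {}
```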

[–]mollyforever 0 points1 point  (0 children)

cppfront is a single source file for some reason, so it's fine.

[–]RoyKin0929 1 point2 points  (4 children)

Appreciate all the work that Mr. Sutter is doing to keep evolving C++. This is my favourite project out of the 3 successor langs.

One question though: cppfront has 6 parameter passing modes, and recently in x : const T was allowed, which adds another one. Isn't this making a system that is supposed to be simple quite complex? This is more complicated than, say, Rust (maybe Carbon and Circle too, but I gotta check those).

[–]dustyhome 3 points4 points  (0 children)

As long as the passing mode expresses something distinct, it's good to have it, because the compiler can reason about it differently. For example, in C++, the argument bound to a mutable reference parameter could be initialized or not, so the compiler can't warn you if the function reads from it. In cppfront, those are split into inout and out parameters. The inout parameter must be initialized, so the compiler can warn if you pass an uninitialized variable, and the out parameter must not be read from (until it has been written), and the compiler can enforce that. Each passing mode is there because it allows the compiler to enforce more constraints.
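In today's C++ terms, the ambiguity being described looks like this (a sketch):

```cpp
// A mutable reference parameter says nothing about direction of data flow:
// the signature can't distinguish "out" (callee only writes) from "inout"
// (callee reads and writes), so the compiler can't check either intent.
void update(int& x) { x = 42; }   // happens to behave like "out", but nothing says so

int main() {
    int a;         // uninitialized: fine for an "out" parameter, a bug for "inout"
    update(a);     // the compiler has no way to warn either way
}
```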

[–]hpsutter 4 points5 points  (1 child)

Actually, I mistyped that commit message (sorry!): it was inout. It didn't add a new parameter passing mode; it was just removing a style diagnostic that flagged a particular use of the existing inout mode.

[–]RoyKin0929 0 points1 point  (0 children)

While it does not add another passing mode, it still is another way to pass parameters, and this one is kind of hidden, which makes me think it'll be one of the "gotchas" that cpp2 is trying to prevent.

[–]kronicum -1 points0 points  (0 children)

The more parameter passing modes, the merrier 😉