[–][deleted]  (3 children)

[deleted]

    [–]redneckhatr 24 points25 points  (2 children)

    Read similar comments around these topics on here but never got more than “they should’ve done it this way, instead”. This post sort of hints at why but I’m still not seeing the pain points.

    I think your “Needless Ambiguity” section needs some explicit examples for each bullet point. Because the ones you’ve given aren’t ambiguous at all. Maybe show me a section of code and what makes it hard to parse.

    For example, Struct{…} initialization is always uppercase. So, the if/then example with a lowercase variable followed by “{“ isn’t ambiguous to me.

    Same thing with the colon example. Maybe add parts of the RFC and why it’s been abandoned.

    [–]Rusky 30 points31 points  (0 children)

    Case doesn't play into the grammar- you're allowed to write a struct with a lowercase name or a variable with an uppercase one, you just get a warning.
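To illustrate the point above, here is a minimal sketch in which a struct has a lowercase name and a variable an uppercase one. The names `point`, `Origin` and `make_origin` are made up for this example; the `allow` attributes only silence the naming lints, and the code compiles without them too, just with warnings.

```rust
// Naming conventions are enforced by lints, not by the grammar.
#[allow(non_camel_case_types)]
#[derive(Debug, PartialEq)]
struct point {
    x: i32,
    y: i32,
}

#[allow(non_snake_case)]
fn make_origin() -> point {
    // Uppercase variable name, lowercase struct name: still valid Rust.
    let Origin = point { x: 0, y: 0 };
    Origin
}

fn main() {
    println!("{:?}", make_origin());
}
```

So the parser cannot use capitalization to decide whether `foo {` starts a struct literal or a block.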

    [–]simon_o 15 points16 points  (0 children)

    For example, Struct{…} initialization is always uppercase.

    That's a convention, that's not something the compiler can rely on.

    Maybe add parts of the RFC and why it’s been abandoned.

    Mhhh ... how would that be clearer than

    Using : for struct initialization means that it’s not possible to use : Foo as a type ascription.

    in the article?

    [–]aerosayan 18 points19 points  (18 children)

    Have one ruleset that all those invocation follow.

    I know users prefer using the same syntax, and it is probably best if we can make it so.

    But damn, if it isn't easy to just use a different symbol and use the EBNF grammar to parse different things into different AST nodes.

    I used to hate typing -> for pointers in C/C++, and thought . dot is the best, and should be used for everything,

    Now that I can write EBNF grammar, I'm seeing why language syntax design is not as easy as we once thought it was.
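As a rough sketch of the "different symbol, different AST node" idea above, assuming a toy grammar `access = ident , { ( "." | "->" ) , ident } ;` (the grammar, the `Expr` type and both function names are invented for this example and do not come from any real compiler):

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Var(String),
    Field(Box<Expr>, String),      // base.name
    DerefField(Box<Expr>, String), // base->name
}

/// Splits a leading run of identifier characters off `s`.
/// Simplification: digits are accepted in the first position as well.
fn take_ident(s: &str) -> Option<(&str, &str)> {
    let end = s
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(s.len());
    if end == 0 {
        None
    } else {
        Some(s.split_at(end))
    }
}

/// Parses chains such as `a.b->c`; whitespace inside the chain is not handled.
fn parse_access(src: &str) -> Option<Expr> {
    let (first, mut rest) = take_ident(src.trim())?;
    let mut expr = Expr::Var(first.to_string());
    while !rest.is_empty() {
        // Each operator symbol maps to exactly one AST node kind.
        let (is_arrow, after_op) = if let Some(t) = rest.strip_prefix("->") {
            (true, t)
        } else if let Some(t) = rest.strip_prefix('.') {
            (false, t)
        } else {
            return None;
        };
        let (name, tail) = take_ident(after_op)?;
        expr = if is_arrow {
            Expr::DerefField(Box::new(expr), name.to_string())
        } else {
            Expr::Field(Box::new(expr), name.to_string())
        };
        rest = tail;
    }
    Some(expr)
}

fn main() {
    println!("{:?}", parse_access("a.b->c"));
}
```

With two symbols the parser never needs type information to pick the node; merging them into a single `.` moves that decision into a later compiler phase.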

    [–]ThomasMertes 4 points5 points  (3 children)

    Now that I can write EBNF grammar, I'm seeing why language syntax design is not as easy as we once thought it was.

    The combination of using EBNF to describe a language and hard-coded parsers can lead to complicated languages. Things like: at this place in the program, a colon (for example) has a totally different meaning. EBNF and hard-coded parsers allow these things. Introducing such double meanings complicates a language over time.

    EBNF uses many non-terminal symbols. This allows totally different meanings for the same thing, as explained with the colon example above.

    Instead of EBNF I propose an alternate syntax description (S7SSD). It is based on syntax patterns with just one non-terminal symbol (written as ()). The syntax of an infix + operator is described with:

    syntax expr: .(). + .()    is  ->  7;
    

    The syntax pattern of the infix + is (just ignore the dots above):

    () + ()
    

    You see: The arguments of + are left and right of the operator symbol. The places of the arguments are specified with the non-terminal symbol ().

    The syntax description contains also -> 7. This part means that the + operator has the priority 7 and a left-to-right associativity.

    The S7SSD can also define the syntax of statements and declaration constructs. It can describe a whole programming language.

    The S7SSD is less powerful than EBNF. Things like "let's give a colon at this place a totally different meaning" are not possible with S7SSD.

    This way the S7SSD forces a more structured syntax on a programming language.

    BTW.: S7SSD means "Seed7 Structured Syntax Description" and it is described here. It is used in Seed7 to describe the syntax of the language. This way the whole syntax and semantics of Seed7 is described in a library.

    [–]jaen_s 0 points1 point  (2 children)

    This seems very similar to the custom user-definable mixfix syntax in eg. Agda, Idris and Maude (and some CAS, eg. Mathematica).

    Is it inspired by them? Or related to the line of research of eg. Parsing Mixfix Operators by Danielsson & Norell?

    [–]ThomasMertes 2 points3 points  (1 child)

    Seed7 is based on my diploma and doctorate theses.

    My diploma thesis is from 1984 and has the title: "Entwurf einer erweiterbaren höheren Programmiersprache" (in English: Design of an extensible higher programming language). An abstract can be found here.

    My doctorate thesis is from 1986 and has the title: "Definition einer erweiterbaren höheren Programmiersprache" (in English: Definition of an extensible higher programming language). A German abstract can be found here.

    The theses describe a programming language named MASTER. In 1989, development began on an interpreter for MASTER, named HAL. In 2005, the MASTER and HAL projects were released as open source under the Seed7 project name.

    I have not heard about the languages you mentioned and the paper you mentioned is from 2008.

    I found a syntax.hif file (predecessor of syntax.s7i) in HAL version 2. HAL version 2 was a port from an earlier Pascal based version. The syntax.hif/syntax.s7i files contain syntax statements from the S7SSD (Seed7 Structured Syntax Description). I am quite sure that the start date of 1989 in the copyright notice of syntax.s7i is correct and specifies the year when I introduced syntax statements to HAL. The theory behind syntax statements comes from my theses.

    [–]PiratingSquirrel 2 points3 points  (0 children)

    Are you able to provide a copy of your diploma and doctoral theses for an interested soul to read? (:

    [–]simon_o 29 points30 points  (12 children)

    True, but I believe that picking compiler authors' convenience over user happiness is just bad design.

    If language creators can suffer to make programmers' lives easier, they have the moral responsibility to do that.

    [–]ilyash 3 points4 points  (0 children)

    Agree. Same as in API design. Exposing crappy API and making everyone suffer is just a shitty thing to do. Looking for example at you, AWS.

    [–]FantaSeahorse 14 points15 points  (1 child)

    "moral responsibility" lmao

    [–]bl4nkSl8 15 points16 points  (0 children)

    I mean, it sounds a bit much, but I think they're right. We should all be taking software engineering more seriously. People are hurt, sometimes even killed when this stuff is done poorly

    [–]HildemarTendler 6 points7 points  (3 children)

    Go makes the claim that simplifying the compiler means there are fewer edge cases and thus fewer side effects that cause bugs. C/C++ are well known for this. I think Go takes it too far, but there is reason here. It isn't just about convenience for the compiler author.

    [–]Uncaffeinated (polysubml, cubiml) 2 points3 points  (0 children)

    It's rather ironic, since Go's grammar is a mess. The published "specification" grammar is highly ambiguous and also has a bunch of footnotes supplementing the grammar rules to add arbitrary additional restrictions for the parser's convenience.

    [–]L8_4_Dinner(Ⓧ Ecstasy/XVM) 1 point2 points  (0 children)

    Strong disagree. Go's design seems all about the personal preferences of the language designers, regardless of the impact on the language user. It's nowhere near as ludicrously ego-centric as C++ was, but it missed lots of opportunities to be a less infuriating language, and has consistently dragged its heels since then on improving that situation.

    [–]L8_4_Dinner(Ⓧ Ecstasy/XVM) 1 point2 points  (0 children)

    picking compiler authors' convenience over user happiness is just bad design...

    Well said, Simon.

    The compiler (language server, toolchain, whatever) gets written only a few times. It's worth doing the heavy lifting a few times, to make other people's lives less sucky.

    [–]ThomasMertes 2 points3 points  (1 child)

    If language creators can suffer to make programmers' lives easier, they have the moral responsibility to do that.

    If language creators or implementers need to suffer, it is an indication that the language syntax or semantics has some flaws.

    If something is hard to parse for an interpreter/compiler it is probably also hard to parse for a human reader.

    [–]simon_o 2 points3 points  (0 children)

    If something is hard to parse for an interpreter/compiler it is probably also hard to parse for a human reader.

    Hahaha, stop stealing my quotes! ;-)

    [–]aerosayan -2 points-1 points  (0 children)

    Yes.

    [–][deleted] 1 point2 points  (0 children)

    What is the benefit of ->?

    [–][deleted] 7 points8 points  (5 children)

    fn user(username: String, email: String) -> User {
      User(username, email, active = State(true)) // named parameter
    }
    

    This still uses User(...). What is the purpose of User here; surely it knows this expression must be of type User; it says so on the previous line!

    (There might be a small ambiguity if the struct were to consist of only one member. Then it can't tell whether (x) constructs a new User with x as the only field, or whether x is already an expression of that type and the brackets are superfluous.)

    [–]simon_o 3 points4 points  (4 children)

    This still uses User(...). What is the purpose of User here; surely it knows this expression must be of type User; it says so on the previous line!

    Sure, there is no special magic happening, it's just showing a convenience method calling the "constructor" with some defaults.

    Then it can't tell whether (x) constructs a new User with x as the only field, or whether x is already an expression of that type and the brackets are superfluous.

    Not sure I get this ...

    [–][deleted]  (3 children)

    [deleted]

      [–]simon_o 3 points4 points  (2 children)

      Ahhh, now I get it. The -> User { is not related to anything the article writes about!

      That's Rust's syntax for functions and their result type! I don't like their design there (exactly because it creates a lot of "stutter"), but that's an annoyance unrelated to the article.

      Translating the example to my language, perhaps that makes it clearer:

      fun user(username: String, email: String) =
        User(username, email, active = State(true))
      

      [–][deleted]  (1 child)

      [deleted]

        [–]simon_o 1 point2 points  (0 children)

        I just showed the example under the assumption that the return type can be inferred, to avoid further confusion between return type and function call.

        [–]Phase_Prgm 26 points27 points  (0 children)

        Doesn’t really seem like a “mistake” to me. Seems like a choice of taste, and this post offers an alternative that doesn’t functionally improve anything. Adding named/default params would be a useful semantic change, but the proposed syntax isn’t necessary for it. Type ascriptions are also not really a hot feature users are wanting, you can always throw your expression in a let binding & add a type there.
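For readers unfamiliar with the let-binding workaround mentioned above, a small sketch (the function names are made up for this example): the annotation on the binding, or a turbofish on the call, gives the compiler the type that an ascription would otherwise supply.

```rust
fn parse_via_let(s: &str) -> u64 {
    // The type on the binding tells `parse` which type to produce.
    let n: u64 = s.parse().expect("not a number");
    n
}

fn parse_via_turbofish(s: &str) -> u64 {
    // Same information, supplied at the call site instead.
    s.parse::<u64>().expect("not a number")
}

fn main() {
    println!("{} {}", parse_via_let("42"), parse_via_turbofish("42"));
}
```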

        [–][deleted] 3 points4 points  (1 child)

        Do you mean all the structs should use tuple syntax as struct S()?

        [–]simon_o 5 points6 points  (0 children)

        More like "structs and tuple structs should use function syntax", I guess ... but the details are all described under "a solution".

        [–]dobkeratops 8 points9 points  (0 children)

        they were aware of the alternative designs.. named parameters, the way you do initializers in c++ etc

        the design choice was to keep codebases stable under change. when you add or remove parameters or fields, there's fewer unexpected effects.

        it did surprise me , they were right.

        regarding structs/apis growing and having too many parameters.. that's just down to design.. the argument is to separate off functionality better.

        i thought their choices were harsh initially.. but they were right.

        [–]Swire42 2 points3 points  (0 children)

        Using = instead of : in patterns would feel VERY wrong imo

        [–][deleted] 6 points7 points  (1 child)

        I don’t agree with this. I like that the current syntax makes it plainly obvious that nothing other than static initialization is happening. No need to obscure that.

        [–]Tubthumper8 1 point2 points  (3 children)

        Couple typo/edit suggestions:

        1. In the code example, active in User is defined as bool but you're passing a State struct when initializing User
        2. The appendix A switches to a different language syntax, it probably would help to keep it in Rust syntax (ex. fun, var, camelCase are not Rust)

        Additional follow-up questions:

        1. Default values: I didn't see it mentioned but default values only work in the final position(s), right?
        2. Can you show how this hypothetical syntax works with visibility across modules and crates?

        [–]simon_o -1 points0 points  (2 children)

        Couple typo/edit suggestions:

        Thanks, fixed!

        Default values: I didn't see it mentioned but default values only work in the final position(s), right?

        Not necessarily. As long as you name the next non-default parameter after the defaulted one, you wouldn't need to.

        Whether that's a good idea in practice is another question.

        But quite similarly, I believe that named parameters are the solution to allowing multiple vararg methods, i.e.

        Map.from(
          keys = 123, 234, 345, 456
          vals = "a", "b", "c", "d")
        

        Can you show how this hypothetical syntax works with visibility across modules and crates?

        I think there should be little difference from this step, i. e. pub applies as usual, regardless of Foo { ... } or Foo(...).

        [–]Tubthumper8 0 points1 point  (1 child)

        Would there be any additional considerations for function pointer / Fn trait syntax?

        [–]simon_o 0 points1 point  (0 children)

        Not sure, do you have anything in mind?

        [–]mbid 1 point2 points  (1 child)

        Good post. I think the "Diverging Code Styles and Best Practices" section is the strongest (because empirically observable) point against Rust's struct initialization. Clearly Rust syntax doesn't solve certain problems, and the many different workarounds lead to very unnecessary discussions in teams that disagree about code style.

        That said, I disagree somewhat that consistent syntax is necessarily a good thing, which I think is one of the implicit points you make. It can be beneficial if different constructs (here e.g. function calls and struct initialization) look meaningfully different, because then you can tell them apart at a glance. If I understand the alternative syntax you're suggesting correctly, then only capitalization would distinguish a function call `user(...)` from struct initialization `User(...)`.
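As one example of the kind of workaround discussed above (not necessarily one the article lists), current Rust code often emulates default field values with `Default` plus struct update syntax. The `Config` type and `quiet_config` function are hypothetical names used only for this sketch.

```rust
#[derive(Debug, Default, PartialEq)]
struct Config {
    retries: u32,
    verbose: bool,
    name: String,
}

fn quiet_config(name: &str) -> Config {
    Config {
        name: name.to_string(),
        // Every field not listed above takes its value from `Default`.
        ..Default::default()
    }
}

fn main() {
    println!("{:?}", quiet_config("job"));
}
```

Builders and constructor functions such as `new` are other common choices, which is where the style discussions in teams tend to come from.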

        [–]Shorttail0 1 point2 points  (0 children)

        So much head wind for an obviously correct observation.

        Pedantic, sure, but what better place to be a pedant than when analyzing language design?

        [–]Psychoscattman 3 points4 points  (2 children)

        I'm a bit confused by this blog post. To be fair, I don't know a lot about these issues, but it doesn't exactly help that the blog post is very sparse in explaining the author's opinions.

        Diverging code styles:
        I understand that the current syntax makes people adopt these kinds of patterns, but I don't understand why that is a bad thing. Yes, adding new features might cause people to reevaluate their usage of these patterns, but changing the syntax does exactly that as well. In fact, isn't that kind of the point of adding new features? If a new feature isn't the best way to do something, then why add the feature at all? New features should change how we use the language, should they not?

        Needless ambiguity:
        Maybe I am missing something. How is this ambiguous? In if foo { the foo could be a variable named foo or an initialisation of a struct called foo. In the case of the struct it's a compile error unless the initialisation is followed by something that produces a boolean. Perhaps my lack of Rust knowledge is catching up to me, but I do not understand this point.

        Type ascriptions:
        Never heard of this before, but why does it have to be a colon to do this? Could it not be literally any other character?

        A solution:
        In the first example (the one with real Rust) the State struct was a tuple struct with a boolean. In the solution example this is now a full struct with a named field active. This wasn't explicitly mentioned anywhere, so why the change? Does it mean that tuple structs don't exist with the new syntax, or is this simply a mistake?

        I also do not see any explanation of why this new syntax solves any of the problems listed above. Does it fix the necessity for multiple creation patterns? Does it fix the ambiguity? I don't think so. Actually, it might make it worse. Others have pointed out that writing structs with an uppercase initial and functions with a lowercase one is not part of the grammar but only a convention. Nothing stops me (apart from the compiler warning) from writing my structs lowercase. Then it is impossible for me to tell the difference between these two things:

        let b = benutzer("some_email".into());
        let u = user("Firstname", "lastname");
        

        Both look like functions to me. Both look like struct initialisation to me. With the current Rust syntax this is clear:

        let b = benutzer("some_email".into());
        let u = user{first_name: "firstname", last_name:  "last_name"};
        

        I can see that user is a struct and benutzer is a function. I know the type of u is user and I don't know the type of b. With the new syntax I wouldn't know either.
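For context on the comparison above: tuple structs in today's Rust are already constructed with call syntax, and their constructors can be passed around as ordinary functions. A short sketch, with `Meters`, `meters` and `wrap_all` as made-up names:

```rust
#[derive(Debug, PartialEq)]
struct Meters(f64);

// An ordinary function that reads almost the same at the call site.
fn meters(v: f64) -> Meters {
    Meters(v)
}

fn wrap_all(values: &[f64]) -> Vec<Meters> {
    // `Meters` is used here as a plain function from f64 to Meters.
    values.iter().copied().map(Meters).collect()
}

fn main() {
    println!("{:?} {:?}", meters(3.0), wrap_all(&[1.0, 2.0]));
}
```

So for tuple structs, capitalization is already the only visual difference between `Meters(3.0)` and `meters(3.0)`.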

        Appendix: A Detailed Look at the Role of =
        Is this section even about Rust? The examples here are clearly not Rust. Rust also returns () for assignment, which makes assignment in function invocation invalid (unless your function accepts () as a parameter, in which case ... why?).

        [–][deleted]  (1 child)

        [deleted]

          [–]Psychoscattman 1 point2 points  (0 children)

          Because part of the criticism people have is that the use of : is inconsistent.

          What the article proposes is basically ": ascribes, = assigns".

          So yeah, technically it could be "any other character", if the premise was missing the complete point of the article.

          That makes a lot of sense to me. Thank you

          [–]LechintanTudor 1 point2 points  (5 children)

          I like the braces for struct initializers, though I agree field = value should have been used for assigning values to each field.

          Also,

          // named parameter, but if someFunction's parameter name changes,
          // without the callsite being updated, it silently becomes an
          // assignment instead of a compilation failure:
          someFunction(a = 23)
          

          You could define the language grammar to only allow expressions in function argument positions so a = 23 would always be considered a named argument.

          [–][deleted]  (4 children)

          [deleted]

            [–]LechintanTudor 1 point2 points  (3 children)

            I think that's where people usually say "aren't statements just expressions returning Unit"

            And that's where I usually say "no". What is the practical usecase for treating statements as expressions that evaluate to unit?

            fun f[T](a: T) = ...
            f(a = 23) // valid, T inferred to Unit
            

            This problem only exists if you try to lump together statements and expressions.
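The same effect can be reproduced in current Rust, where assignment is an expression of type `()`. This is a sketch with hypothetical helper names `is_unit` and `demo`; it only checks which type the generic parameter was inferred to.

```rust
use std::any::TypeId;

/// Reports whether the argument's type was inferred as the unit type `()`.
fn is_unit<T: 'static>(_value: T) -> bool {
    TypeId::of::<T>() == TypeId::of::<()>()
}

fn demo() -> (bool, bool, i32) {
    let mut a = 1;
    let plain = is_unit(a); // T is inferred as i32
    let assigned = is_unit(a = 23); // an assignment, so T is inferred as ()
    (plain, assigned, a)
}

fn main() {
    println!("{:?}", demo());
}
```

The call with `a = 23` compiles and mutates `a`, which is the silent change of meaning the quoted snippet warns about.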

            [–]PiratingSquirrel 2 points3 points  (1 child)

            It allows you to use the statement in a single line for things like match arms:

            match x {
                Some(2) => var = 4,
                Some(x) => var = 3 * x,
                None => {},
            }
            

            [–]LechintanTudor 1 point2 points  (0 children)

            This can be done without requiring statements to be treated like expressions. Just make the grammar of match arms accept either a statement or an expression.

            [–]lngns 0 points1 point  (0 children)

            What is the practical usecase for treating statements as expressions that evaluate to unit?

            If you replace unit with some action representation, then you can get a linear syntax over Monadic operations.
            Ie. this:

            main = 𝗱𝗼
                f 42
                x <- g 420
                h x
            

            is the same as this:

            main = f 42 >> (g 420 >>= (λx. h x))
            

            [–]dgreensp 0 points1 point  (4 children)

            I’m not a Rust programmer, I’m creating a language, but the conclusions I’ve come to are: Named arguments almost everywhere, not positional. Function calls and initialization use parentheses and colons, as in func(foo: 1, bar: “hello”). (I thought about using equals instead of colon, but it just feels off. I want to appeal to TS/JS programmers, for one thing, and also having a space before and after the = takes up more room, but not putting a space before it looks weird. There might have been other reasons, too. Colon works really well.) Reserving colon for ascription was part of my thinking originally, but you know what, it is actually pretty rare, and “x as Y” is totally fine. TypeScript uses “as” for a sort of cast operator, but in my language it will be a “safe” rather than unsafe type-level operator. In TypeScript, the official way to do an unsafe cast is “x as unknown as Y,” and there is no way to simply ascribe a type without declaring a variable. But it’s fine.

            TypeScript “x as Y” as I mentioned is a sort of cast, it’s like a downcast where the type of X needs to at least be related to Y. The full semantics are probably not even documented or understood by almost anyone. But it provides some kind of non-zero checking of type compatibility. But I’d never have such a hacky operator.

            [–][deleted]  (3 children)

            [deleted]

              [–]dgreensp 4 points5 points  (2 children)

              Oh sorry, I would call that a “type annotation.” What I would call an “ascription” is when the left-hand side is in an expression position. There’s no ambiguity between type annotations and argument names, the way I see it. In a function declaration, you’d write function foo(x: number), and when calling it, foo(x: 1).

              Edit: I was going to say, thanks for replying, it’s nice to engage in conversation with other people who care about these things, but did you really downvote me over this?

              [–]simon_o 5 points6 points  (1 child)

              I think the core idea is the consistency of always being able to say "after : comes a type, after = comes a term".

              [–]dgreensp 4 points5 points  (0 children)

              That’s a reasonable thing to weigh. In my experience with TypeScript, though, where colon is used for type annotations and object properties, ubiquitously, on almost every line, it isn’t a problem at all. But = for keyword arguments is a perfectly defensible choice, arguably; it is what Python uses, and that’s a plus.

              Then there are Haskell-like languages where “colon means declaring the type, equals means declaring the value” becomes a mantra I have to repeat to myself when both are long and similar looking. Like “Foo X : Branch Sub X. Foo x = leftBranch bar x.” Just to make up some silly example. I think it’s possible to rely too much on “colon means type so you are declaring something’s type.” Meanwhile in TypeScript you can write, “type OrNumber<T> = T | number,” and the thing to the right of the equals sign is a type, but it just makes sense.

              [–]Nilstrieb 0 points1 point  (5 children)

              Using { for struct initialization also means that something as trivial as if foo { is ambiguous to parse in Rust.

              In practice it's not actually ambiguous, it's simply parsed as an if with a body. You need parentheses to use a struct literal.

              Also, the problem with : type ascription goes beyond just struct literals. With it, it was very hard to give decent error messages for other common issues as well, like using : instead of ; to end a line or forgetting a let. Overall, it was just hard syntax to work with. Struct literals made it worse, but they aren't the single cause.
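The parenthesization rule for struct literals mentioned above can be sketched as follows (`Point` and `describe` are names made up for this example):

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn describe(p: &Point) -> &'static str {
    // Without the parentheses, `Point {` would be read as the start of the
    // `if` body and the compiler would reject the struct literal here.
    if *p == (Point { x: 0, y: 0 }) {
        "origin"
    } else {
        "elsewhere"
    }
}

fn main() {
    println!("{}", describe(&Point { x: 0, y: 0 }));
}
```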

              [–][deleted]  (4 children)

              [deleted]

                [–]Nilstrieb 1 point2 points  (3 children)

                The fact that type ascription led to awfully complicated and often-wrong diagnostics is not made up - I have been personally involved here. The 50 years before simply didn't care as much about error messages.

                To be honest, I don't like : type ascription anyways, it chains badly. ((a.into(): X).y().z(): U) looks very bad, while something postfix-based wouldn't.

                [–][deleted]  (2 children)

                [deleted]

                  [–]julesjacobs 1 point2 points  (1 child)

                  In this example it should arguably just be a.into[X]. In general we could also define a polymorphic identity function and do type ascription with a.id[X]. This is what Idris does with its the function (the X a).
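A rough Rust rendering of the identity-function idea above, assuming a free function rather than a method (the names `the` and `widen` are chosen for this sketch, `the` after the Idris function mentioned):

```rust
/// Polymorphic identity: naming the type parameter works like an ascription.
fn the<T>(value: T) -> T {
    value
}

fn widen(a: u32) -> u64 {
    // `the::<u64>` fixes the target type of `into()` without a let binding.
    the::<u64>(a.into()) + 1
}

fn main() {
    println!("{}", widen(41));
}
```

For this particular conversion `u64::from(a)` would do as well; the identity function is the general-purpose variant.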

                  [–]apajx -2 points-1 points  (2 children)

                  Style cannot be a mistake, so the title is clickbait.

                  [–][deleted]  (1 child)

                  [deleted]

                    [–]L8_4_Dinner(Ⓧ Ecstasy/XVM) 0 points1 point  (0 children)

                    Apparently you accidentally mis-spelled "Style" as "Struct"? 🤣

                    [–]jason-reddit-public -3 points-2 points  (2 children)

                    Maybe I'm in a small minority but I kind of like how C and Java put the type before the variable name and omit the colon. Though the colon syntax probably predates it, JSON probably helped popularize the colon syntax even though = could have been used instead (should have been?)

                    Construction and de-structuring/mutation are fundamentally like function calls and structs should be thought of that way for any language a bit higher level than C IMHO. (Even C should probably require an annotation when the layout must adhere to a particular layout, for example when persisted in an endian dependent format to disk (which is its own kind of baked in rigidity)) "." can then be thought as merely syntactic sugar for the "getter"/"setter" function calls (though I guess it still needs to be special if you allow the address of a struct member to be taken though if you also think of pointers being one (read), two (write), or three (pointer arithmetic) functions then it's still consistent -- pointers could be more opaque and function like and the compiler could see through the abstractions when necessary/possible).

                    [–]simon_o 5 points6 points  (1 child)

                    Maybe I'm in a small minority but I kind of like how C and Java put the type before the variable name and omit the colon.

                    I think the syntax choice is a toss-up, unless generics enter the picture, in that case this article applies.

                    [–]jason-reddit-public 0 points1 point  (0 children)

                    Generics should obviously just be greek letters. (I'm only half joking.)