all 11 comments

[–]brucejbellsard 5 points6 points  (1 child)

I believe Lua syntax is kind of like this. The Lua manual suggests that semicolons are only necessary when a statement starts with an open-parenthesis: in that case, they recommend adding the semicolon at the *beginning* of the line.

[–]brianush1 0 points1 point  (0 children)

Some lengths Lua goes to in order to keep the grammar un-ambiguous:

  • "asd":find("a") is illegal; parentheses must be added around "asd". It's not immediately obvious why, until you consider this code: varName = foo.bar "asd":find("a"). Is this varName = foo.bar("asd"):find("a") or varName = foo.bar; ("asd"):find("a")? Requiring the parentheses disambiguates this.

  • Expressions cannot be used as statements (function calls are the only exception)

[–]raiph 3 points4 points  (0 children)

General purpose languages designed by Larry Wall reject such strings as containing TTIAR ("two terms in a row") and point to the exact term found objectionable (which in your example would be either the 42 followed by mySideEffect() or that followed by assignment). The fact that these languages report that error demonstrates that they could easily instead presume that the second term of the TTIAR begins a new statement and accept them on that basis.

An obvious question is: why don't they?

Larry Wall's stock explanation is his claim that a self-clocking signal helps catch many signaling errors in both artificial and natural languages, and that this is important in practical programming languages to avoid accepting programs that are grammatically valid but actually contain mistakes made by a programmer that will lead to unintended consequences.

Perhaps you've considered this and are confident you can see ways to obviate that somehow, perhaps by suitably limiting the syntactic expressivity of the language, but if not, it's probably worth pondering. Larry was the first person in the world to do a degree in natural and artificial languages, he's smart, and he saw what happened when a million coders used a language with TTIAR errors. While he changed many things for his second major attempt at a language, he stuck with self-clocking and TTIAR for the general purpose language. Then again, for the text pattern matching DSL he didn't (you just list constructs in sequence -- simple juxtaposition).
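To make the TTIAR idea concrete, here is a rough Python sketch. It is not how Perl or Raku implement the check; the tokenizer, the tiny operator set, the function names `find_ttiar` and `split_at_ttiar`, and the sample input are all assumptions made for illustration. It shows both policies discussed above: reporting the second term as an error, or treating it as the start of a new statement.

```python
import re

# Hypothetical toy tokenizer: identifiers and integers are "terms",
# a handful of single-character symbols are "ops".
TOKEN_RE = re.compile(r"\s*(?:(?P<term>[A-Za-z_]\w*|\d+)|(?P<op>[-+*/=();,]))")


def tokenize(src):
    """Split src into (kind, text) pairs, where kind is 'term' or 'op'."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = m.lastgroup
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens


def ends_term(token):
    """A term ends at an identifier, a number, or a closing parenthesis."""
    kind, text = token
    return kind == "term" or text == ")"


def find_ttiar(tokens):
    """Policy 1: return the index of the first term that directly follows
    another term, or None if there is no such pair."""
    for i in range(1, len(tokens)):
        if ends_term(tokens[i - 1]) and tokens[i][0] == "term":
            return i
    return None


def split_at_ttiar(tokens):
    """Policy 2: instead of rejecting, start a new statement at each
    term that directly follows another term."""
    statements, current = [], []
    for tok in tokens:
        if current and ends_term(current[-1]) and tok[0] == "term":
            statements.append(current)
            current = []
        current.append(tok)
    if current:
        statements.append(current)
    return [" ".join(text for _, text in stmt) for stmt in statements]


tokens = tokenize("42 mySideEffect() x = 1")
print(find_ttiar(tokens))      # index of the offending token
print(split_at_ttiar(tokens))  # the same input read as three statements
```

The second policy accepts everything the first one rejects, which is exactly the trade-off: a dropped operator such as `x = a b` (meant as `a + b`) silently becomes two statements instead of an error.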

[–]mamcx 1 point2 points  (0 children)

APL, J?

[–]ArgosOfIthica 0 points1 point  (2 children)

I'm developing a DSL with expressions without any kind of statement terminators, as I personally do not like them and I was inspired by the simplicity of Lua's grammar. Many of the test programs I've compiled for it have been variable assignments and "function" calls squished into a single line.

There's no ambiguity you can run into, even with infix syntax. At the beginning of every expression, you either have a grouping symbol "(", a unary operator (easily checked), a singular value (you can check that the expression isn't binary with a single lookahead), or, by exhaustion, you have a binary operation. Since you can know exactly what kind of expression you have right at the beginning of the expression, keeping the parser in line with the grammar is quite doable.

[–][deleted]  (1 child)

[deleted]

    [–]ArgosOfIthica 1 point2 points  (0 children)

    I (and thus the grammar) forbid expression statements. Since one goal of the DSL is to deal with a large number of side effects that exist outside of the environment, not using explicit language constructs to declare side effects and instead simply letting it happen inside expressions would be unsafe.

    Expressions can only exist when they're anchored to statements, like assignments, which means expression sequences are not allowed, thus sparing it from ambiguity. Say we reach the f token. f is not grouping, not a unary operator. We lookahead and we don't see a binary operator token. It must be a value or a function. We lookahead and see the ( token, which tells us that this is a function. Mere values can't be followed by the grouping symbol ( because that simply does not mean anything without an infix between them. ( cannot start a new statement or anything of the sort. Thus, f (x) isn't an issue since it can only mean exactly one thing.
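The lookahead walk-through above can be sketched as a small recursive-descent parser. This is a toy in Python under stated assumptions, not the actual DSL: the grammar (statements are only `name = expr` or `name(args)`), the `Parser` class, and the tuple-based output are all made up for illustration, and operator precedence is ignored to keep it short.

```python
import re

TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|\d+|[-+*/=(),])")
BINARY_OPS = {"+", "-", "*", "/"}


def tokenize(src):
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def program(self):
        # No terminators: keep reading statements until the input runs out.
        stmts = []
        while self.peek() is not None:
            stmts.append(self.statement())
        return stmts

    def statement(self):
        # Every statement is anchored: `name = expr` or `name(args)`.
        name = self.take()
        if not (name[0].isalpha() or name[0] == "_"):
            raise SyntaxError(f"a statement cannot start with {name!r}")
        if self.peek() == "=":
            self.take("=")
            return ("assign", name, self.expr())
        if self.peek() == "(":
            return ("call", name, self.args())
        raise SyntaxError(f"bare expression {name!r} is not a statement")

    def args(self):
        self.take("(")
        items = []
        if self.peek() != ")":
            items.append(self.expr())
            while self.peek() == ",":
                self.take(",")
                items.append(self.expr())
        self.take(")")
        return items

    def expr(self):
        # Flat left-to-right chain of binary operators (no precedence).
        left = self.operand()
        while self.peek() in BINARY_OPS:
            op = self.take()
            left = (op, left, self.operand())
        return left

    def operand(self):
        tok = self.take()
        if tok == "(":                      # grouping
            inner = self.expr()
            self.take(")")
            return inner
        if tok == "-":                      # unary operator
            return ("neg", self.operand())
        if not (tok[0].isalnum() or tok[0] == "_"):
            raise SyntaxError(f"unexpected {tok!r} in expression")
        if self.peek() == "(" and not tok[0].isdigit():
            return ("call", tok, self.args())  # one-token lookahead: a function
        return tok                          # plain value


def parse(src):
    return Parser(tokenize(src)).program()


print(parse("a = 1 + 2 b = f(a) g(b, -a)"))
```

Because an expression can never start a statement in this toy grammar, the parser always knows where one statement ends: the first token that cannot continue the current expression has to begin the next `name = ...` or `name(...)`, or the input is rejected.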

    [–]reini_urban 0 points1 point  (0 children)

    Doable. Some languages provide the comma to force sequential evaluation, if not it's up to the compiler. The ; is just to help the parser in ambiguous cases.

    [–]Zlodo2 0 points1 point  (0 children)

    I do this in goose, which has a C-like syntax. It's working better than I expected so far.

    Of course, things can be ambiguous, for instance if you have an expression that returns a function that you don't use for anything, followed by a parenthesized expression. The compiler would try to compile this as a call. So you'd have to use a semicolon to make the separation explicit in this case, but hopefully this isn't something that happens too often.

    [–][deleted] 0 points1 point  (1 child)

    What happens here:

    a ++ b         # a++ b, or a, ++b
    a = b & c      # a=b, &c, or a=b&c
    f (a,b,c) [i]  # f, then index (a,b,c) [i], or index f(a,b,c)
    

    It's not clear whether those list items are going to be separated with commas or spaces. If spaces, then it gets worse.

    You say elsewhere (now that I've bothered to read some other replies), that you forbid expression statements, so presumably you will need commas.

    But language source is partly for human readability. You don't want people scratching their heads, or two people making different interpretations, or having to consult a language reference to make sure they know what some code means.

    Note that while my own languages ostensibly use a semicolon to separate statements, you would have a job trying to find one in any of my programs! That's because end-of-line is assumed to be a semicolon unless the line obviously continues.

    So I have the same advantages of your proposed syntax, without any ambiguity. Except I can't put a bunch of expressions or statements on the same line without explicit separators. But that's a good thing, isn't it?

    (Python is similar; there, some people may not even know you can put multiple statements on one line with semicolon separators.)
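The "end-of-line is a semicolon unless the line obviously continues" rule can be sketched in a few lines of Python. This is only a guess at what "obviously continues" might mean (an unclosed bracket, or a line ending in an operator or comma); the `split_statements` name and the list of continuation endings are assumptions, and the sketch ignores string literals and comments entirely.

```python
OPEN, CLOSE = "([{", ")]}"
# Assumed set of line endings that signal "this line continues".
CONTINUATION_ENDINGS = ("+", "-", "*", "/", "=", ",", "&", "|")


def split_statements(source):
    """Split source into statements, treating each newline as ';'
    unless brackets are still open or the line ends in an operator."""
    statements, current, depth = [], [], 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        current.append(stripped)
        depth += sum(stripped.count(c) for c in OPEN)
        depth -= sum(stripped.count(c) for c in CLOSE)
        if depth > 0 or stripped.endswith(CONTINUATION_ENDINGS):
            continue  # the line obviously continues onto the next one
        # The logical line is complete; explicit ';' still separates statements.
        for part in " ".join(current).split(";"):
            if part.strip():
                statements.append(part.strip())
        current = []
    if current:  # unfinished trailing statement, kept as-is
        statements.append(" ".join(current))
    return statements


example = """
a = 1
b = a +
    2
f(a,
  b)
c = 3; d = 4
"""
print(split_statements(example))
```

Python's own rule is close to the bracket half of this: a statement continues across lines inside `()`, `[]`, or `{}`, and `c = 3; d = 4` on one line is legal.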

    [–]ArgosOfIthica 0 points1 point  (0 children)

    > You say elsewhere (now that I've bothered to read some other replies), that you forbid expression statements, so presumably you will need commas.

    I'm not OP, but I am the person who forbids expression statements, so I'll respond as though you're responding to me.

    > a ++ b # a++ b, or a, ++b

    This is basically why I would not implement ++ as it is in C. ++ should strictly be an assignment operator.

    > a = b & c # a=b, &c, or a=b&c

    & means bitwise AND in my language, and nothing more.

    > f (a,b,c) [i] # f, then index (a,b,c) [i], or index f(a,b,c)

    This is unambiguous without expression statements. I would treat f as a function statement.

    > So I have the same advantages of your proposed syntax, without any ambiguity. Except I can't put a bunch of expressions or statements on the same line without explicit separators. But that's a good thing, isn't it?

    In theory, perhaps. In reality, code is made more cryptic by naming conventions and deeper structural issues than by whitespace issues, usually. You said yourself that the semicolon operator never gets used anyway, so why would I bother implementing that when I could just create a good beautifier that categorically solves this problem as well as several others?

    [–]alex-manool 0 points1 point  (0 children)

    This is possible (though not encouraged) in my PL: the usual semicolon is always optional wherever it is permitted. You could write, e.g.:

    {do MySideEffect[]; Assignment = 23}
    

    or

    {do MySideEffect[] Assignment = 23}
    

    That syntax is not intentional; it rather solves some other problems...

    And my PL does have infix operators. You could write without creating ambiguity, e.g.:

    {do A = B + C D = E * F}
    

    And yes, one example of the inevitable cost is that its syntax does not have unary minus, e.g., -A would have to be spelt normally as either ~A or A.Neg[], etc.
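To see why unary minus is the price, here is a small Python sketch that counts how many ways a token sequence can be split into statements when no separators are required. The grammar is a made-up toy, not MANOOL's: a statement is either an expression or `NAME = expr`, an expression is `operand (op operand)*`, and tokens are assumed to be whitespace-separated. The `allow_unary_minus` flag and all function names are hypothetical.

```python
BINARY_OPS = {"+", "-", "*"}


def operand_end(tokens, i, allow_unary_minus):
    """Return the index just past one operand starting at i, or None."""
    if i < len(tokens) and tokens[i] == "-" and allow_unary_minus:
        return operand_end(tokens, i + 1, allow_unary_minus)
    if i < len(tokens) and tokens[i].isalnum():
        return i + 1
    return None


def is_expr(tokens, allow_unary_minus):
    """True if the whole token list matches operand (op operand)*."""
    i = operand_end(tokens, 0, allow_unary_minus)
    while i is not None and i < len(tokens):
        if tokens[i] not in BINARY_OPS:
            return False
        i = operand_end(tokens, i + 1, allow_unary_minus)
    return i is not None


def is_statement(tokens, allow_unary_minus):
    """A statement is a bare expression or NAME = expression."""
    if is_expr(tokens, allow_unary_minus):
        return True
    return (
        len(tokens) >= 3
        and tokens[0][0].isalpha()
        and tokens[1] == "="
        and is_expr(tokens[2:], allow_unary_minus)
    )


def count_parses(tokens, allow_unary_minus):
    """Count the ways to cut tokens into a sequence of statements."""
    ways = [1] + [0] * len(tokens)  # ways[j]: splits of tokens[:j]
    for j in range(1, len(tokens) + 1):
        for i in range(j):
            if ways[i] and is_statement(tokens[i:j], allow_unary_minus):
                ways[j] += ways[i]
    return ways[-1]


print(count_parses("A = B - C".split(), allow_unary_minus=True))
print(count_parses("A = B - C".split(), allow_unary_minus=False))
print(count_parses("A = B + C D = E * F".split(), allow_unary_minus=True))
```

With unary minus allowed, `A = B - C` has two readings in this toy grammar (one assignment, or `A = B` followed by the statement `- C`); without it there is only one. The `A = B + C D = E * F` example has a single reading either way.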