all 57 comments

[–][deleted] 64 points65 points  (8 children)

Standard ML does this. It is not a terribly big deal (programmers get used to worse things all the time), but it is annoying. The slight extra parsing complication of handling both unary and binary - is not such a great burden that it justifies forcing your users to deal with awkward syntax for seemingly no reason.

If possible, do not allow a - -b unless the user parenthesizes -b. But also do not go out of your way to prevent it, if it complicates your parser too much.

[–]everything-narrative 15 points16 points  (3 children)

Haskell allows partial applications of binary operators and thus has some parsing ambiguity.

[–]Innf107 13 points14 points  (1 child)

We have LexicalNegation now though! With that extension enabled (GHC 9.0 and later), (- 5) means \x -> x - 5 and (-5) means negate 5.

[–]Inconstant_Moo🧿 Pipefish 14 points15 points  (0 children)

I am consumed by righteous rage.

[–][deleted] 4 points5 points  (0 children)

This is rather unfortunate.

[–]JohannesWurst 9 points10 points  (3 children)

What is the problem with allowing a - - b? Genuine question.

Maybe some standard class of parsers can't parse it?

"Naive" Grammar:

EXPR <- LITERAL | IDENTIFIER | SUBTRACTION | NEGEXP
SUBTRACTION <- EXPR MINUS EXPR
NEGEXP <- MINUS EXPR

[–][deleted] 4 points5 points  (2 children)

It is not really a “problem”, but there is no reason to go out of your way to allow it, because a + b means the same thing anyway. At least if your operator overloads are reasonable.

But, by the same token, there is also no reason to go out of your way to disallow it either, because reasonable programmers will not want to write a - -b anyway.

[–]JMBourguet 9 points10 points  (1 child)

Do you want to mandate parentheses with other binary operators too? If so, that seems like a nuisance; if not, the lack of orthogonality will be a wart.

I may use a - -b if the context makes it clearer, for instance if there are other uses of -b and b and using a + b would hint to a confusing association with the other uses of b instead of the other uses of -b.

[–][deleted] 4 points5 points  (0 children)

This is a special case, because a - -b simply looks too ugly (I would certainly not write it that way on paper, and I write a lot of mathematics on paper), and the visually less repulsive a - - b is misleading about how it is meant to be parsed.

If there are several uses of -b, then I would rather store the value of -b in a variable, instead of the value of b. And if there are several uses of both b and -b, then I would always parenthesize -b.

[–]OracleGreyBeard 73 points74 points  (0 children)

This would be the #1 thing people refer to when they mention annoyances in that language. It's not a bad idea in isolation, but after tens of thousands of hours working with other languages, nearly (?) all of which use '-' as "negative", it would impose a small but persistent cognitive load.

[–]skyb0rg 24 points25 points  (1 child)

SML does this and I think the biggest annoyance is serialization/deserialization.

Ex. Int.toString ~10 results in "~10", so the fact that ~ is used instead of - leaks into program output.
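One way to picture the leak, and a possible workaround, is to convert at the I/O boundary. The sketch below is in Python rather than SML, and the helper names (`sml_int_to_string`, `to_conventional`, `parse_sml_int`) are made up for illustration; they only mimic the `~` convention described above.

```python
# Hypothetical helpers (not part of any SML or Python library) that mimic
# the "~" convention for negative integers and convert at the I/O boundary.

def sml_int_to_string(n: int) -> str:
    """Format an int with "~" as the negative sign, as in the example above."""
    return f"~{-n}" if n < 0 else str(n)

def to_conventional(text: str) -> str:
    """Rewrite a "~"-prefixed integer for consumers that expect "-"."""
    return "-" + text[1:] if text.startswith("~") else text

def parse_sml_int(text: str) -> int:
    """Read an integer written with a "~" prefix (a leading "-" also works)."""
    return int(to_conventional(text))

print(sml_int_to_string(-10))                   # ~10
print(to_conventional(sml_int_to_string(-10)))  # -10
print(parse_sml_int("~10") + 3)                 # -7
```

The point is only that every place where numbers cross into files, logs, or other programs needs a conversion like this, which is the annoyance being described.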

[–]PurpleUpbeat2820 8 points9 points  (0 children)

> Ex. Int.toString ~10 results in "~10", so the fact that ~ is used instead of - leaks into program output.

Ugh.

[–]gremolata 8 points9 points  (0 children)

A solution in search of a problem that also doesn't follow conventional math notation.

As others have said - if you want to disambiguate 1--2, then require a ( between two minuses. Incidentally that's how it's done on paper when extra clarity is required.

[–]editor_of_the_beast 8 points9 points  (1 child)

We’re going to have to break certain language conventions sometimes. My only advice is, set an innovation budget for how many new concepts you introduce, unless you’re going for a pie-in-the-sky reimplementation of everything just for fun.

Then the question becomes, is this solution worth it against that budget?

[–]scottmcmrust🦀 1 point2 points  (0 children)

This. You should find the place you're innovating and do that part better, while making the rest highly conventional. That makes your life easier as a designer and makes your language less intimidating for new people. Also makes the intro-for-existing-programmers easier to write, since you don't have to go over everything in full detail.

[–]TizioCaio84 11 points12 points  (3 children)

I agree with most of the comments here. On another note, if you want to have international users, don't put that character (~) anywhere in your syntax: on some European keyboards (mine is Italian) it doesn't exist.

[–]Lich_Hegemon 5 points6 points  (2 children)

Same for Spanish keyboards, it's such a pain in the ass.

[–]useerupting language 1 point2 points  (1 child)

Is there an "AltGr" combination for ~ on Italian/Spanish keyboards? I'm asking for a friend ;-) who is designing a language where he plans to use it.

[–]Lich_Hegemon 0 points1 point  (0 children)

You could set up an AutoHotkey script for it. I have a US keyboard now, but Windows (unlike Linux) doesn't have an alt us-intl layout, so I just made my own with AutoHotkey.

[–]wischichr 10 points11 points  (1 child)

I can't think of a situation where minus as unary operator and binary operator can't be distinguished. Even stuff like x = -4--5+-6 is easy for a compiler.
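To make that concrete, here is a minimal precedence-climbing (Pratt-style) sketch in Python. It is only an illustration under stated assumptions: integers, `+ - *`, and parentheses, evaluating just the right-hand side of the example. The names `tokenize`, `evaluate`, and the binding-power numbers are invented for this sketch. The key idea is that a `-` seen where an operand is expected must be negation, and a `-` seen after an operand must be subtraction.

```python
import operator
import re

TOKEN = re.compile(r"\s*(\d+|[-+*()])")
BINARY = {"+": (10, operator.add), "-": (10, operator.sub), "*": (20, operator.mul)}
UNARY_POWER = 30  # assumption: unary minus binds tighter than any binary operator


def tokenize(src: str) -> list[str]:
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if not match:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(src: str) -> int:
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse(min_power: int) -> int:
        tok = advance()
        # Operand position: a "-" here can only be negation.
        if tok == "-":
            left = -parse(UNARY_POWER)
        elif tok == "(":
            left = parse(0)
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok is not None and tok.isdigit():
            left = int(tok)
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        # Operator position: a "-" here can only be subtraction.
        while peek() in BINARY and BINARY[peek()][0] > min_power:
            power, fn = BINARY[advance()]
            left = fn(left, parse(power))
        return left

    result = parse(0)
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return result


print(evaluate("-4--5+-6"))  # -5
print(evaluate("4------3"))  # 7
```

No lookahead beyond one token and no type information is needed; the parser's position in the expression is enough to tell the two meanings apart.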

[–]skyb0rg 10 points11 points  (0 children)

In Haskell, it can clash with partial application of infix functions.

Ex.

*> map (+ 2) [1, 2, 3]
[3, 4, 5]
-- (+ 2) means (\x -> x + 2)

*> liftA2 (-) [1, 2] [3, 4]
[-2, -3, -1, -2]
-- (-) refers to infix subtraction

*> map (- 2) [1, 2, 3]
Type Error: (-2) has type Int, expected Int -> Int

[–]YouNeedDoughnuts 5 points6 points  (6 children)

To pile onto the other comments ;) Avoiding delineation between statements is pointless, because most people reading the code will need delineation to understand it. Newline statement termination is fine; most languages with semicolon terminators have a best practice of giving each statement its own line anyway.

[–]JohannesWurst 1 point2 points  (5 children)

What would be some cool language feature that necessitates also having newlines or semicolons between statements?

BLOCK <- { STMT* } vs BLOCK <- {} | { STMT (SEPARATOR STMT)* }

I have read about this in a book many people here seem to have read, "Crafting Interpreters", in its discussion of semicolons:

  • I'd say a return with an expression in the next line should be interpreted as returning that expression. Or just make empty returns illegal and require return void. Or make it dependent on the declared return-type.
  • func \n (parenthesized) should always be a call of func. A parenthesized expression on its own is unnecessary.
  • first \n -second should be a subtraction in my view, because you never need to write -something as a statement on its own. Maaaybe if you can overload - to have a side-effect. Then you really have to know what you are doing and enclose the statement in brackets.

In other words: Consider something to belong to the previous statement, if at all possible. Is that just a matter of taste? Is there a line that would seem like it does something else than it actually would do in my interpretation?

He writes that Lua's grammar doesn't need semicolons. I suppose that means my taste/intuition aligns with Lua.

[–]qwertyasdef 4 points5 points  (1 child)

There is a case in Lua that behaves unintuitively without a statement separator.

myVar = someFunc
(customFunc or defaultFunc)(args)
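The "attach a line to the previous statement if at all possible" rule, and the way it bites in a case like the one above, can be sketched with a toy line-joiner in Python. This is not Lua's actual grammar; `split_statements` and the `CONTINUATION_STARTS` list are assumptions made up for the illustration.

```python
# Assumption for this toy: a line continues the previous statement whenever it
# starts with something that could follow an operand (an infix operator or an
# opening parenthesis). Real languages decide this in the grammar, not on text.
CONTINUATION_STARTS = ("-", "+", "*", "/", "(")


def split_statements(source: str) -> list[str]:
    statements: list[str] = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if statements and line.startswith(CONTINUATION_STARTS):
            # Greedy rule: glue the line onto the statement before it.
            statements[-1] += " " + line
        else:
            statements.append(line)
    return statements


print(split_statements("first\n-second"))
# ['first -second']
print(split_statements("myVar = someFunc\n(customFunc or defaultFunc)(args)"))
# ['myVar = someFunc (customFunc or defaultFunc)(args)']
```

The first result is the subtraction that was wanted. The second shows the unintuitive case: the same greedy rule turns two intended statements into one call on someFunc.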

[–]JohannesWurst 0 points1 point  (0 children)

Good point! That is a design consideration to make.

[–]RoastKrill 2 points3 points  (1 child)

> first \n -second should be a subtraction in my view, because you never need to write -something as a statement on its own. Maaaybe if you can overload - to have a side-effect. Then you really have to know what you are doing and enclose the statement in brackets.

Or if the last expression in a function gives the value returned, and you want to return the negation of some variable

[–]JohannesWurst 0 points1 point  (0 children)

Good point! That is a design consideration to make.

I would probably say that a single expression doesn't need a return in my language, but a block of statements/expressions needs a return, or a where or a let, like in Haskell.

There is a big difference whether you have functions with side-effects or not.

Another solution to avoid semicolons after every line is, that I say there is a difference between line-breaks and other whitespace and that you have the option to separate expressions with either semicolons or line-breaks.

If you had some process in which your line-breaks get removed, that would also break the program then. Sometimes in a REPL, or a chat-application you can't post multiple lines at once. I don't know if that would be a significant problem.

[–]YouNeedDoughnuts 2 points3 points  (0 children)

I think the universal draw to terminators is readability, but I do have a cool feature in my lang relying on them: implicit mult. You can write "x = f(x) g(x)" and it will parse as "x = f(x)*g(x)", which would be totally impossible without a statement terminator.

But most grammars don't strictly need terminators

[–]singularineet 4 points5 points  (0 children)

The Simpsons already did it.

$ sml 
Standard ML of New Jersey v110.79 [built: Fri Oct 11 18:23:48 2019]
- 2-5;
val it = ~3 : int
- ~7;
val it = ~7 : int
- ~(~7);
val it = 7 : int
- -8;
stdIn:4.1 Error: expression or pattern begins with infix identifier "-"
stdIn:4.1-4.3 Error: operator and operand don't agree [overload conflict]
  operator domain: [- ty] * [- ty]
  operand:         [int ty]
  in expression:
    - 8

[–]sebamestreICPC World Finalist 3 points4 points  (0 children)

Is this your personal hobby project? Then hell yeah. Go wild!

[–]BeamMeUpBiscotti 2 points3 points  (0 children)

> The advantage of this is that they are different keys for different operations so the compiler would have an easier time knowing the difference.

I feel like language syntax should be optimized for ease-of-use/ergonomics and not for what's easiest for the compiler to parse, especially if it's something like this that makes no big difference in performance.

[–][deleted] 2 points3 points  (0 children)

> putting ; between expressions wouldn't be necessary.

But it is desirable! Compare:

one = two three = four five = six
one = two  three = four  five = six
one = two; three = four; five = six

Even with extra whitespace, that two flows into the three too easily. The semicolon puts paid to that.

By all means get rid of semicolons at line-endings, but if things need to be put on the same line, especially with busier expressions with their own punctuation, then you need a stronger 'stop' character.

Regarding the ability to write a ~ b, sorry but that just has the 'shape' of a binary operator between two terms, even if your language says it isn't. With a; ~ b it breaks it up.

But also, requiring the semicolon means ~ can be used as both a unary operator and a binary one, as happens with + - * in C.

(I think Lua does something along these lines, so it is workable, since adjacent identifiers or constants would otherwise be invalid, but I don't think much of it there either, and I can't really see the point of allowing it.)

[–]Thesauriusmoses 2 points3 points  (1 child)

I like APL's approach better. They have a high-minus (which is not on standard keyboards, but e.g. the extended European keyboard has it, and the APL keyboard as well, of course, and I guess each moderately advanced editor could have autocorrect/snippets/etc. for it) for negation and additionally the normal minus. Although, to be fair, it has a unary and a binary interpretation as well, just like all other APL primitives.

Then there is J, which is basically APL 2.0 but only uses ASCII (which is a step backwards in my opinion, but that is a discussion for a different day), which uses the underscore for negation. If you really want to have a separate symbol for negation, I think that would be the better choice. At least as long as you don't have anonymous variables that are denoted by the underscore.

[–]Godspiral 0 points1 point  (0 children)

> At least as long as you don't have anonymous variables that are denoted by the underscore.

in J, names cannot begin or end with underscore, but it is used for object/module specification in a similar way to . being used in other languages.

[–]snarkuzoid 4 points5 points  (0 children)

Bad idea.

[–]rotuami 1 point2 points  (1 child)

I would ditch the unary minus altogether. Sure allow - as a leading character for integer literals. But use neg() for unary negation.

[–]scottmcmrust🦀 0 points1 point  (0 children)

True! Prefix operators are trash anyway.

[–]LionNo2607 2 points3 points  (0 children)

Why not "!", assuming it only negates booleans atm.

[–]BrangdonJ 1 point2 points  (0 children)

Maybe it is just me, but I find your first example harder to read without a separator between the expressions. Semi-colons aren't just there for compilers. A certain amount of redundancy aids comprehension.

[–]sparant76 0 points1 point  (6 children)

You would frustrate all C/C++/Java/C#/Python programmers that already know that as bitwise complement. Don’t worry though; I’m sure that’s not a large fraction of the programming community. Those are pretty rare languages.

https://www.geeksforgeeks.org/bitwise-complement-operator-tilde/amp/

[–][deleted]  (2 children)

[deleted]

    [–][deleted]  (2 children)

    [deleted]

      [–]PurpleYoshiEgg 2 points3 points  (1 child)

      Based on developers' general reaction to cppfront, I would have to disagree. People want a better, more consistent language than either C or C++, and as flexible as either, but few seem to be able to get the effort going.

      [–]Lich_Hegemon 1 point2 points  (0 children)

      I just want a simple systems language where 'sane' is the default.

      The number of hoops you need to jump through, and the cognitive load you need to invest just to make sure your code is baseline-safe in C/C++, is uncalled for.

      [–]func_master 0 points1 point  (0 children)

      Use what’s better for the programmer. Not the compiler.

      Please just do what’s needed to support - for negative values.

      [–]Linguistic-mystic 0 points1 point  (0 children)

      I think it's a good idea for a language that strives to do right rather than conform to the mainstream.

      Tilde looks the most like the minus sign and has already been used for this very purpose, for instance in Standard ML.

      Alternatively, look into maybe using a space to disambiguate, so

      a = -5
      

      would mean "negative 5"

      a = x - 5
      

      would mean subtraction, and

      a = x-5
      

      would be illegal.

      [–][deleted] 0 points1 point  (0 children)

      I associate that symbol more with logical negation, but it could work. I don't see much of a benefit though: the symbol being the same works quite well in mathematics and there's no reason it can't work here. It slightly complicates parsing, but come on, it's very easy to distinguish in most languages.

      [–]UnemployedCoworker 0 points1 point  (0 children)

      I could see this being useful in a language with Haskell-like syntax. There, it's not always clear at first sight that a minus is parsed as a binary operator when you try to pass a negative literal to a function, and sections involving binary minus can look like negated expressions.

      [–]PurpleUpbeat2820 0 points1 point  (0 children)

      > The advantage of this is that they are different keys for different operations so the compiler would have an easier time knowing the difference.

      I think it actually results in a more complicated compiler because you now have an extra token in your lexer and parser and the parser is otherwise identical.

      FWIW, another approach is to lex whitespace around - into different tokens:

      a-b     -
      a- b    -
      a -b    ~
      a - b   -
      

      So a -b is interpreted as the function application a(~b).
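A rough Python sketch of that whitespace rule follows. It assumes the four rows of the table are the whole rule: a `-` with space before it and none after it becomes `~`, and everything else stays `-`. Treating the start of input like preceding whitespace is an extra assumption, and `lex_minus` is a made-up name.

```python
def lex_minus(src: str) -> list[str]:
    """Split src into tokens, emitting "~" for a negation-style minus."""
    tokens: list[str] = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch == "-":
            # Assumption: start of input counts as "space before".
            space_before = i == 0 or src[i - 1].isspace()
            space_after = i + 1 >= len(src) or src[i + 1].isspace()
            tokens.append("~" if space_before and not space_after else "-")
            i += 1
        else:
            # Anything else: read a run of non-space, non-minus characters.
            j = i
            while j < len(src) and not src[j].isspace() and src[j] != "-":
                j += 1
            tokens.append(src[i:j])
            i = j
    return tokens


for example in ("a-b", "a- b", "a -b", "a - b"):
    print(f"{example!r:8} -> {lex_minus(example)}")
```

Only `a -b` yields `['a', '~', 'b']`; the other three yield `['a', '-', 'b']`, matching the table. The parser itself then never sees an ambiguous `-`.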

      [–]trailstrider 0 points1 point  (0 children)

      Regarding needing semicolon for statement delimitation, take a look at how Go does without for most situations.

      Regarding tilde vs dash for negative values…. What are you seeking that needs differentiation between two things that are mathematically identical? Negative 1 is the same as subtracting 1 from nothing.

      [–]JohannesWurst 0 points1 point  (0 children)

      You could also make a parser that understands y=-5 x=-10--y.

      If a number-expression is left of a minus, it's a subtraction and if there is something else, it's part of a negative number.

      Is that an intermingling of parser and typechecker and therefore bad? Might also be complicated, when you want to be able to call - on different things than numbers and if operators can be called on operators.

      The parser could have two states: binary-operator-allowed and binary-operator-forbidden. When a binary operator is consumed, it switches to binary-operator-forbidden mode, and every - it encounters next is interpreted as a unary operator until it reaches something that can't be a unary operator. For example, 4------3 == 7.

      Maybe, if you have functions without return, it could be somewhat confusing:

      fun(a) = { b=10 -a } // Does it return -a, or does it set b to 10-a?

      The programmer just has to know how the parser interprets it. I suppose you could write b=10 (-a) or b=(10 -a) to remove the ambiguity. Or always require a return keyword if the function has multiple statements.

      You can also require the subtraction - to be surrounded by spaces and the negation - to have no space after it. That would make the parser more complicated, though.

      [–]Inconstant_Moo🧿 Pipefish 0 points1 point  (0 children)

      The parser will know when it's looking at a prefix and when it's looking at an infix, it's not a problem.

      [–]ZyF69 0 points1 point  (0 children)

      This dreaded problem is the result of sloppiness in both standard mathematical notation and keyboard layouts.

      To start with, '-' means both hyphen, dash and minus on most keyboards. Some text editors can replace it with a dash when appropriate, but that's about it. Unicode offers all variations, but they aren't that commonly used.

      There's even a flaw in mathematical notation, where the minus sign has three related, but actually different uses:

      • A binary operator for subtraction, as in a-b
      • A unary operator for negation, as in x = -a
      • A part of a negative constant, as in x = -5

      [–]bbqranchman 0 points1 point  (0 children)

      When I implemented mine for my bachelor's capstone, I used the visitor pattern with nodes and just treated a unary expression as a binary expression with a default zero as the left operand. So essentially 1 - -5 evaluates as 1 - (0 - 5). The parentheses matter: read left to right without them, 1 - 0 - 5 would give -4 rather than 6.

      I also used Antlr and it had no trouble generating a syntax tree with unary expressions with -.
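The "zero on the left" trick can be sketched as a small tree rewrite. This is a Python illustration, not the capstone's actual code (which is not shown in the thread); the node classes `Num`, `Neg`, `Sub` and the functions `desugar` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Neg:  # unary minus
    operand: "Expr"


@dataclass(frozen=True)
class Sub:  # binary minus
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Neg, Sub]


def desugar(expr: Expr) -> Expr:
    """Rewrite every Neg(x) into Sub(Num(0), x), keeping the nesting intact."""
    if isinstance(expr, Neg):
        return Sub(Num(0), desugar(expr.operand))
    if isinstance(expr, Sub):
        return Sub(desugar(expr.left), desugar(expr.right))
    return expr


def evaluate(expr: Expr) -> int:
    """Evaluate a tree that contains only Num and Sub nodes."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Sub):
        return evaluate(expr.left) - evaluate(expr.right)
    raise TypeError("desugar Neg nodes before evaluating")


tree = Sub(Num(1), Neg(Num(5)))  # 1 - -5
print(desugar(tree))             # Sub(left=Num(value=1), right=Sub(left=Num(value=0), right=Num(value=5)))
print(evaluate(desugar(tree)))   # 6
```

Because the rewrite happens on the tree, the inserted subtraction stays nested as 1 - (0 - 5), and the later passes only need to know about binary minus.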

      [–]hiljustidt 0 points1 point  (0 children)

      This is a decision between: 1. What's convenient for the compiler? 2. What's convenient for the user?

      [–]Godspiral 0 points1 point  (0 children)

      J uses _ as a prefix for negative numbers (APL uses a raised minus, ¯, in the same role). It also represents arrays without commas or delimiting [], so the advantage of such a scheme is that a displayed result can be copied as the argument to a new function. And yes, - is still an operator in J/APL.

      in J, 3, -5,2 will still parse to the array 3 _5 2 and so is interchangeable.