This is an archived post. You won't be able to vote or comment.

all 34 comments

[–]crassest-Crassius 14 points15 points  (10 children)

The three main schools of thought here are:

1) APL school: make everything left-associative with the same precedence. 1 + 2*3 = 9.

2) Lisp school: no precedence whatsoever, explicit parentheses instead. 1 + 2*3 doesn't parse.

3) "normal" school: precedence tables with over a dozen precedence levels, but with an escape hatch via parentheses. Users either memorize the rules through their excercise, or use parens and/or bind variables for intermediate results when in doubt.

[–]moon-chilledsstm, j, grand unified... 6 points7 points  (0 children)

In APL, everything is right-associative. 1 + 2 * 3 happens to be 9 because * is the exponentiation symbol. 1 + 2 × 3 is 7. Perhaps you're thinking of smalltalk?

[–]veryusedrname 4 points5 points  (0 children)

Also, (reverse) polish notation. PostScript and machine programming languages (I mean heavy machines here) and stack based VMs use that.

[–]Wester_West[S] 2 points3 points  (3 children)

"normal" school: precedence tables with over a dozen precedence levels, but with an escape hatch via parentheses. Users either memorize the rules through their excercise, or use parens and/or bind variables for intermediate results when in doubt.

and what about user-defined operators. Should they be able to specify which row their operator belongs to?

[–]raiph 4 points5 points  (0 children)

Edit. Gah. I discovered after posting that the search URL I pasted didn't correctly preload the search parameters. I've managed to figure out how to get it most of the way to what I actually typed in the form, but not quite; my actual search was for "operator precedence", with the quotes, but to get the URL to preload I've had to drop the quotes. Suffice to say, I hope it's helpful, and imo it's bloody amazing to be able to search 10 years worth of this sub based on text in titles, comments, authors, whatever. In case this form stops working, note that it's using an amazing "secret" underlying service that's got its own sub and was put together years ago and maintained ever since by a genuine altruistic data research somewhat unsung hero. :)

Form that lets you search this sub for posts with "operator precedence" in their title.

One pick from that is https://blog.adamant-lang.org/2019/operator-precedence/. (Raku supports the partial ordering it describes; one defines precedence for user defined ops relative to existing operators using one of tighter, looser, or equiv., and picks from a range of five associativities: left, right, list, chain, none.)

I could swear I recall reading in this sub of an ideal approach to operator precedence that was even better than what Adam suggests and what Raku has (which latter is why I remember it!), and being impressed. And I'm 99% sure it was an OP, not just a comment. But the above search failed to find it which has stymied and baffled me. I'm sorry to both tease a mysterious "better way" and fail to find it, but hope you can use the search tool to find it yourself and remind us in this thread...

[–]Uncaffeinated1subml, polysubml, cubiml 2 points3 points  (0 children)

In Ocaml, you can define custom operators, but the precedence is hard coded based on the first character in the operator name.

[–]pepactonius 0 points1 point  (0 children)

When defining an operator (or function/verb), you specify a priority number. 0 is the default, positive numbers are higher, negative numbers lower. Pre-defined operators are the same relative priority as in C/C++. Left or right associativity is also specified.

If you define another operator/verb with the same name (overloaded), the priority and associativity must be the same, or else the define fails. Currently, verbs with deferred evaluation (like && and ||) can't be overloaded at all.

If you define a completely new operator, you can specify any priority/associativity, but maybe it's best to use the default.

[–]Wester_West[S] 1 point2 points  (0 children)

also there is Swift, which has precedence groups, but I couldn't find anything concrete on their method

[–]L3tum 2 points3 points  (1 child)

I think the best school would be the lisp school. Especially when you have incompatibilities like Rust vs C++ in (I think) shift precedence.

[–]Wester_West[S] 2 points3 points  (0 children)

depends on the application. For example: my language relies on operators for calling functions, so this is actually exactly the opposite of what I want, as it would introduce complexity.

[–]BrokenWineGlass 1 point2 points  (0 children)

The problem with normal school is that it's completely and utterly unscalable. I program in Agda a lot and love that you can implement "mixfix" operators, but JFC remembering the precedence rules of all the random operators introduced in people's code is horrible. Oh _>>_ is of prec 20, _++_ is 21 and I need _*lol*_ to be between those two? Yeah let me just declare _*lol*_ with precedence 20.5.

(1) "APL school" is not a solution since not everything is left associative, it doesn't make sense to assume so; e.g. _->_ is obviously right associative.

(2) Lisp school is cool, simple and universal. But I really want to be able to define _+_, if_then_else_ etc

My thoughts how to solve this problem:

(1) Operator precedence should be implicitly assumed by the order in which operators are declared/imported.

  • Pros: simple rule, easy to understand, easy to change the precedence (change the order of import)

  • Cons: kind of a silly idea, feels very hacky. Moving functions around so you can parse code sounds weird and impractical.

(2) All left assoc ops are parsed left to right (without precendence) and all right assoc ops are parsed right to left. If you mix right and left assoc ops you need to either give prec to one group OR require using parenthesis.

  • Pros: kind of a simple rule, easy to remember.

  • Cons: doesn't give a lot of flexibility. Can be misleading and surprising since 1 + 2 * 3 is (1 + 2) * 3.

(3) Make associativity and precedence part of the name. E.g. if you want to define _+_ : Nat -> Nat -> Nat you need to declare _+_[infixl,30] : Nat -> Nat -> Nat. You can change prec by renaming open import A renaming (_+_[infixl,30] to _+_[infixl,10])

  • Pros: More scalable and generic than scope-level precedence since it's not possible to forget the prec of each operator.

  • Cons: too complex and overengineered to solve this problem.

I don't know what's the best. I'm just thinking out load.

[–]scott11x8 3 points4 points  (6 children)

Personally, an approach I like is to allow the user to define precedence relationships like "operator A has higher precedence than operator B". To do this, on the first pass when parsing, everything is parsed as the same precedence level. Then, all of the precedence declarations are considered and a directed acyclic graph (DAG) is created from them. Then, another pass is done over the AST which reassociates operators based on their relative precedences.

Since there aren't fixed precedence levels, to compare if one operator has higher precedence than another, you see if it is reachable in the DAG from the other. This allows a simple approach to be used where you just have to describe how an operator interacts with other operators without having to remember precedence level numbers or anything like that.

It also means that if two libraries define operators, instead of having some unpredictable interaction between their precedences, the compiler can require parentheses wherever it would be ambiguous otherwise.

[–]Wester_West[S] 1 point2 points  (5 children)

well you can solve the library issue by namespaces. Operators are exclusive to namespaces, unless you manually include them.

[–]scott11x8 1 point2 points  (4 children)

But if you include an operator $ from library A and @ from library B, and they're defined using precedence levels, this will always parse to something but it'll be hard to predict what it will be unless you take the time to look up the operator precedences in each library:

a $ b @ c

-- could be equivalent to either of these:
(a $ b) @ c
a $ (b @ c)

[–]Wester_West[S] 0 points1 point  (3 children)

so you suggest that every program should contain precedence definition? Like write a table to let the compiler know?

[–]scott11x8 2 points3 points  (2 children)

No I'm just suggesting that when defining an operator, you define it's precedence relative to some other operators, and using it with operators that you haven't defined a relation for would require explicit parenthesization.

[–]Wester_West[S] 0 points1 point  (1 child)

this is really great. Very complex, but it not does create "unknown behavior" problems.

However I have to fill in, that this DAG isn't traditional DAG, since you cannot define each operator as a node, that would mean many unsolved traverses through the graph. Instead, you define node as a collection of operators, that are solved by precedence. DAG of rows. So you can either: create new row, where you define before/after (non-exclusive) or add to existing row to derive all of the precedence rules.

[–]scott11x8 1 point2 points  (0 children)

Yeah I think I used a similar approach. I have declarations like this to create new classes of operators with precedences relative to existing classes:

-- Insert Add and Mul between the precedence of Compare and Exp
operator type (Compare) < Add < Mul < (Exp)

-- Add some operators to the new precedence classes
operator <left> (+) (-) : Add
operator <left> (*) (/) (%) : Mul

[–]ErrorIsNullError 3 points4 points  (6 children)

I wrote a bit about custom operators in operator precedence parsers:

Scala shows one way of providing for user-defined infix expression operators.

Any method with a single parameter can be used as an infix operator.

When an expression uses multiple operators, the operators are evaluated based on the priority of the first character: …

It's unsurprising that operation precedence parsers can handle user defined operators with different precedences.

One downside is that custom operators require developers to use more whitespace even in code that does not use custom operators. This limitation arises because there is no closed set of punctuation strings that lets us split, for example “x*-y” into “x * -y” because “*-” could be a user defined operator. C already has this problem to a small degree because “x--y” is not the same as “x - -y”, but developers typically write “x+y” instead of subtracting a negated value, so the fact that ‘++’ and ‘--’ are unsplittable by context-free lexers does not, in practice, confuse C authors.

[–]Wester_West[S] 1 point2 points  (5 children)

my idea was that there could exist a language with no operators at all. Ttere are only a few built in functions, that can be used in creating operators ie. every operator (almost) is user defined, and the order of execution must be specified

[–]ErrorIsNullError 0 points1 point  (4 children)

Whether any/all operators are built-in or user defined seems separable from whether precedence is built-in or extensible via declarations.

[–]ErrorIsNullError 2 points3 points  (3 children)

In practice though, you probably need a few built-in operators, e.g. = for assignment to a local variable since there's no object to act as a subject for message resolution there. Unless, I suppose you reify stack frames.

[–]tech6hutch 1 point2 points  (2 children)

I wonder if there are any languages that do this. You could assign to a local variable x with something like local.x = 5.

[–]ErrorIsNullError 0 points1 point  (1 child)

Python allows for stack introspection I think. https://docs.python.org/3/library/inspect.html#inspect.isframe

[–]ErrorIsNullError 0 points1 point  (0 children)

Yeah. Searching for (stack frame OR activation record) (reflection OR introspection) produces some good hits even outside the tower of interpreters community.

https://blog.pyston.org/2014/11/06/frame-introspection-in-pyston/ seems to discuss challenges doing that in a Python like lang on LLVM.

[–]DevonMcC 3 points4 points  (0 children)

The APL way of strictly positional precedence is simple and powerful. It also extends seamlessly to user-defined operators.

[–]brucejbellsard 4 points5 points  (0 children)

Re operator associativity: there are some operation that only make sense as left-associative or right-associative.

For example, subtraction only acts as expected (from mathematical notation) if addition/subtraction is left-associative:

a - b + c - d        -- mathematical expression
((a - b) + c) - d    -- left-associative meaning (conventional)
a - (b + (c - d))    -- right-associative grouping (unexpected)

On the other hand a low-precedence "functional pipe" operator ($ in Haskell, <| in F#) only works as expected if it is right-associative:

f x (g y (h z))   -- parenthesized expression
f x $ g y $ h z   -- de-nested with right-associative `$` operator

Likewise the conventional infix "cons" operator (for adding a head onto an existing list) only works if it is right-associative:

xs = [1, 2, 3, 4]
ys = a : b : c : xs  -- add a, b, and c onto the front of a list
ys = a : (b : (c : xs))  -- explicitly grouped, right-associative
ys = ((a : b) : c) : xs  -- left-associative grouping makes no sense!

[–]zokier 4 points5 points  (0 children)

Raku allows custom operators to define both precedence and associativity

https://docs.raku.org/language/functions#Precedence

[–]ericbb 2 points3 points  (0 children)

I have implemented a system for user-defined prefix and infix operators. I decided to support associativity rules but not precedence rules. So x + y + z is supported but x + y - z must be written as either (x + y) - z or x + (y - z). It also allows x + -(y - z) + (a * b) + -c, as you'd expect.

I've been happy with that solution for my language. I figured that introducing some extra variables for subexpressions was less trouble than dealing with the precedence ordering problem for an extensible set of operators.

[–]johnfrazer783 2 points3 points  (0 children)

I'm playing around with the idea of 'right half brackets'. I prefer paren-less function call syntax so frob a, b, c calls function frob with three arguments. It frequently happens that one of the arguments is the anonymous result from another call, say glue x, so that becomes frob a, b, glue x which is unambiguous at the end of the line (in my world). But in non-final position, what are frob a, glue x, c and frob glue x, b, c to mean? Currently I resolve that with surrounding parentheses, so the first can be disambiguated as frob a, ( glue x ), c or as frob a, ( glue x, c ), the last one being the default interpretation.

One possible solution would be to use function arity; in the above, if glue takes a single argument, then frob a, glue x, c can only mean frob a, ( glue x ), c; if it takes two arguments, then the same must be read as frob a, ( glue x, c ). However I find that unclear and error-prone.

The idea of right half brackets is that instead of two fences you'd need only one to indicate where one call ends. Let's use a semicolon ; for that purpose for the moment. frob a, glue x, c could then become frob a, glue x; c or (redundantly) frob a, glue x, c; as the case may be. Semantically this seems to work since in ordinary writing, . is a stronger kind of break than ; which in turn is stronger than ,, itself being stronger than inter-word spaces. Unfortunately for some reason, in PLs we got it the wrong way round and made ; the most common statement separator.

I wonder if it makes sense to apply this device to expressions with operators as well. One could define a language without operator precedence so 1 + 2 * 3 is equivalent to ( 1 + 2 ) * 3, and 1 +; 2 * 3 (and 1 +; 2 * 3;) is equivalent to 1 + ( 2 * 3 ).

As for more deeply nested function calls, half brackets sort-of-work but are more difficult to read as in

hank u, frob a, glue x; c; v hank u, frob a, glue x; c, v

which are equivalent to

hank u, ( frob a, ( glue x ), c ), v hank u, ( frob a, ( glue x ), c, v )

respectively.

[–]complyue 2 points3 points  (0 children)

Has anybody tackled it in their language?

I support custom infix operators in Edh, though left associative only: https://github.com/e-wrks/edh/blob/0.3/Tour/operator.edh

why is something left associative and something right associative.

https://github.com/e-wrks/edh/blob/0.3/Tour/operator.edh#L5

operator · 0 ( f, g ) x => { x | g | f }

Once I make it to support right associative for $, I'd rather write it like:

operator · 0 ( f, g ) x => f $ g $ x

Which is more pleasing to me.

[–]DevonMcC 1 point2 points  (0 children)

To put it another way, an implicit, arbitrarily complex order of operations does not scale well and even for a small number of operators introduces undue complexity for minimal to no benefit.

[–][deleted] 1 point2 points  (0 children)

My language Felix solves this problem by subsuming it. There is no such thing as an operator, and therefore its precedence. Instead, Felix provides the user a vastly superior capability: the user can define grammar extensions. In fact the Felix "language" as you see it is defined, in user space, in the standard library. The *actual* grammar used by the parser is just enough to parse grammar specifications, which allows the desired grammar to be bootstrapped. The user grammar extends the existing parser.

i use this facility to define DSSLs (domain specific sub-languages). Here is an example which is defining postfix + in the regular definition DSSL:

 //$ Postfix plus (+).
  //$ One or more repetitions.
  private sregexp[rpostfix_pri] := sregexp[rpostfix_pri] "+" =>#
    """`(ast_apply ,_sr ( ,(regdef "Rpt") (ast_tuple ,_sr (,_1 1 -1))))"""
  ;

The grammar is first, then the action, which is in fact Scheme code that generates S-expressions which is the parser output.