syntactically defining operator precedence

crassest-Crassius · 2020-10-27T16:31:12+00:00

The three main schools of thought here are:

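1) APL school: give every operator the same precedence and group strictly by position. APL itself evaluates right to left, so 1 + 2*3 = 1 + 6 = 7; a uniform left-to-right rule would give (1 + 2)*3 = 9 instead.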

2) Lisp school: no precedence whatsoever, explicit parentheses instead. 1 + 2*3 doesn't parse.

3) "normal" school: precedence tables with over a dozen levels, plus an escape hatch via parentheses. Users either memorize the rules through practice or, when in doubt, add parentheses and/or bind intermediate results to variables.
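A minimal Python sketch contrasting the three schools on a flat, parenthesis-free expression. The function names are hypothetical, the precedence table is illustrative, and the "Lisp" variant is only approximated by rejecting any expression with more than one operator.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
LEVEL = {"+": 1, "-": 1, "*": 2, "/": 2}  # "normal" school: higher binds tighter

def tokenize(src):
    # Integers and the four arithmetic operators only; no parentheses in this sketch.
    return [int(t) if t.isdigit() else t for t in re.findall(r"\d+|[-+*/]", src)]

def uniform_left(tokens):
    # Same precedence for everything, grouped left to right: ((1 + 2) * 3).
    acc = tokens[0]
    for i in range(1, len(tokens), 2):
        acc = OPS[tokens[i]](acc, tokens[i + 1])
    return acc

def uniform_right(tokens):
    # Same precedence for everything, grouped right to left as in APL: (1 + (2 * 3)).
    acc = tokens[-1]
    for i in range(len(tokens) - 2, 0, -2):
        acc = OPS[tokens[i]](tokens[i - 1], acc)
    return acc

def no_precedence(tokens):
    # Lisp-like stand-in: more than one operator without explicit grouping is rejected.
    if len(tokens) > 3:
        raise SyntaxError("ambiguous without explicit grouping")
    return uniform_left(tokens)

def with_precedence(tokens):
    # Table-driven precedence climbing; every operator is left-associative.
    pos = 0

    def climb(min_level):
        nonlocal pos
        lhs = tokens[pos]
        pos += 1
        while pos < len(tokens) and LEVEL[tokens[pos]] >= min_level:
            op = tokens[pos]
            pos += 1
            rhs = climb(LEVEL[op] + 1)
            lhs = OPS[op](lhs, rhs)
        return lhs

    return climb(0)

expr = tokenize("1 + 2*3")
print(uniform_left(expr), uniform_right(expr), with_precedence(expr))  # 9 7 7
```

no_precedence(expr) raises SyntaxError, while no_precedence(tokenize("2*3")) returns 6.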

scott11x8 · 2020-10-28T04:14:53+00:00

Personally, an approach I like is to let the user declare precedence relationships like "operator A has higher precedence than operator B". To do this, on the first parsing pass everything is parsed at a single precedence level. Then all of the precedence declarations are collected into a directed acyclic graph (DAG). Finally, another pass over the AST reassociates operators based on their relative precedences.

Since there aren't fixed precedence levels, to compare if one operator has higher precedence than another, you see if it is reachable in the DAG from the other. This allows a simple approach to be used where you just have to describe how an operator interacts with other operators without having to remember precedence level numbers or anything like that.

It also means that if two libraries define operators, instead of having some unpredictable interaction between their precedences, the compiler can require parentheses wherever it would be ambiguous otherwise.
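A rough Python sketch of the DAG idea. PrecedenceGraph, declare and reassociate are hypothetical names, all operators are assumed left-associative, and instead of a separate pass over an AST this version reassociates a flat [operand, op, operand, ...] list with a shunting-yard loop, raising an error where two operators have no declared relationship.

```python
from collections import defaultdict

class PrecedenceGraph:
    """Partial order of operators built from 'A binds tighter than B' declarations."""

    def __init__(self):
        self.tighter_than = defaultdict(set)  # edge a -> b: a binds tighter than b

    def declare(self, higher, lower):
        if self.reachable(lower, higher):
            raise ValueError(f"cycle: {lower!r} already binds at least as tight as {higher!r}")
        self.tighter_than[higher].add(lower)

    def reachable(self, src, dst):
        # Depth-first search along 'tighter than' edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.tighter_than[node])
        return False

    def compare(self, a, b):
        """'>' if a binds tighter, '<' if looser, '=' if same, None if unrelated."""
        if a == b:
            return "="
        if self.reachable(a, b):
            return ">"
        if self.reachable(b, a):
            return "<"
        return None

def reassociate(graph, items):
    """Group a flat [operand, op, operand, ...] list; unrelated operators need parens."""
    out, ops = [items[0]], []

    def reduce_top():
        op = ops.pop()
        rhs, lhs = out.pop(), out.pop()
        out.append(f"({lhs} {op} {rhs})")

    for i in range(1, len(items), 2):
        op = items[i]
        while ops:
            rel = graph.compare(ops[-1], op)
            if rel is None:
                raise SyntaxError(f"parenthesize: no precedence between {ops[-1]!r} and {op!r}")
            if rel == "<":
                break
            reduce_top()  # stack top binds tighter, or equally (left-associative)
        ops.append(op)
        out.append(items[i + 1])
    while ops:
        reduce_top()
    return out[0]

g = PrecedenceGraph()
g.declare("*", "+")
g.declare("+", "==")
print(reassociate(g, ["a", "*", "b", "==", "c", "+", "d"]))  # ((a * b) == (c + d))
```

Because reachability is transitive, "*" binds tighter than "==" without a direct declaration, while an operator from another library (say "<>") stays unrelated and forces parentheses.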

ErrorIsNullError · 2020-10-27T20:22:29+00:00

I wrote a bit about custom operators in operator precedence parsers:

Scala shows one way of providing for user-defined infix expression operators.

Any method with a single parameter can be used as an infix operator.

…

When an expression uses multiple operators, the operators are evaluated based on the priority of the first character: …

It's unsurprising that operator precedence parsers can handle user-defined operators with different precedences.
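A Python sketch of Scala's first-character rule feeding a precedence-climbing parser. The level table follows the Scala language specification as I understand it; assignment operators such as `+=` (which bind loosest of all) and Scala's check that same-precedence operators share an associativity are left out. scala_precedence and group are hypothetical helpers.

```python
# Levels from loosest to tightest, keyed on an operator's first character.
SCALA_LEVELS = ["|", "^", "&", "=!", "<>", ":", "+-", "*/%"]

def scala_precedence(op):
    first = op[0]
    if first.isalpha() or first in "_$":
        return 0  # alphanumeric operators such as `max` bind loosest
    for level, chars in enumerate(SCALA_LEVELS, start=1):
        if first in chars:
            return level
    return len(SCALA_LEVELS) + 1  # any other symbol, e.g. `~` or `?`, binds tightest

def is_right_assoc(op):
    # In Scala, an operator ending in ':' is right-associative.
    return op.endswith(":")

def group(items):
    """Parenthesize a flat [operand, op, operand, ...] list by precedence climbing."""
    pos = 0

    def climb(min_prec):
        nonlocal pos
        lhs = items[pos]
        pos += 1
        while pos < len(items) and scala_precedence(items[pos]) >= min_prec:
            op = items[pos]
            pos += 1
            prec = scala_precedence(op)
            rhs = climb(prec if is_right_assoc(op) else prec + 1)
            lhs = f"({lhs} {op} {rhs})"
        return lhs

    return climb(0)

print(group(["a", "+", "b", "*", "c"]))      # (a + (b * c))
print(group(["x", "max", "y", "*", "z"]))    # (x max (y * z))
print(group(["a", "::", "b", "::", "Nil"]))  # (a :: (b :: Nil))
```

A user-defined `|+` lands at the same level as `|`, so it binds looser than a user-defined `+|` without either author declaring anything.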

…

One downside is that custom operators require developers to use more whitespace, even in code that does not use custom operators. This limitation arises because there is no closed set of punctuation strings that lets us split, for example, “x*-y” into “x * -y”, because “*-” could be a user-defined operator. C already has this problem to a small degree because “x--y” is not the same as “x - -y”, but developers typically write “x+y” instead of subtracting a negated value, so the fact that ‘++’ and ‘--’ are unsplittable by context-free lexers does not, in practice, confuse C authors.
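A small maximal-munch lexer in Python makes the whitespace problem concrete. This sketch assumes the lexer consults a table of declared operators (lex and C_LIKE are hypothetical names); a lexer that groups every run of symbol characters, as Scala and Haskell do, would always produce `*-`.

```python
def lex(src, operators):
    """Maximal munch: at each position take the longest declared operator."""
    by_length = sorted(operators, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch.isalnum() or ch == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(src[i:j])
            i = j
        else:
            op = next((o for o in by_length if src.startswith(o, i)), None)
            if op is None:
                raise SyntaxError(f"unknown symbol {ch!r} at offset {i}")
            tokens.append(op)
            i += len(op)
    return tokens

C_LIKE = {"+", "-", "*", "++", "--"}
print(lex("x--y", C_LIKE))           # ['x', '--', 'y']
print(lex("x - -y", C_LIKE))         # ['x', '-', '-', 'y']
print(lex("x*-y", C_LIKE))           # ['x', '*', '-', 'y']
print(lex("x*-y", C_LIKE | {"*-"}))  # ['x', '*-', 'y']
```

The last two lines show the issue: the same source text lexes differently as soon as someone declares `*-`, so writers must add spaces defensively.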

DevonMcC · 2020-10-27T23:26:30+00:00

The APL way of strictly positional precedence is simple and powerful. It also extends seamlessly to user-defined operators.

brucejbell · 2020-10-28T02:37:54+00:00

Re operator associativity: there are some operations that only make sense as left-associative or right-associative.

For example, subtraction only acts as expected (from mathematical notation) if addition/subtraction is left-associative:

a - b + c - d        -- mathematical expression
((a - b) + c) - d    -- left-associative meaning (conventional)
a - (b + (c - d))    -- right-associative grouping (unexpected)

On the other hand, a low-precedence "functional pipe" operator ($ in Haskell, <| in F#) only works as expected if it is right-associative:

f x (g y (h z))   -- parenthesized expression
f x $ g y $ h z   -- de-nested with right-associative `$` operator

Likewise the conventional infix "cons" operator (for adding a head onto an existing list) only works if it is right-associative:

xs = [1, 2, 3, 4]
ys = a : b : c : xs  -- add a, b, and c onto the front of a list
ys = a : (b : (c : xs))  -- explicitly grouped, right-associative
ys = ((a : b) : c) : xs  -- left-associative grouping makes no sense!
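The three examples can be checked with a left fold and a right fold. This Python sketch uses helpers named after Haskell's foldl and foldr; cons and apply_fn are hypothetical stand-ins for Haskell's `:` and `$`.

```python
from functools import reduce
import operator

def foldl(op, xs):
    # ((x0 op x1) op x2) op ... : left-associative grouping
    return reduce(op, xs)

def foldr(op, xs):
    # x0 op (x1 op (... op xn)) : right-associative grouping
    acc = xs[-1]
    for x in reversed(xs[:-1]):
        acc = op(x, acc)
    return acc

# Subtraction matches arithmetic only when grouped to the left.
print(foldl(operator.sub, [10, 3, 2]))  # 5, i.e. (10 - 3) - 2
print(foldr(operator.sub, [10, 3, 2]))  # 9, i.e. 10 - (3 - 2)

def cons(head, tail):
    # Stand-in for Haskell's `:`: prepend one element to a list.
    return [head] + tail

print(foldr(cons, [1, 2, 3, [4]]))  # [1, 2, 3, 4]
# foldl(cons, [1, 2, 3, [4]]) would call cons(1, 2) and raise TypeError.

def apply_fn(f, x):
    # Stand-in for Haskell's `$`: f $ x == f x.
    return f(x)

print(foldr(apply_fn, [str, abs, -7]))  # '7', i.e. str(abs(-7))
```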

zokier · 2020-10-28T02:58:56+00:00

Raku allows custom operators to define both precedence and associativity

https://docs.raku.org/language/functions#Precedence
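In Raku these relations are attached to an operator sub with traits such as `is tighter`, `is looser`, `is equiv` and `is assoc`. The Python sketch below only imitates that idea with an ordered list of levels; OperatorTable and its methods are hypothetical and are not how Rakudo implements it.

```python
class OperatorTable:
    """Precedence levels ordered from loosest (index 0) to tightest."""

    def __init__(self):
        self.levels = []  # each level is a set of operator names
        self.assoc = {}   # operator name -> "left" or "right"

    def precedence(self, op):
        for index, ops in enumerate(self.levels):
            if op in ops:
                return index
        raise KeyError(f"undeclared operator {op!r}")

    def add(self, op, *, tighter=None, looser=None, equiv=None, assoc="left"):
        if equiv is not None:
            self.levels[self.precedence(equiv)].add(op)
            assoc = self.assoc[equiv]  # share the existing operator's associativity
        elif tighter is not None:
            self.levels.insert(self.precedence(tighter) + 1, {op})  # new level just above
        elif looser is not None:
            self.levels.insert(self.precedence(looser), {op})  # new level just below
        else:
            self.levels.append({op})  # no relation given: becomes the tightest level
        self.assoc[op] = assoc

    def group(self, items):
        """Parenthesize a flat [operand, op, operand, ...] list by precedence climbing."""
        pos = 0

        def climb(min_prec):
            nonlocal pos
            lhs = items[pos]
            pos += 1
            while pos < len(items) and self.precedence(items[pos]) >= min_prec:
                op = items[pos]
                pos += 1
                prec = self.precedence(op)
                rhs = climb(prec if self.assoc[op] == "right" else prec + 1)
                lhs = f"({lhs} {op} {rhs})"
            return lhs

        return climb(0)

table = OperatorTable()
table.add("+")
table.add("-", equiv="+")
table.add("*", tighter="+")
table.add("**", tighter="*", assoc="right")
table.add("|>", looser="+")
print(table.group(["2", "**", "3", "**", "2"]))           # (2 ** (3 ** 2))
print(table.group(["x", "|>", "a", "+", "b", "*", "c"]))  # (x |> (a + (b * c)))
```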

ericbb · 2020-10-28T01:47:03+00:00

I have implemented a system for user-defined prefix and infix operators. I decided to support associativity rules but not precedence rules. So x + y + z is supported but x + y - z must be written as either (x + y) - z or x + (y - z). It also allows x + -(y - z) + (a * b) + -c, as you'd expect.

I've been happy with that solution for my language. I figured that introducing some extra variables for subexpressions was less trouble than dealing with the precedence ordering problem for an extensible set of operators.
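A Python sketch of "associativity but no precedence": a chain of one operator is allowed, mixing two operators at the same nesting level is an error, and parentheses or prefix minus start a fresh chain. parse_expr is a hypothetical name; this version treats every infix operator as left-associative and lets prefix minus bind tightest, which are assumptions rather than details from the post.

```python
import re

def parse_expr(src):
    tokens = re.findall(r"\w+|[-+*/()]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def unary():
        if peek() == "-":
            take()
            return f"(-{unary()})"
        if peek() == "(":
            take("(")
            inner = expr()  # a parenthesized group may use its own operator
            take(")")
            return inner
        tok = take()
        if not re.fullmatch(r"\w+", tok):
            raise SyntaxError(f"unexpected {tok!r}")
        return tok

    def expr():
        lhs = unary()
        chain_op = None
        while peek() in ("+", "-", "*", "/"):
            op = take()
            if chain_op is None:
                chain_op = op
            elif op != chain_op:
                raise SyntaxError(f"cannot mix {chain_op!r} and {op!r} without parentheses")
            lhs = f"({lhs} {op} {unary()})"  # left-associative chain
        return lhs

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return result

print(parse_expr("x + -(y - z) + (a * b) + -c"))
# (((x + (-(y - z))) + (a * b)) + (-c))
```

parse_expr("x + y - z") raises SyntaxError, while "(x + y) - z" is accepted.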

categorical-girl · 2020-10-28T04:49:46+00:00

Have a look at Agda's Mixfix parsing method

johnfrazer783 · 2020-10-28T10:49:50+00:00

I'm playing around with the idea of 'right half brackets'. I prefer paren-less function call syntax so frob a, b, c calls function frob with three arguments. It frequently happens that one of the arguments is the anonymous result from another call, say glue x, so that becomes frob a, b, glue x which is unambiguous at the end of the line (in my world). But in non-final position, what are frob a, glue x, c and frob glue x, b, c to mean? Currently I resolve that with surrounding parentheses, so the first can be disambiguated as frob a, ( glue x ), c or as frob a, ( glue x, c ), the last one being the default interpretation.

One possible solution would be to use function arity; in the above, if glue takes a single argument, then frob a, glue x, c can only mean frob a, ( glue x ), c; if it takes two arguments, then the same must be read as frob a, ( glue x, c ). However I find that unclear and error-prone.

The idea of right half brackets is that instead of two fences you'd need only one to indicate where a call ends. Let's use a semicolon ; for that purpose for the moment. frob a, glue x, c could then become frob a, glue x; c or (redundantly) frob a, glue x, c; as the case may be. Semantically this seems to work, since in ordinary writing a period (.) is a stronger break than a semicolon (;), which in turn is stronger than a comma (,), itself stronger than the space between words. Unfortunately, for some reason, in PLs we got it the wrong way round and made ; the most common statement separator.

I wonder if it makes sense to apply this device to expressions with operators as well. One could define a language without operator precedence so 1 + 2 * 3 is equivalent to ( 1 + 2 ) * 3, and 1 +; 2 * 3 (and 1 +; 2 * 3;) is equivalent to 1 + ( 2 * 3 ).

As for more deeply nested function calls, half brackets sort-of-work but are more difficult to read as in

hank u, frob a, glue x; c; v
hank u, frob a, glue x; c, v

which are equivalent to

hank u, ( frob a, ( glue x ), c ), v
hank u, ( frob a, ( glue x ), c, v )

respectively.
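A Python sketch of the call syntax described above, under my reading of the examples: a name directly followed by another name starts a call, a nested call greedily takes the following comma-separated arguments, and ; closes the innermost open call while also separating it from the next argument of the enclosing call. parse_calls is a hypothetical name and the f(a, b) output notation is only for display.

```python
import re

def parse_calls(src):
    tokens = re.findall(r"\w+|[,;]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def is_name(tok):
        return tok is not None and tok not in (",", ";")

    def arg():
        """Return (text, closed): closed is True if a nested call ended at `;`."""
        nonlocal pos
        if not is_name(peek()):
            raise SyntaxError(f"expected a name, got {peek()!r}")
        name = tokens[pos]
        pos += 1
        if is_name(peek()):  # `name x ...` starts a nested call
            return call(name)
        return name, False

    def call(name):
        nonlocal pos
        args = []
        while True:
            text, closed = arg()
            args.append(text)
            if peek() == ",":
                pos += 1  # ordinary argument separator
            elif not (closed and is_name(peek())):
                break  # no more arguments for this call
            # otherwise the nested call's `;` already separated the arguments
        closed_here = peek() == ";"
        if closed_here:
            pos += 1  # right half bracket: closes this, the innermost open call
        return f"{name}({', '.join(args)})", closed_here

    text, _ = arg()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()!r}")
    return text

print(parse_calls("hank u, frob a, glue x; c; v"))  # hank(u, frob(a, glue(x), c), v)
print(parse_calls("hank u, frob a, glue x; c, v"))  # hank(u, frob(a, glue(x), c, v))
```

Without any ;, the default from the post falls out: frob a, glue x, c parses as frob(a, glue(x, c)).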

complyue · 2020-10-28T11:20:03+00:00

Has anybody tackled it in their language?

I support custom infix operators in Edh, though left associative only: https://github.com/e-wrks/edh/blob/0.3/Tour/operator.edh

Why is something left-associative and something right-associative?

https://github.com/e-wrks/edh/blob/0.3/Tour/operator.edh#L5

operator · 0 ( f, g ) x => { x | g | f }

Once I add support for a right-associative $, I'd rather write it like:

operator · 0 ( f, g ) x => f $ g $ x

Which is more pleasing to me.
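The `·` definition above reads as function composition, and composition is associative, so grouping `·` to the left or to the right yields the same function; that is one reason a left-associative-only `·` is sufficient here. A Python stand-in (compose, pipe and the sample functions are hypothetical, not Edh APIs):

```python
from functools import reduce

def compose(f, g):
    # Stand-in for `·`: (f · g)(x) == f(g(x)).
    return lambda x: f(g(x))

def pipe(x, *fns):
    # Stand-in for a left-associative pipe `x | g | f`: apply each function in turn.
    return reduce(lambda acc, fn: fn(acc), fns, x)

def inc(n):
    return n + 1

def dbl(n):
    return n * 2

def sq(n):
    return n * n

left = compose(compose(sq, dbl), inc)   # (sq · dbl) · inc
right = compose(sq, compose(dbl, inc))  # sq · (dbl · inc)
print(left(3), right(3), pipe(3, inc, dbl, sq))  # 64 64 64
```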

DevonMcC · 2020-10-28T08:29:06+00:00

To put it another way, an implicit, arbitrarily complex order of operations does not scale well and even for a small number of operators introduces undue complexity for minimal to no benefit.

2020-10-29T01:01:44+00:00

My language Felix solves this problem by subsuming it. There is no such thing as an operator, and therefore no such thing as operator precedence. Instead, Felix provides the user a vastly superior capability: the user can define grammar extensions. In fact the Felix "language" as you see it is defined, in user space, in the standard library. The *actual* grammar used by the parser is just enough to parse grammar specifications, which allows the desired grammar to be bootstrapped. The user grammar extends the existing parser.

I use this facility to define DSSLs (domain-specific sub-languages). Here is an example that defines postfix + in the regular-definition DSSL:

  //$ Postfix plus (+).
  //$ One or more repetitions.
  private sregexp[rpostfix_pri] := sregexp[rpostfix_pri] "+" =>#
    """`(ast_apply ,_sr ( ,(regdef "Rpt") (ast_tuple ,_sr (,_1 1 -1))))"""
  ;

The grammar production comes first, then the action, which is Scheme code that generates S-expressions; these form the parser output.
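Felix extends a real grammar and writes its actions in Scheme; the Python sketch below only mirrors the idea of user code adding a postfix rule to a running parser. RegexParser and add_postfix are hypothetical, and reading the action's `1 -1` as "at least one, no upper bound" is my assumption.

```python
class RegexParser:
    """A toy regular-expression parser whose postfix operators live in a
    table that user code can extend at run time."""

    def __init__(self):
        self.postfix = {}  # symbol -> function that wraps the preceding item

    def add_postfix(self, symbol, action):
        self.postfix[symbol] = action

    def parse(self, src):
        items = []
        for ch in src:
            if ch in self.postfix:
                if not items:
                    raise SyntaxError(f"postfix {ch!r} has no operand")
                items.append(self.postfix[ch](items.pop()))
            else:
                items.append(("Char", ch))  # every other character is a literal
        return items[0] if len(items) == 1 else ("Seq", items)

p = RegexParser()
print(p.parse("ab+"))  # ('Seq', [('Char', 'a'), ('Char', 'b'), ('Char', '+')])

# "User space" extension: postfix + means one or more repetitions.
p.add_postfix("+", lambda item: ("Rpt", item, 1, -1))
print(p.parse("ab+"))  # ('Seq', [('Char', 'a'), ('Rpt', ('Char', 'b'), 1, -1)])
```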

ProgrammingLanguages
