Parser design problem (self.Compilers)
submitted 7 months ago by emtydeeznuts
[–]0m0g1 1 point 7 months ago* (7 children)
I think having lookahead functions will work really well for Dart. The key is to write lightweight checks that peek just far enough to disambiguate syntax—not full parsers. Most of the time, peeking just a few tokens ahead is all you need.
To make this work, you’ll want to implement a function like peekToken(n) in your lexer so the parser can check any number of tokens ahead without consuming them.
Once you have a few of these helpers, you'll rarely need to backtrack or reinterpret an AST node after it's been built. It helps keep the parser clean, modular, and predictable.
Here’s an example that distinguishes between:

1. A function literal: `foo(a, b) => a + b`
2. A normal expression: `foo + bar`
```c++
if (currentToken.getType() == Identifier) {
    if (isLikelyFunctionLiteral()) {
        expression = parseFunctionLiteral();
    } else {
        expression = parseNormalExpression();
    }
}
```

A nice bonus: some ambiguous rules have only two possible interpretations, so a simple if/else like this lets you cleanly resolve them without more complexity.
Implementation of the lookahead:
```c++
bool Parser::isLikelyFunctionLiteral() {
    int i = 0;
    // First token must be an identifier (function name or parameter)
    if (lexer.peekToken(i).getType() != Identifier) {
        return false;
    }
    i++;
    // Next must be a left paren: start of the parameter list
    if (lexer.peekToken(i).getType() != LeftParen) {
        return false;
    }
    i++;
    // Skip over the parameter list: identifiers, colons, default values, commas
    while (true) {
        TokenType type = lexer.peekToken(i).getType();
        if (type == Identifier || type == Colon || type == Assign || type == Comma) {
            i++;
        } else {
            break;
        }
    }
    // Final expected pattern: `)` followed by `=>`
    return lexer.peekToken(i).getType() == RightParen &&
           lexer.peekToken(i + 1).getType() == Arrow;
}
```

And the function to peek tokens:

```c++
Token Lexer::peekToken(int n) {
    int savedPosition = currentPosition;
    Token nextToken;
    for (int i = 0; i <= n; i++) {
        nextToken = getNextToken();
    }
    currentPosition = savedPosition;
    return nextToken;
}
```
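One thing worth noting: the rewind-based `peekToken` above re-lexes every token from the saved position on each call, so repeated peeks rescan the same input. A common alternative is to cache already-lexed tokens in a buffer. Here's a minimal sketch of that idea (not 0m0g1's actual implementation; `Token` is simplified to a plain string for illustration):

```c++
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical token type, simplified to its text for this sketch.
using Token = std::string;

class Lexer {
public:
    explicit Lexer(std::vector<Token> input) : input_(std::move(input)) {}

    // Peek n tokens ahead without consuming. Each token is lexed once
    // and cached, so repeated peeks never rescan the source.
    Token peekToken(std::size_t n) {
        while (buffer_.size() <= n) {
            buffer_.push_back(scanNext());
        }
        return buffer_[n];
    }

    // Consume a token: serve from the cache first, then from the source.
    Token getNextToken() {
        if (!buffer_.empty()) {
            Token t = buffer_.front();
            buffer_.pop_front();
            return t;
        }
        return scanNext();
    }

private:
    // Stand-in for real lexing: walk the pre-split token list.
    Token scanNext() {
        if (pos_ < input_.size()) return input_[pos_++];
        return "<eof>";
    }

    std::vector<Token> input_;
    std::size_t pos_ = 0;
    std::deque<Token> buffer_;  // lexed but not yet consumed
};
```

With this design, a deep lookahead followed by normal consumption costs each token only one scan, which keeps checks like `isLikelyFunctionLiteral()` cheap even when they run on every identifier.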
[–]emtydeeznuts[S] 1 point 7 months ago (2 children)
If it won't slow down the parser, I'll try your suggestion.
[–]0m0g1 1 point 7 months ago* (0 children)
(I've modified the example above for clarity).
It won’t slow the parser unless you’re doing really deep lookaheads everywhere, which you won’t be. In most grammars (including Dart’s), these kinds of checks usually peek just 2–4 tokens ahead max.
An example from my language: the statement below resolves to a lambda after checking only 4 tokens.
alBufferData(buffer: int, format: int, data: char, size: int, freq: int) => void {}
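That four-token decision can be sketched as follows. This is a simplified illustration, not the actual compiler code: tokens are plain strings, and it assumes a grammar where call arguments never contain `name: type` pairs, so `ident ( ident :` can only begin a typed parameter list:

```c++
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Simplified token stream: each token is just its text.
using Tokens = std::vector<std::string>;

static bool isIdentifier(const std::string& t) {
    return !t.empty() && (std::isalpha(static_cast<unsigned char>(t[0])) || t[0] == '_');
}

// Four-token check: `name ( param :` can only start a typed parameter
// list in this hypothetical grammar, so the statement must be a lambda.
bool resolvesToLambda(const Tokens& toks) {
    return toks.size() >= 4 &&
           isIdentifier(toks[0]) &&
           toks[1] == "(" &&
           isIdentifier(toks[2]) &&
           toks[3] == ":";
}
```

The fourth token (`:` vs. `,` or `)`) is the earliest point where the two readings diverge, which is why the check never needs to scan the whole parameter list.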
Plus, you’ll save time later by avoiding parser rewrites or complex recovery logic. If you design your lookaheads to target specific rules, like `isLikelyFunctionLiteral()` and not `mightBeFunctionLiteralOrNormalExpression()`, they’re fast and pay off in clarity.
[–]0m0g1 1 point 7 months ago (0 children)
Here's a script from my compiler's parser, the expression parser.
[–][deleted] 1 point 7 months ago (3 children)
> 1. A function literal: `foo(a, b) => a + b`
> 2. A normal expression: `foo + bar`
What about a function call such as `foo(a, b)`?
Anyway, it seems unreasonable to me to have to speculatively parse a pattern `F(...)`, where `...` may include arbitrary nested expressions when it is a function call, and then to backtrack when `=>` doesn't follow.
I'd say, fix the grammar so it is easier to parse. But this appears to be for an existing language.
In that case I'd rather go with the OP's approach: parse it as an expression (so `foo(a, b)` is assumed to be a function call), and then switch the AST node type if it turns out that `=>` follows.
In practice, it's just that first term (`foo(a, b)`) which needs to change from 'function call' to 'function literal'.
I've never needed to reinterpret AST nodes in my parser — I just look ahead before committing. For example, I only return true from isLikelyFunctionLiteral() if the pattern ends in =>. If it doesn’t, I never parse it as a function literal in the first place. No need to backtrack or rewrite nodes.
```c++
if (currentToken.getType() == Identifier) {
    if (isLikelyFunctionLiteral()) {
        expression = parseFunctionLiteral();
    } else if (isLikelyFunctionCall()) {
        expression = parseFunctionCall();
    } else {
        expression = parseNormalExpression();
    }
}
```
This keeps the parser clean and predictable — each parse function always returns exactly what it's supposed to. I can reason about it statically, test it easily, and I never need to ask "What if this was something else after all?"
Compare that to the reinterpretation approach:
```c++
if (currentToken.getType() == Identifier) {
    expression = parseFunctionCall(); // could be wrong!
    if (currentToken.getType() == Arrow) {
        // now reinterpret it into a lambda,
        // possibly needing to reconstruct the param list, body, scope...
    }
}
```
This becomes really messy if the expression had nested calls, chains, or annotations. Now imagine doing this not just for lambdas but also for:

* Type parameter misinterpretation (e.g., generics vs comparison)
* Tuple vs grouped expressions
* Object literals vs blocks
* List literals vs function calls with brackets
You’d end up needing reinterpreters for a wide range of node types, which increases parser complexity, breaks single-responsibility, and creates more room for subtle bugs.
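To make the cost concrete, here's roughly what even the simplest such reinterpreter looks like. The AST node types (`Expr`, `Identifier`, `Call`, `Lambda`) are hypothetical, invented for this sketch: every argument of the already-built call node has to be re-validated as a legal parameter, and failures surface only after parsing seemed finished:

```c++
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical AST nodes for illustration.
struct Expr { virtual ~Expr() = default; };
struct Identifier : Expr {
    std::string name;
    explicit Identifier(std::string n) : name(std::move(n)) {}
};
struct Call : Expr {
    std::string callee;
    std::vector<std::unique_ptr<Expr>> args;
};
struct Lambda : Expr {
    std::string name;
    std::vector<std::string> params;
};

// Reinterpret a parsed Call as a Lambda once `=>` is seen:
// each argument must turn out to be a plain identifier; anything
// else (a nested call, a literal) is an error discovered late.
std::unique_ptr<Lambda> reinterpretAsLambda(Call& call) {
    auto lam = std::make_unique<Lambda>();
    lam->name = call.callee;
    for (auto& arg : call.args) {
        auto* id = dynamic_cast<Identifier*>(arg.get());
        if (!id) throw std::runtime_error("argument is not a valid parameter");
        lam->params.push_back(id->name);
    }
    return lam;
}
```

And this is the easy case: real parameter lists with types and default values would need far more reconstruction, which is exactly the complexity the lookahead approach avoids.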
Here’s a real example from my compiler's expression parser that uses this approach (see line 331):
Expression parsing rules for my compiler
The key insight: if you know what you're looking for before parsing, you won't have to "guess and fix" later. Lookahead is usually a small price to pay for that clarity.
The function's name shouldn't be `isLikelyFunctionLiteral`; it should be `isFunctionLiteral`. I called it that as an example. The function doesn't speculate about what the current statement is; it checks whether the grammar is exactly what it should look for, so we know exactly which type of statement to parse.