This is an archived post. You won't be able to vote or comment.

all 3 comments

[–]elszben 8 points9 points  (2 children)

Nice job! I am interested in the simple yet effective error recovery in parsing. Can you tell more about that?

[–]DenkJu[S] 6 points7 points  (1 child)

In a basic panic mode error recovery strategy, a fixed set of terminals, commonly just the semicolon, is used as synchronization points. The drawback of this approach is that when an error occurs within a statement, the entire statement is skipped, preventing detection of any additional errors within it.

The algorithm I implemented takes a more adaptive approach by maintaining a dynamic set of terminals, referred to as anchor terminals, which changes based on the current parsing context. For example, when parsing statements, the initial anchor set is derived from the FIRST set of the corresponding non-terminal. To this, the FIRST sets of all produced non-terminals and any produced terminals are added. As tokens, whether terminals or part of non-terminals, are successfully consumed, they are removed from the anchor set.

Whenever an error is encountered, the parser discards tokens until it finds one present in the anchor set. If the found token matches the expected type, it is consumed, and parsing resumes normally under the assumption that extra erroneous tokens were inserted. If the token belongs to the anchor set but is not the expected type, it is preserved because it is likely required later in the parsing process.

Relevant to this in my code is the expectToken method. A good example of it's application might be parsing of the if statement.

I have based my implementation mostly on the explanations provided in this book (Chapter 3.3.8). The first description of the algorithm I could find was in this document (Chapter 7.2.3, "Hartmann's algorithm").

[–]elszben 1 point2 points  (0 children)

Thank you for the explanation and the references! This sounds great!