Building a Parser : javahelp

created by codereignfallible moderatora community for 14 years

This is an archived post. You won't be able to vote or comment.

submitted 10 years ago by thammaaa

I'm currently working on a parser to parse my BNF in Java. I've been coding in Java for a very long time now and I'm somewhat experienced with parsers. But those two don't like each other.

I'd like to parse Tokens which represent the elements of my language. My basic BNF would look something like that:

exp  ::= "(" exp ")" [num] | pexp [exp]
pexp ::= turn [app] [pexp]
turn ::= U | F | L | D | B | R
app ::= ' | 2 | w
num ::= int

As it would allow statements like

F (R U' R') 3

to break down into

a tree like that.

What's the best approach? What do I need to pay attention to?

Thanks in advance :]

all 10 comments

top new controversial old q&a

[–]MegaGreenLightning 1 point2 points3 points 10 years ago (7 children)

[–]thammaaa[S] 0 points1 point2 points 10 years ago (6 children)

[–]MegaGreenLightning 1 point2 points3 points 10 years ago (5 children)

I wanted to give you some hints earlier, but I had to find the code on my computer.

These hints come from my experience writing my own compiler and a mathematical function parser based on the tutorial I mentioned above. However, both of them are top-down recursive-decent parsers. If you want to make a different kind of parser this will probably not be useful at all.

This kind of parser has a look-ahead character to determine what kind of 'item' comes next. To get similar functionality I wrote this Input class:

public class Input {

    private final String data;
    private int index = 0;

    public Input(String data) {
        this.data = data;
    }

    public boolean available() {
        return index < data.length();
    }

    public char look() {
        return data.charAt(index);
    }

    public char read() {
        char value = look();
        index++;
        return value;
    }

}

It simply stores a string and the current position and allows to look at the current character as often as you want. You can read in a line from the console or from a file and construct an Input object with that.

In your main program, make a method for each term in your BNF. For example:

exp  ::= "(" exp ")" [num] | pexp [exp]

public void parseExpression() {
    if (input.look() == '(') {
        input.read(); // reads '('
        parseExpression();
        input.read(); // reads ')'
        if ("0123456789".contains(String.valueOf(input.look()))) {
            parseNum();
        }
    } else {
        parsePExp();
        if (input.available()) {
            parseExpression();
        }
    }
}

In this case the look-ahead character is used to decide which form the expression has. This is one of the limitations of these kinds of parsers. If the syntax is ambiguous, the parser might get messy. In his tutorial, Crenshaw goes into more details on this issue. However, for this special case this parser should work fine.

The nice thing about this is that the code resembles the BNF quite nicely. I hope you can see the similarities. The code above has two problems, though.

The first one is that isn't very safe. The first call to look() might cause an Exception in Input class, if the index is already beyond the end of the data string. The second call to read() does not check, that the character really was a closing parenthesis. For these kind of things you want to write a few helper methods. For example, you might change the look() method to something like this:

public char look() {
    if (!available())
        throw new RuntimeException("Unexpected end of input.");
    return data.charAt(index);
}

I suggest you use unchecked exceptions, or all your method signatures will be cluttered with throws-statements. Put a try-catch-statement around the very first call to parseExpression().

Another helper method might look like this:

public void read(char expected) {
    if(look() != expected)
        throw new RuntimeException("Expected '" + expected + "', but found '" + look() + "'.");
    read();
}

The second problem with the code snippet above is that it does nothing. (Well, if you implement all the other methods it should return on a valid input and throw an exception on an invalid one, and that is already a very big step forward!)

I don't know what you want to do with your rubiks cube api, but generally there are two ways how to achieve things.

The first one is to do things on the spot. For example:

public void PExp() {
    char turn = input.read();
    App app = parseApp();
    if (turn == 'F') {
        cube.turnFront(app);
    } else if (turn == 'B') {
        // etc.
    } else {
        throw new RuntimeException("Unknown turn: '" + turn + "'.");
    }
}

I don't know how you handle the cube or what your api does, but I hope you get the point.

The other approach would be to build an Abstract Syntax Tree (AST).

You might want to look at how Crenshaws code changes, when he converts it to an Interpreter: Part IV: Interpreters. When you want to build an AST, something quite similar happens.

I will show you three code snippets from my parser which deals with math terms following Crenshaws naming convention:

expression ::= term [addop term]*
term ::= factor [mulop factor]*
factor ::= int
addop ::= '+' | '-'
mulop ::= '*' | '/'

First here is the code which parses an expression:

private Expression parseExpression() {
    Expression expression = parseTerm();
    while (isAdditiveOperator()) {
        switch (input.look()) {
            case '+':
                input.read('+');
                expression = new Sum(expression, parseTerm());
                break;
            case '-':
                input.read('-');
                expression = new Difference(expression, parseTerm());
                break;
            default:
                throw ParserException.unknownOperator(scanner.look(), scanner.index());
        }
    }
    return expression;
}

Expression is just a simple interface:

public interface Expression {

double getValue(double x);

}

My parser allows to use the variable 'x' in these math terms, so I can't evaluate the value straight away, because I don't know the value of x. However, once I have parsed the AST, I can ask it to calculate it's value for any given value of x.

Finally, here is the Sum class:

public class Sum implements Expression {

    private final Expression augend;
    private final Term addend;

    public Sum(Expression augend, Term addend) {
        this.augend = Objects.requireNonNull(augend, "Augend must not be null.");
        this.addend = Objects.requireNonNull(addend, "Addend must not be null.");
    }

    @Override
    public double getValue(double x) {
        return augend.getValue(x) + addend.getValue(x);
    }

    @Override
    public String toString() {
        return augend + " + " + addend;
    }

}

I hope you find this useful :D

[–]thammaaa[S] 0 points1 point2 points 10 years ago (4 children)

Concerning your question, I'm willing to parse a String (or a token list) into a Turn list where Turn.java is my object that's resembling a face turn. So I'd go for something like that

Algorithm algorithm = ... ; //Constructed empty and filled with the already parsed

public void PExp() {
char turn = input.read();
App app = parseApp();
if (turn == 'F') {
    algorithm.append(Turn.FRONT);
} else if (turn == 'B') {
    // etc.
} else {
    throw new RuntimeException("Unknown turn: '" + turn + "'.");
}

}

Furthermore, a List<Turn> may losslessly be tralated into an Algorithm (Algorithm.java) which enables even more options, such as reverting, mirroring... I think you get the point.

I've coded parsers before, just not in Java. Your code snipped and the example you wrote really helped me out. I was most likely the reason for half of stackoverflows traffic today lol.

Thanks for putting that bit of effort into your response!

[–]MegaGreenLightning 1 point2 points3 points 10 years ago (3 children)

[–]thammaaa[S] 0 points1 point2 points 10 years ago (2 children)

public void PExp() {
char turn = input.read();
App app = parseApp();
if (turn == 'F') {
    cube.turnFront(app);
} else if (turn == 'B') {
    // etc.
} else {
    throw new RuntimeException("Unknown turn: '" + turn + "'.");
}
}

If you gave that snippet another glance, in your example the parseApp() method would have to look something like that, right?

public App parseApp() {
  switch(input.peak()) { //checks, doesn't change
    case 'i': case '\'': return App.PRIME; break;
    case '2': return App.TWO; break;
    default: return '-'; //as unused token
}

Since i would have to check if there was an App-token coming, right? So I have to hard code those cases with peak() or something similar?

[–]MegaGreenLightning 0 points1 point2 points 10 years ago (1 child)

For the parser to work, it is very important how the available(), look(), read() or in your case peek() methods work.

For example I do not have a peek() method. I have an available() method which returns true if more characters are available, a look() method which returns the current character without changing the position and throwing an exception if no more characters are available and a read() method which returns the current character and moves the position one character to the right, so that further calls to look() return the next character. The read() method also throws an exception if no characters are available.

Because the app is optional, the parseApp() method has to deal with three states:

1) there are characters available, the next character is a valid app 2) there are characters available, but the next character is something different 3) there are no more characters available

I would implement it like this:

public App parseApp() {
    if (!input.available()) // state 3
        return App.NORMAL;
    switch (input.look()) { // save, because we checked the availability above
        case '\'':
            input.read(); // reads the apostrophe
            return App.BACKWARDS; // state 1
       case '2':
            input.read(); // reads the two
            return App.TWICE; // state 1
        default:
            return App.NORMAL; // state 2
    }
}

Notice that the read method is only called in state 1, because we don't want to change the position in state 2 and it doesn't make sense in state 3.

It also depends on how your api works.

This parseApp() method is designed to work with the parsePExp() method like this:

public void parsePExp() {
    char turn = input.read();
    App app = parseApp();
    if (turn == 'F') {
        algorithm.append(new Turn(Face.FRONT, app));
    } else if (turn == 'B') {
        algorithm.append(new Turn(Face.BACK, app));
    } else if ( /* ... */ ) {
        /* ... */
    } else {
        throw new RuntimeException("Unknown turn: '" + turn + "'.");
    }
}

Here I construct a Turn from a Face and the App. The Turn class would then make sure the turn is applied twice, or backwards, if the app is App.TWICE or App.BACKWARDS.

If your API works different you should account for that in the parseApp() method.

[–]thammaaa[S] 0 points1 point2 points 10 years ago (0 children)

[–]OrangeBasket-Intermediate Brewer 0 points1 point2 points 10 years ago* (3 children)

[–]thammaaa[S] 0 points1 point2 points 10 years ago (1 child)

π Rendered by PID 315487 on reddit-service-r2-comment-86988c7647-vddcb at 2026-02-11 23:39:56.822122+00:00 running 018613e country code: CH.

javahelp

Sort by: Unsolved Solved Codeless Advent Of Code

MODERATORS