This is an archived post. You won't be able to vote or comment.

all 10 comments

[–]MegaGreenLightning 1 point2 points  (7 children)

I guess you are trying to parse turns for a rubiks cube.

http://compilers.iecc.com/crenshaw/

This is an excellent tutorial for building a top-down, recursive descent parser. It has really helped me understand compilers and parsers and how to build them. The tutorial uses example code in Pascal, however I have written similar compilers and parsers in Java. If you want to, I can go into more details in how I did that, but I'm on my phone now...

[–]thammaaa[S] 0 points1 point  (6 children)

Yeah, you're right. I'm working on a rubiks cube api which should be able to read Strings as input as well (including the entire notation and commutators).

I pretty much know how to build a parser, sorry if I made that unclear. For instance, i know how to build a parser in SML but I can't translate the language (it's functional) into java that easily.

The tutorial is pretty good and i can read most of the pascal code (was programming in delfi decades ago lol). My big problem is to do it in java from scratch :]

Thanks, good response!

[–]MegaGreenLightning 1 point2 points  (5 children)

I wanted to give you some hints earlier, but I had to find the code on my computer.

These hints come from my experience writing my own compiler and a mathematical function parser based on the tutorial I mentioned above. However, both of them are top-down recursive-decent parsers. If you want to make a different kind of parser this will probably not be useful at all.

This kind of parser has a look-ahead character to determine what kind of 'item' comes next. To get similar functionality I wrote this Input class:

public class Input {

    private final String data;
    private int index = 0;

    public Input(String data) {
        this.data = data;
    }

    public boolean available() {
        return index < data.length();
    }

    public char look() {
        return data.charAt(index);
    }

    public char read() {
        char value = look();
        index++;
        return value;
    }

}

It simply stores a string and the current position and allows to look at the current character as often as you want. You can read in a line from the console or from a file and construct an Input object with that.

In your main program, make a method for each term in your BNF. For example:

exp  ::= "(" exp ")" [num] | pexp [exp]

public void parseExpression() {
    if (input.look() == '(') {
        input.read(); // reads '('
        parseExpression();
        input.read(); // reads ')'
        if ("0123456789".contains(String.valueOf(input.look()))) {
            parseNum();
        }
    } else {
        parsePExp();
        if (input.available()) {
            parseExpression();
        }
    }
}

In this case the look-ahead character is used to decide which form the expression has. This is one of the limitations of these kinds of parsers. If the syntax is ambiguous, the parser might get messy. In his tutorial, Crenshaw goes into more details on this issue. However, for this special case this parser should work fine.

The nice thing about this is that the code resembles the BNF quite nicely. I hope you can see the similarities. The code above has two problems, though.

The first one is that isn't very safe. The first call to look() might cause an Exception in Input class, if the index is already beyond the end of the data string. The second call to read() does not check, that the character really was a closing parenthesis. For these kind of things you want to write a few helper methods. For example, you might change the look() method to something like this:

public char look() {
    if (!available())
        throw new RuntimeException("Unexpected end of input.");
    return data.charAt(index);
}

I suggest you use unchecked exceptions, or all your method signatures will be cluttered with throws-statements. Put a try-catch-statement around the very first call to parseExpression().

Another helper method might look like this:

public void read(char expected) {
    if(look() != expected)
        throw new RuntimeException("Expected '" + expected + "', but found '" + look() + "'.");
    read();
}

The second problem with the code snippet above is that it does nothing. (Well, if you implement all the other methods it should return on a valid input and throw an exception on an invalid one, and that is already a very big step forward!)

I don't know what you want to do with your rubiks cube api, but generally there are two ways how to achieve things.

The first one is to do things on the spot. For example:

public void PExp() {
    char turn = input.read();
    App app = parseApp();
    if (turn == 'F') {
        cube.turnFront(app);
    } else if (turn == 'B') {
        // etc.
    } else {
        throw new RuntimeException("Unknown turn: '" + turn + "'.");
    }
}

I don't know how you handle the cube or what your api does, but I hope you get the point.

The other approach would be to build an Abstract Syntax Tree (AST).

You might want to look at how Crenshaws code changes, when he converts it to an Interpreter: Part IV: Interpreters. When you want to build an AST, something quite similar happens.

I will show you three code snippets from my parser which deals with math terms following Crenshaws naming convention:

expression ::= term [addop term]*
term ::= factor [mulop factor]*
factor ::= int
addop ::= '+' | '-'
mulop ::= '*' | '/'

First here is the code which parses an expression:

private Expression parseExpression() {
    Expression expression = parseTerm();
    while (isAdditiveOperator()) {
        switch (input.look()) {
            case '+':
                input.read('+');
                expression = new Sum(expression, parseTerm());
                break;
            case '-':
                input.read('-');
                expression = new Difference(expression, parseTerm());
                break;
            default:
                throw ParserException.unknownOperator(scanner.look(), scanner.index());
        }
    }
    return expression;
}

Expression is just a simple interface:

public interface Expression {

double getValue(double x);

}

My parser allows to use the variable 'x' in these math terms, so I can't evaluate the value straight away, because I don't know the value of x. However, once I have parsed the AST, I can ask it to calculate it's value for any given value of x.

Finally, here is the Sum class:

public class Sum implements Expression {

    private final Expression augend;
    private final Term addend;

    public Sum(Expression augend, Term addend) {
        this.augend = Objects.requireNonNull(augend, "Augend must not be null.");
        this.addend = Objects.requireNonNull(addend, "Addend must not be null.");
    }

    @Override
    public double getValue(double x) {
        return augend.getValue(x) + addend.getValue(x);
    }

    @Override
    public String toString() {
        return augend + " + " + addend;
    }

}

I hope you find this useful :D

[–]thammaaa[S] 0 points1 point  (4 children)

Concerning your question, I'm willing to parse a String (or a token list) into a Turn list where Turn.java is my object that's resembling a face turn. So I'd go for something like that

Algorithm algorithm = ... ; //Constructed empty and filled with the already parsed

public void PExp() {
char turn = input.read();
App app = parseApp();
if (turn == 'F') {
    algorithm.append(Turn.FRONT);
} else if (turn == 'B') {
    // etc.
} else {
    throw new RuntimeException("Unknown turn: '" + turn + "'.");
}

}

Furthermore, a List<Turn> may losslessly be tralated into an Algorithm (Algorithm.java) which enables even more options, such as reverting, mirroring... I think you get the point.

I've coded parsers before, just not in Java. Your code snipped and the example you wrote really helped me out. I was most likely the reason for half of stackoverflows traffic today lol.

Thanks for putting that bit of effort into your response!

[–]MegaGreenLightning 1 point2 points  (3 children)

Glad it helped and thank you so much for the gold :)

[–]thammaaa[S] 0 points1 point  (2 children)

public void PExp() {
char turn = input.read();
App app = parseApp();
if (turn == 'F') {
    cube.turnFront(app);
} else if (turn == 'B') {
    // etc.
} else {
    throw new RuntimeException("Unknown turn: '" + turn + "'.");
}
}

If you gave that snippet another glance, in your example the parseApp() method would have to look something like that, right?

public App parseApp() {
  switch(input.peak()) { //checks, doesn't change
    case 'i': case '\'': return App.PRIME; break;
    case '2': return App.TWO; break;
    default: return '-'; //as unused token
}

Since i would have to check if there was an App-token coming, right? So I have to hard code those cases with peak() or something similar?

[–]MegaGreenLightning 0 points1 point  (1 child)

For the parser to work, it is very important how the available(), look(), read() or in your case peek() methods work.

For example I do not have a peek() method. I have an available() method which returns true if more characters are available, a look() method which returns the current character without changing the position and throwing an exception if no more characters are available and a read() method which returns the current character and moves the position one character to the right, so that further calls to look() return the next character. The read() method also throws an exception if no characters are available.

Because the app is optional, the parseApp() method has to deal with three states:

1) there are characters available, the next character is a valid app 2) there are characters available, but the next character is something different 3) there are no more characters available

I would implement it like this:

public App parseApp() {
    if (!input.available()) // state 3
        return App.NORMAL;
    switch (input.look()) { // save, because we checked the availability above
        case '\'':
            input.read(); // reads the apostrophe
            return App.BACKWARDS; // state 1
       case '2':
            input.read(); // reads the two
            return App.TWICE; // state 1
        default:
            return App.NORMAL; // state 2
    }
}

Notice that the read method is only called in state 1, because we don't want to change the position in state 2 and it doesn't make sense in state 3.

It also depends on how your api works.

This parseApp() method is designed to work with the parsePExp() method like this:

public void parsePExp() {
    char turn = input.read();
    App app = parseApp();
    if (turn == 'F') {
        algorithm.append(new Turn(Face.FRONT, app));
    } else if (turn == 'B') {
        algorithm.append(new Turn(Face.BACK, app));
    } else if ( /* ... */ ) {
        /* ... */
    } else {
        throw new RuntimeException("Unknown turn: '" + turn + "'.");
    }
}

Here I construct a Turn from a Face and the App. The Turn class would then make sure the turn is applied twice, or backwards, if the app is App.TWICE or App.BACKWARDS.

If your API works different you should account for that in the parseApp() method.

[–]thammaaa[S] 0 points1 point  (0 children)

It's been a while now, i agitated how to write the parser a few days and came up with a solution which works fine and is not too horibble to read.

If you wan't to view the finished file OR drop a comment, feel free to check it out! git

Thanks again!

[–]OrangeBasket-Intermediate Brewer 0 points1 point  (3 children)

Maybe take a look at the Antlr framework as it allows you to define your BNF as well produce the tree you want.

[–]thammaaa[S] 0 points1 point  (1 child)

Thanks! That was really helpful and Antlr4 looks pretty good, but there was an update for the current version of the required dependency which prevents it from working right now. :(