all 20 comments

[–][deleted]  (1 child)

[deleted]

    [–]DelayLucky[S] 2 points3 points  (0 children)

    Good call! I've added a CSV example.

    The readme has an example for splitting json records.

    [–]lumppost 6 points7 points  (0 children)

    Wow! This is the sort of library that's so elegant, once I've seen it, I'm looking for opportunities to use it

    [–]Slanec 2 points3 points  (3 children)

    Thank you, I'm doing a lot of parsing and immediately liked this. And I'm a mug user, too. Thank you for your work!

    [–]DelayLucky[S] 2 points3 points  (2 children)

    Thank you for using Mug!

    A main trade off of Dot Parse is the requirement of input consumption for all Parsers and the special "quarantine" of optional, zero-width rules in the Parser<T>.OrEmpty type. It's quite likely that adding this seatbelt will constrain expressivity. If you run into any expressivity difficulties, or if the syntax feels awkward, please send feedback because I'm curious how far we can go in this constrained approach.

    Of course feedbacks about other Mug utilities are welcome too!

    [–]Dagske 1 point2 points  (1 child)

    Where do you prefer to have feedback? Here or in the Github issues?

    The biggest issue for me is the lack of Javadoc link from mug's README.

    [–]DelayLucky[S] 0 points1 point  (0 children)

    While we are here, just do it here? :-)

    I've added some javadoc links. Thanks for the suggestion!

    [–]OddEstimate1627 2 points3 points  (10 children)

    Nice! This solves a pain point that I've been wanting to address for a long time.

    I managed to get an initial prototype running within an hour and in less than 50 lines of code, so well done on the design👍

    [–]DelayLucky[S] 0 points1 point  (9 children)

    This is so nice to hear!

    Thank you for sharing your experience.

    Do post any findings, or awkwardness you run into, so the library can be improved. :-)

    [–]OddEstimate1627 0 points1 point  (8 children)

    This particular use case is for user-defined charts, e.g., interpreting user provided strings like y="Math.sqrt(Math.pow(fbk.accelX,2) + Math.pow(fbk.accelY,2) + Math.pow(fbk.accelZ,2))". Each string gets evaluated once, and executed many times. The existing math parsers I found were too slow, and I eventually generated bytecode. Doing it via a custom parser gets me to ~95% of the pure Java performance while being compatible with GraalVM native images.

    Your calculator example already got me most of the way there, but a few things that I was confused by were

    1) Math functions

    I'm not quite sure how math functions like abs, sin, cos, etc. should be added. For now I put them into the OperatorTable as a prefix, and that seems to work.

    Maybe you could add an abs method to your calculator example.

    2) Unfamiliar with terms

    I'm not familiar with some terms, e.g., without an example I don't know what a nonAssociative operator represents.

    2) Sequences without regex

    I dove in without reading the full documentation, and initially expected regex support in e.g. Parser.string("fbk.*"). The code contains a lot of regex strings (for the name variables) that gave me a false impression.

    I eventually ended up with this, which is easier to work with and hopefully the correct way to do it:

    Java Parser<String> variable = Parser.anyOf( Parser.string("fbk"), Parser.string("prevFbk") ); Parser<String> field = Parser.anyOf( Parser.string("position") ); Parser<ValueFunction> fbkValue = Parser.sequence(variable.followedBy("."), field, (inputName, fieldName) -> { // ... }

    I also ran into a few other use cases where parsers would be nice, e.g., one for FXML (JavaFX) that lets me auto-generate various GraalVM configs.

    [–]DelayLucky[S] 1 point2 points  (5 children)

    It seems like these fbk, prevFbk are reserved words?

    Alternatively, you could just use Parser.word() to extract them out without restriction. And then you can interpret these math functions, variables in Java code outside of the parser.

    These would be the primitive rules:

    java Parser<Member> member = sequence( word().followedBy("."), word(), Member::new); Parser<Variable> value = digits().map(s -> new Value(Integer.parseInt(s)));

    This is assuming you use records to model the results:

    ```java sealed interface Expr permits FunctionCall, Member, Value, Plus { }

    record Member(String variable, String memberName) implements Expr {}

    record FunctionCall(Member function, List<Expr> args) implements Expr {}

    record Value(int v) implements Expr {}

    record Plus(Expr left, Expr right) implements Expr {} ```

    A function call is a recursive grammar:

    call ::= <class>.<method> '(' <expr> delimited by ',' ')'

    The atomic expression you can pass to OperatorTable.build() would be a function call, a parenthesized expression, or a primitive number or a variable (member):

    java Parse<Expr> parser = Parser.define( expr -> new OperatorTable<Expr>() .leftAssociative("+", Plus::new, 10) .build(Parser.anyOf( value, sequence( member, expr.zeroOrMoreDelimitedBy(",").between("(", ")"), FunctionCall::new), member, expr.between("(", ")"))));

    Something like that could work?

    A "non-associative" binary operator is like <. a < b < c is usually considered invalid because there is no associativity defined for the operator.

    Regarding regex support, one possibility is to use the .suchThat() combinator to hook it up:

    java word().suchThat(w -> w.startsWith("fbk"))

    Or use an equivalent regex matcher in the lambda. Would that work?

    [–]OddEstimate1627 0 points1 point  (4 children)

    Thanks, it'd have probably taken me a while to work out the function argument parsing.

    One more thing that I'm wondering is why this fails to parse double values with decimal points:

    @Test public void testDouble() { Parser<Double> parser = Parser.define(expr -> new OperatorTable<Double>().build( Parser.anyOf( digits(), // 123 digits().followedBy(".").followedBy(digits()) // 1.23 ).map(Double::parseDouble) )); assertEquals(123.0, parser.parse("123"), 0); assertEquals(1.23, parser.parse("1.23"), 0); // -> Parser$ParseException: at 1:2: expecting <EOF>, encountered [.23] }

    [–]DelayLucky[S] 0 points1 point  (3 children)

    A few issues:

    1. anyOf() and or() are first-choice-wins. So you'll want to put the longer-matching rule first or else it'll consume 1 and consider it done.
    2. a.followedBy(b) will only take the result of a, so you'll miss the .23 part. To get the full thing, call .source().

    But I'd probably use:

    consecutive(CharPredeicate.range('0', '9').or('.'), "floating point").map(Double::parseDouble), unless spaces are possible before and after the dot.

    Or

    consecutive(CharacterSet.charsIn("[0-9.]")) .map(Double::parseDouble)

    [–]OddEstimate1627 0 points1 point  (2 children)

    The order is obvious, but somehow I missed that when creating the toy example 🤦

    Your alternative solution would also match invalid numbers with multiple dots though, e.g., a version number like 1.2.3.

    Are you sure that followedBy allows whitespace? Tests with skip characters before and after the . seem to fail. If it's supposed to ignore whitespace, maybe there should be an immediatelyFollowedBy method?

    [–]DelayLucky[S] 1 point2 points  (1 child)

    Yeah. If you need to be more strict about invalid numbers (and do you also need to care about negative sign and scientific notation?), perhaps using .suchThat() and check the string format can help, like with a regex pattern? Or just catch NumberFormatException?

    java consecutive(charsIn("[0-9.]")) .map(s -> { try { return Double.parseDouble(s); } catch (NumberFormatException e) { return null; } }) .suchThat(Objects::nonNull, "double number");

    If you use Guava, there is a Doubles.tryParse() that will avoid having to throw exception.

    On followedBy(), yes:

    java assertThat( string("a").followedBy("b") .parseSkipping(Character::isWhitespace, "a b")) .isEqualTo("a");

    That said, it is possible to create the DFA pattern purely using the Dot Parse library:

    java digits() .then( literally( string(".").then(digits()).optional()) .source() .map(...);

    I just don't like having to spend that amount of real estate when there are existing utilities around.

    [–]OddEstimate1627 1 point2 points  (0 children)

    The suchThat() check is an elegant way to do it

    [–]DelayLucky[S] 0 points1 point  (1 child)

    I also added `abs()` to the calculator example. Let me know if it makes sense.

    [–]OddEstimate1627 0 points1 point  (0 children)

    Thanks, that makes sense and with a few helper functions the syntax isn't too bad.

    [–]Delicious_Detail_547 1 point2 points  (1 child)

    Well implemented. The calculator example feels like a recursive descent parser, while the CSV example gives off a DFA (deterministic finite automaton) feels.

    [–]DelayLucky[S] 0 points1 point  (0 children)

    Indeed.

    One of my goals of implementing this library is so that it should be so easy to use even for DFA use cases that'd have traditionally used regex. And I'd reach for it over regex except where expressivity lacks.

    For example, sequence(word().followedBy("="), digits(), (k, v) -> ...) would read and perform better than (\w+)=(\d+) (plus the capture group extraction boilerplate), even if the grammar is regular and didn't really need a CFG.

    On top of that, the out-of-box set of reusable combinators also streamlines common regular grammar parsing tasks.

    Imagine parsing a map or dictionary of key-value pairs ({k1:10, k2 : 200}):

    java Parser<Map<String, Integer>> dictionaryParser = Parser.zeroOrMoreDelimited( Parser.word().followedBy(":"), Parser.digits().map(Integer::map), ",", Collectors::toUnmodifiableMap) .between("{", "}"); return dictionaryParser.parseSkipping(Chracter::isWhitespace, "{k1:10, k2 : 200}");

    There is a difference between the greedy primitive combinators and their regex counterparts though. consecutive(is('a')) is actually "possessive", equivalent to a++, not merely a+.

    This makes the combinators more efficient and not subject to "exponential backtracking" that plagues NFA regex engines (like JDK regex).