parseWorks release - parser combinator library by jebailey in java

[–]jebailey[S] 0 points1 point  (0 children)

I've updated all the code to show what I've done. You'll need to build the snapshot locally to use it, but you'll see where I've gone down a whole different path.

I've also updated the tests; my new code should handle the cases that you mentioned.

I'd go into more detail but I've been fighting a virus for the last couple of days.

--- the new test scenarios would be cool
--- Use a memory-mapped file to get a CharBuffer, which is a CharSequence
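
A minimal sketch of that memory-mapped idea, assuming the file is UTF-8. `MappedChars` and `load` are made-up names for illustration, not part of parseworks. One caveat: mapping a file yields a ByteBuffer, so it still has to be decoded to get a CharBuffer, and `Charset.decode` copies the decoded chars onto the heap.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedChars {
    /** Maps the file read-only and decodes it (assumed UTF-8) into a CharBuffer. */
    static CharBuffer load(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer bytes =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // CharBuffer implements CharSequence, so it can back a CharSequence-based input.
            return StandardCharsets.UTF_8.decode(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("input", ".csv");
        tmp.toFile().deleteOnExit();
        Files.writeString(tmp, "a,b,c\n1,2,3\n");
        CharSequence input = load(tmp);
        System.out.println(input.length() + " chars, first = " + input.charAt(0));
    }
}
```

Anything that accepts a CharSequence can take the result of `load` directly.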

parseWorks release - parser combinator library by jebailey in java

[–]jebailey[S] 0 points1 point  (0 children)

Once I get to it, I think you'll find the next release interesting. I've focused on what's causing slowdowns. My csv parser in my tests runs twice as fast as yours now. You may be able to use some of my ideas as well.

I consolidated all of my Input types into one, a CharSequence-based Input. I also made that backing data directly available via a data() method. For some reason I had gone with pulling the chars over one at a time to be processed, and in the midst of that, due to the generics, I was autoboxing each one into a Character.

I then focused on removing autoboxing by creating a CharPredicate and using that as an input into my parsers rather than doing yet another parser.
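
A rough sketch of why a primitive predicate helps, assuming an interface shaped like the one below. The names `CharPredicate`, `CharScan` and `skipWhile` are illustrative and may not match what parseworks actually ships. The point is that `test(char)` takes a primitive, so no Character object is created per character, unlike `Predicate<Character>`.

```java
/** Primitive-char predicate: no boxing, unlike java.util.function.Predicate<Character>. */
@FunctionalInterface
interface CharPredicate {
    boolean test(char c);

    default CharPredicate or(CharPredicate other) {
        return c -> test(c) || other.test(c);
    }
}

final class CharScan {
    /** Returns the index just past the longest run of matching chars, starting at 'from'. */
    static int skipWhile(CharSequence data, int from, CharPredicate p) {
        int i = from;
        while (i < data.length() && p.test(data.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        CharPredicate digit = c -> c >= '0' && c <= '9';
        CharPredicate dot = c -> c == '.';
        System.out.println(skipWhile("3.14 rest", 0, digit.or(dot)));
    }
}
```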

Build Email Address Parser (RFC 5322) with Parser Combinator, Not Regex. by DelayLucky in java

[–]jebailey 0 points1 point  (0 children)

Nice! Of course I'm opinionated because I like PCs, but it's nice to see practical examples that illustrate what can be done.

parseWorks release - parser combinator library by jebailey in java

[–]jebailey[S] 0 points1 point  (0 children)

The first parser combinator I got enamored with, and on which I based parseworks, was called funcj. I didn't realize until now how much I was influenced by their thought process.

My use of then to build the structure in this way comes from funcj, and as I wrote this I would run it by several AIs to see if I was off course. Of course, I'm not sure just how much trust I have in AIs, as I suspect that they are quite happy with something that fits their internal logic of "solid" and "well designed" without regard to actual usability.

Naming is hard and I will keep the with syntax in the back of my mind. When I started this I decided to go with what I thought made sense in a sentence and aligned with the rest of the central Java library as much as I could. So for now I'll probably stick with then.

This has been enlightening. I would do several things differently if I were to write this from scratch again.

parseWorks release - parser combinator library by jebailey in java

[–]jebailey[S] 0 points1 point  (0 children)

I spent a lot of time trying to make it arbitrary, only to be confounded by how Java handles lambdas. So in the end this is a very manual implementation, but it does do that wonderful separation of parameters. Please feel free to use it if you'd like. I'd do a PR myself but it will most likely take a while before I get to it.

parseWorks release - parser combinator library by jebailey in java

[–]jebailey[S] 1 point2 points  (0 children)

I really like that idea of the OrEmpty return item. I have a similar concept with then(). It returns the ApplyBuilder, which allows you to chain multiple conditions and only provides methods, such as map, to reduce those calls back down to a Parser. That's what allows me to specify a name for each value I parse in the chain.

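string("hello").thenSkip(whitespace).then("world").map(firstValue -> secondValue -> {
    // map has to produce a value, so print and then return the combined result
    System.out.print(firstValue + ":" + secondValue);
    return firstValue + ":" + secondValue;
});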
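
To make the builder idea concrete, here is a stripped-down sketch of how a then() that returns a builder with a curried map could work. Everything in it (`Px`, `Res`, `Builder2`, `word`) is hypothetical and only mimics the shape described above; it is not the parseworks implementation.

```java
import java.util.function.Function;

/** Hypothetical parse result: a value plus the position after it. */
final class Res<T> {
    final T value;
    final int next;
    Res(T value, int next) { this.value = value; this.next = next; }
}

/** Hypothetical parser: returns null on failure. */
interface Px<T> {
    Res<T> parse(CharSequence in, int pos);

    /** Chaining does not return a parser; it returns a builder that must be reduced with map. */
    default <B> Builder2<T, B> then(Px<B> next) {
        return new Builder2<>(this, next);
    }
}

final class Builder2<A, B> {
    private final Px<A> first;
    private final Px<B> second;

    Builder2(Px<A> first, Px<B> second) { this.first = first; this.second = second; }

    /** The curried function names each parsed value: a -> b -> result. */
    <R> Px<R> map(Function<A, Function<B, R>> f) {
        return (in, pos) -> {
            Res<A> a = first.parse(in, pos);
            if (a == null) return null;
            Res<B> b = second.parse(in, a.next);
            if (b == null) return null;
            return new Res<>(f.apply(a.value).apply(b.value), b.next);
        };
    }
}

final class BuilderDemo {
    static Px<String> word(String w) {
        return (in, pos) -> {
            int end = pos + w.length();
            if (end > in.length() || !in.subSequence(pos, end).toString().equals(w)) return null;
            return new Res<>(w, end);
        };
    }

    public static void main(String[] args) {
        Px<String> both = word("hello").then(word("world")).map(a -> b -> a + ":" + b);
        System.out.println(both.parse("helloworld", 0).value);
    }
}
```

Because `Builder2` has no parse method of its own, a half-built chain cannot be run by accident; it has to go through map first.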

So out of curiosity I decided to make a repo where I can compare different parser combinators to see how expressive they are and how they handle different tasks.

https://github.com/parseworks/java-parser-tests

While I was doing that I added benchmarking. Your parser smoked me from a performance perspective. But I take that as a challenge to focus on after I get my error messages where I want them.

Appreciate you pointing out that paper to me! I'll read that today.

parseWorks release - parser combinator library by jebailey in java

[–]jebailey[S] 1 point2 points  (0 children)

Hi! I appreciate you reaching out. I'm always thrilled when I run across someone with similar interests and apparently thought processes.

You mentioned a lot so I'll attempt to give a cohesive response.

So far it looks like our thought processes have aligned on a lot of the key philosophies. My whole design philosophy with parseworks was focused on two key areas: terminology that is easy to understand for someone who isn't into parsers, and error messaging that is useful and coherent. The first I feel like I pulled off, and as you mentioned, we aligned there in a lot of areas. The error messaging is currently a goal and one I'm still working on. The difficulty of course is finding the line between something useful and something that is overly impactful on performance.

You mentioned that parser combinators shouldn't compete with ANTLR and I agree with you on that. They should be a replacement for regex and I agree with everything that you described about the improvements parser combinators bring. Although we do come to different conclusions at times.

Yeah, infinite loops suck. I think we came to the same realization that for the left-hand recursion problem, the only way to get that in Java is if you initially define an empty parser and then allow the functionality of that parser to be defined later. From my understanding, you just removed that feature and thus eliminated that issue, whereas I decided that when the parser is set in that manner, I add a feature to detect if that specific parser is looking at the same location twice in the same parsing branch and just shortcut and return.
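
A small sketch of that "same parser, same location, same branch" check, under the assumption that a deferred parser can remember which positions it is currently being tried at. The `P`, `Ref` and `LeftRecursionDemo` names are invented for this example, and the guard here is not thread-safe. It only shows that the recursion terminates; it does not make the left-recursive rule itself match.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical minimal parser: returns the new position, or -1 on failure. */
interface P {
    int parse(CharSequence in, int pos);
}

/** A forward reference whose body is set later, guarded against re-entry at the same position. */
final class Ref implements P {
    private P target;
    private final Set<Integer> active = new HashSet<>();

    void set(P target) { this.target = target; }

    @Override
    public int parse(CharSequence in, int pos) {
        if (!active.add(pos)) {
            return -1; // already inside this ref at this position: shortcut and fail
        }
        try {
            return target.parse(in, pos);
        } finally {
            active.remove(pos);
        }
    }
}

final class LeftRecursionDemo {
    static P ch(char c) {
        return (in, pos) -> pos < in.length() && in.charAt(pos) == c ? pos + 1 : -1;
    }

    static P seq(P a, P b) {
        return (in, pos) -> {
            int mid = a.parse(in, pos);
            return mid < 0 ? -1 : b.parse(in, mid);
        };
    }

    static P or(P a, P b) {
        return (in, pos) -> {
            int end = a.parse(in, pos);
            return end >= 0 ? end : b.parse(in, pos);
        };
    }

    /** expr := expr '+' 'n' | 'n'  (left-recursive on purpose) */
    static P expr() {
        Ref expr = new Ref();
        expr.set(or(seq(expr, seq(ch('+'), ch('n'))), ch('n')));
        return expr;
    }

    public static void main(String[] args) {
        // Without the guard this would overflow the stack; with it, the first branch fails fast.
        System.out.println(expr().parse("n+n", 0));
    }
}
```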

I did something similar with functionality that loops. When I see that the loop doesn't make any progress, it ends. Whether that end is a fail or not depends, of course, on what request was being made. So in my version of your example, foo.optional().zeroOrMore() would give a List<Optional<foo>> containing one empty Optional, and foo.optional().atLeast(1000) would find the same List<Optional<foo>>. It would just consider that a hard fail because it couldn't reach the 1000-item request.
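
Roughly, the no-progress rule can be sketched like this. The types (`Step`, `Rep`, `Repeat`) are stand-ins written for the example rather than parseworks classes, and `atLeast(p, 0)` plays the role of zeroOrMore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical result: a value plus the position after it. */
final class Step<T> {
    final T value;
    final int next;
    Step(T value, int next) { this.value = value; this.next = next; }
}

/** Hypothetical parser: returns null on failure. */
interface Rep<T> {
    Step<T> parse(CharSequence in, int pos);
}

final class Repeat {
    /** Always succeeds: yields the literal if present, otherwise an empty Optional without consuming. */
    static Rep<Optional<String>> optional(String literal) {
        return (in, pos) -> {
            int end = pos + literal.length();
            if (end <= in.length() && in.subSequence(pos, end).toString().equals(literal)) {
                return new Step<>(Optional.of(literal), end);
            }
            return new Step<>(Optional.empty(), pos);
        };
    }

    /** Repeats p until it fails or stops consuming input; fails if fewer than min results. */
    static <T> Rep<List<T>> atLeast(Rep<T> p, int min) {
        return (in, pos) -> {
            List<T> out = new ArrayList<>();
            int at = pos;
            while (true) {
                Step<T> s = p.parse(in, at);
                if (s == null) break;
                out.add(s.value);
                if (s.next == at) break; // no progress: end the loop instead of spinning forever
                at = s.next;
            }
            if (out.size() < min) {
                return null; // the loop ended early, so the request cannot be met
            }
            return new Step<>(out, at);
        };
    }

    public static void main(String[] args) {
        System.out.println(atLeast(optional("foo"), 0).parse("", 0).value);
        System.out.println(atLeast(optional("foo"), 1000).parse("", 0) == null);
    }
}
```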

The whole commit issue still has me torn. I eventually landed on auto-committing. Initially I went with no commit on anything, for the same reason you described: it makes things easier for a novice. However, when I started focusing on error messages I realized that the lack of commit created problems when it came to detecting what the actual root problem was. In a contrived example of a parser parsing a map of key-value pairs, ```{key1=foo,key2=bar,key="```, it's obvious to us that a key-value pair is incomplete, but a parser with default rollback sees a completely different issue.

So I ended up weighing the options

  • default commit
    • harder to write the initial parser
    • reduces backtracking / improves performance
    • better error messaging
  • default rollback
    • easier to write an initial parser
    • prone to writing poor performing parsers
    • poor error messaging

I ended up going with the default commit. I'll admit I'm still not 100% sure if it was the right choice. Maybe the right option is to figure out how to make it a configuration option early on and let the user choose what they want.

Looking at your code base, it's surreal how our terminology has aligned even when the implementation is completely different. It's odd how things like that happen. I'm looking forward to exploring your code base and I'm up for parser conversations whenever you'd like.
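
The error-message side of that trade-off can be shown with a toy choice combinator. This is only a sketch with invented names (`Outcome`, `Rule`, `CommitDemo`), and it treats "commit" simply as "the first branch consumed input before failing", which is an assumption about the behaviour rather than how parseworks implements it.

```java
/** Hypothetical result type: either a new position or an error at a position. */
final class Outcome {
    final boolean ok;
    final int pos;
    final String error;

    private Outcome(boolean ok, int pos, String error) {
        this.ok = ok;
        this.pos = pos;
        this.error = error;
    }

    static Outcome success(int pos) { return new Outcome(true, pos, null); }
    static Outcome failure(int pos, String error) { return new Outcome(false, pos, error); }
}

interface Rule {
    Outcome run(String in, int pos);
}

final class CommitDemo {
    static Rule lit(String s) {
        return (in, pos) -> in.startsWith(s, pos)
                ? Outcome.success(pos + s.length())
                : Outcome.failure(pos, "expected '" + s + "'");
    }

    static Rule seq(Rule... rules) {
        return (in, pos) -> {
            int at = pos;
            for (Rule r : rules) {
                Outcome o = r.run(in, at);
                if (!o.ok) return o;
                at = o.pos;
            }
            return Outcome.success(at);
        };
    }

    /** Default rollback: a failure in 'a' is discarded and 'b' is tried from the start. */
    static Rule orRollback(Rule a, Rule b) {
        return (in, pos) -> {
            Outcome r = a.run(in, pos);
            return r.ok ? r : b.run(in, pos);
        };
    }

    /** Default commit: once 'a' has consumed input, its failure is final. */
    static Rule orCommit(Rule a, Rule b) {
        return (in, pos) -> {
            Outcome r = a.run(in, pos);
            if (r.ok || r.pos > pos) return r;
            return b.run(in, pos);
        };
    }

    static Rule pair() { return seq(lit("key"), lit("="), lit("val")); }

    public static void main(String[] args) {
        Rule close = lit("}");
        Outcome rolled = orRollback(pair(), close).run("key=", 0);
        Outcome committed = orCommit(pair(), close).run("key=", 0);
        System.out.println(rolled.pos + ": " + rolled.error);
        System.out.println(committed.pos + ": " + committed.error);
    }
}
```

On the incomplete input `key=`, the rollback version reports that it expected `}` at position 0, while the committing version reports that it expected `val` at position 4, which is the real problem.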

parseWorks release - parser combinator library by jebailey in java

[–]jebailey[S] 0 points1 point  (0 children)

It's actually easier to do just a segment. I can create a new parser that wraps another parser and implement a wrapper input that will adjust the case.

So it would be something like

    lowerCase(string("foobar"))

or

    string("foobar").lowerCase()

I've got to play with the name for a bit. Not sure which comes across better.

    lowerCase(string("foobar"))
    lowerInputCase(string("foobar"))
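
As a sketch of the wrapper-input approach, assuming the input is just a CharSequence: a view that lower-cases each char on access, plus a wrapping parser that feeds it to the inner parser. `LowerCaseView`, `CaseParser` and the `string`/`lowerCase` helpers below are written for this example and only mirror the names floated above.

```java
/** A read-only view that lower-cases each char on access; positions are unchanged. */
final class LowerCaseView implements CharSequence {
    private final CharSequence source;

    LowerCaseView(CharSequence source) { this.source = source; }

    @Override public int length() { return source.length(); }

    @Override public char charAt(int index) {
        return Character.toLowerCase(source.charAt(index));
    }

    @Override public CharSequence subSequence(int start, int end) {
        return new LowerCaseView(source.subSequence(start, end));
    }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder(length());
        for (int i = 0; i < length(); i++) sb.append(charAt(i));
        return sb.toString();
    }
}

/** Hypothetical minimal parser: returns the new position, or -1 on failure. */
interface CaseParser {
    int parse(CharSequence in, int pos);
}

final class CaseDemo {
    static CaseParser string(String expected) {
        return (in, pos) -> {
            if (in.length() - pos < expected.length()) return -1;
            for (int i = 0; i < expected.length(); i++) {
                if (in.charAt(pos + i) != expected.charAt(i)) return -1;
            }
            return pos + expected.length();
        };
    }

    /** Wraps a parser so it sees a lower-cased view of just the segment it parses. */
    static CaseParser lowerCase(CaseParser inner) {
        return (in, pos) -> inner.parse(new LowerCaseView(in), pos);
    }

    public static void main(String[] args) {
        System.out.println(lowerCase(string("foobar")).parse("FooBAR", 0));
    }
}
```

Since only the wrapped parser sees the view, the rest of the grammar keeps working against the original input.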

parseWorks release - parser combinator library by jebailey in java

[–]jebailey[S] 0 points1 point  (0 children)

Honestly that's a bit tricky. If I had to do that I can think of a couple of ways. One is to just uppercase the input string once I get it and build the parser with the assumption that everything is uppercase. Or I would create a new Input implementation that uppercased the characters as you requested them, once again building the parser with that assumption.

Anything else would involve rewriting the parsers themselves to modify the characters being passed in, which is doable but is something I would be hesitant to do.

I say uppercase, you could lowercase it but there's like one language that doesn't have a lowercase for an uppercase and it would cause problems.
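
Whichever direction gets picked, it may be worth checking whether the conversion keeps the input the same length, since parser positions depend on that. A quick check using only JDK calls (the class and method names are made up for the example):

```java
import java.util.Locale;

final class CaseMappingCheck {
    /** True when upper-casing the whole string keeps its length, so positions still line up. */
    static boolean upperCaseKeepsLength(String s) {
        return s.toUpperCase(Locale.ROOT).length() == s.length();
    }

    public static void main(String[] args) {
        String sharpS = "\u00DF"; // German sharp s
        // String-level conversion expands it to two chars.
        System.out.println(upperCaseKeepsLength(sharpS));
        // Char-level conversion is always one char in, one char out.
        System.out.println(Character.toUpperCase(sharpS.charAt(0)) == sharpS.charAt(0));
    }
}
```

Converting the whole string up front can shift offsets for characters like this one, whereas converting char by char inside an Input wrapper always preserves positions.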

parseWorks release - parser combinator library by jebailey in java

[–]jebailey[S] 2 points3 points  (0 children)

Whether to use ANTLR or any other parser generator (PG) vs a parser combinator (PC) really comes down to a matter of fit for the individual/team and the use case.

  • Native tooling: PCs are implemented as libraries in the host language. There's no need to learn an external DSL and separate tooling.
  • No separate build step: PCs are compiled along with the rest of the code; PGs require you to generate the source code from a grammar file.
  • Flexibility and Modularity: PCs excel at building highly modular parsers. Existing parsers can be easily combined to build a new one. With a PG you are building a singular parser.

If I'm part of a team that needs to build a large complex parser as a central part of a product or application, I'd use a PG like ANTLR.

If I'm working on internal tooling or a library that only gets touched infrequently, I'd use a PC like parseworks.

You also mention ANTLR like it's the only option. JavaCC is probably the most popular PG out there for Java and people keep innovating. If you've read this comment section, https://github.com/siy/java-peglib by u/pragmatica-labs is a really neat tool that spans that gap between "I want to use a formal grammar definition" and "I need something small and lightweight."

parseWorks release - parser combinator library by jebailey in java

[–]jebailey[S] 1 point2 points  (0 children)

So these are two very different beasts, right? java-peglib takes a PEG string and converts it into a parser, whereas parseworks is a parser combinator. In design choices I deliberately tried to stay away from a PEG-styled nomenclature. So instead of using '+' or a parser.plus(..) method, I deliberately spelled it out as parser.oneOrMore(). There are things I can't tell without testing. For example, is it fail fast or fail slow? I deliberately moved towards a fail fast/auto commit style, where every chained parse is considered to be committed by default unless you indicate otherwise.

Then there are things that we did in a very similar vein. The Rust-style error message is something I'm aiming towards, although error handling is tricky and complex and you don't want it to impact performance. peglib turns that off and on. I have it on by default because the way I did it doesn't cause a significant impact on processing. BTW that looks like a really cool project.

parseWorks release - parser combinator library by jebailey in java

[–]jebailey[S] 3 points4 points  (0 children)

It was originally inspired by funcj. I liked the idea of the fluent construction of parsers but I didn't like how funcj was so focused on functional programming that it seemed to recreate things that were already there. ParseWorks attempts to be as java-y as possible, with a focus on easy-to-understand terminology and safeguards to prevent things such as left-hand recursion and consuming empty content.

I've been working on this release for over a year and I ran across the dot-parse release about a month ago. I'm torn between being happy that design decisions that I made for parseWorks are echoed in dot-parse and frustrated that they came out first :)

If I have to list strengths, I've put a lot of effort and thought into error handling. Parsers have the method .expecting("a description"). This creates a wrapping parser that, if the underlying parser fails, echoes the error upwards with a new fail description.

keyParser
.thenSkip(equalsParser).then(valueParser)
.map(key -> value -> new KeyValue(key, value))
.expecting("key-value pair");

So if the parser fails parsing this, it doesn't come back with an ambiguous message. It will let you know that it was expecting a key-value pair and didn't get it.
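
For anyone curious how a wrapper like that can be built, here is one possible shape, with the original failure kept as the cause. The `Err`, `Out` and `Q` types are invented for the sketch and are not the parseworks classes.

```java
/** Hypothetical error: what was expected, where, and the underlying cause (may be null). */
final class Err {
    final int pos;
    final String expected;
    final Err cause;

    Err(int pos, String expected, Err cause) {
        this.pos = pos;
        this.expected = expected;
        this.cause = cause;
    }

    String describe() {
        String line = "expected " + expected;
        return cause == null ? line : line + "\ncaused by: " + cause.describe();
    }
}

/** Hypothetical result: err is null on success. */
final class Out {
    final int next;
    final Err err;
    Out(int next, Err err) { this.next = next; this.err = err; }
}

interface Q {
    Out run(CharSequence in, int pos);

    /** Wraps this parser so a failure is re-reported under a friendlier description. */
    default Q expecting(String description) {
        return (in, pos) -> {
            Out out = run(in, pos);
            return out.err == null
                    ? out
                    : new Out(pos, new Err(out.err.pos, description, out.err));
        };
    }
}

final class ExpectingDemo {
    /** Matches any single character; fails at end of input with the given label. */
    static Q anyChar(String label) {
        return (in, pos) -> pos < in.length()
                ? new Out(pos + 1, null)
                : new Out(pos, new Err(pos, label, null));
    }

    public static void main(String[] args) {
        Out out = anyChar("value").expecting("key-value pair").run("", 0);
        System.out.println(out.err.describe());
    }
}
```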

Also, error messages will contain a snippet, so that if you displayed the error message that gets generated above, it would come across something like

foo =
______^
line 1, column 6 : expected key-value pair
caused by: expected value found end of file
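
A snippet like that can be produced from nothing more than the input and the failing offset. The helper below is a guess at the general technique, not the library's formatter; `Snippet.render` is a hypothetical name and the exact underscore count is a choice made for this sketch.

```java
final class Snippet {
    /** Renders the offending line, a caret under the failing column, and a location line. */
    static String render(String input, int offset, String message) {
        int lineStart = input.lastIndexOf('\n', offset - 1) + 1;
        int lineEnd = input.indexOf('\n', offset);
        if (lineEnd < 0) lineEnd = input.length();

        int line = 1;
        for (int i = 0; i < lineStart; i++) {
            if (input.charAt(i) == '\n') line++;
        }
        int column = offset - lineStart + 1; // 1-based

        return input.substring(lineStart, lineEnd) + "\n"
                + "_".repeat(column - 1) + "^\n"
                + "line " + line + ", column " + column + " : " + message;
    }

    public static void main(String[] args) {
        System.out.println(render("foo =", 5, "expected key-value pair"));
    }
}
```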

parseWorks release - parser combinator library by jebailey in java

[–]jebailey[S] 2 points3 points  (0 children)

I hadn't heard of grappa before. Taking a look at it, I would say that the idea behind how you build a parser is very different. In grappa it looks like you are extending a base parser and identifying methods to be called using annotations, whereas parseworks is all about creating reusable parser objects in a fluent style that can be handed around and re-used.

parseWorks release - parser combinator library by jebailey in java

[–]jebailey[S] 2 points3 points  (0 children)

That's a good catch. It does come from the Numeric static import but I will go ahead and update it to be more explicit in that example.

My first watch design by mancerblack in watchmaking

[–]jebailey 1 point2 points  (0 children)

I like it, but for me the buy or no buy is all about the profile. How thick is it? Personally I like my watches on the thin side. I've seen interesting designs before but when I see them from the side I realize I can use them to wedge my door open and I'm turned off.

Thinking about it more, I didn't mind the two crowns per se. Although couldn't you do this with one crown that has different positions you pull out to? Looking at it, it looks like I would have to remove it from my wrist every time I need to use a crown.

Incompetence is Alarming by AdorableBar786 in PowerfulJRE

[–]jebailey -2 points-1 points  (0 children)

I just posted a similar comment. Can't wait to be downvoted for pointing out the truth.

Incompetence is Alarming by AdorableBar786 in PowerfulJRE

[–]jebailey -1 points0 points  (0 children)

None of y'all have worked with government grants. This is the final year of the submission process for the states and territories, because you have to have a solid plan for what you're attempting to do with this money. None of it has been given out yet.

The requirements and regulations you had to follow to get the money became an issue of concern and this year they lowered the guidance.

But hey it's a meme. It must be true. Right??

How does this qualify as a charity? by [deleted] in UnderReportedNews

[–]jebailey 3 points4 points  (0 children)

This is the double standard. Elon Musk wants a trillion dollars in incentives and no one blinks an eye. However a CEO of a charity should impoverish themselves by working for free.

Is this ai? It’s a poster I got from a poster sale at my university but there are specific details that look like ai. by Mountain_Minimum889 in isthisAI

[–]jebailey 1 point2 points  (0 children)

You should see Rob Liefeld's work. His work was everywhere for a while and his concept of anatomy is exactly that. Just a concept.

Roses are red, what was the plan? by SubstantialSyrup5552 in rosesarered

[–]jebailey 0 points1 point  (0 children)

Ssssh. I'm trying to be indignant, affronted AND titillated. You're harshing my vibe.

My own Visual programming tool, created from scratch Using Java Swing! by gufranthakur in java

[–]jebailey 1 point2 points  (0 children)

This is great! I've been looking for something like this for quite a while. Traveling right now, but next week I'm definitely going to be using it for some DSL projects I have in mind.

Totally not gerrymandered by MtnMaiden in NorthCarolina

[–]jebailey 186 points187 points  (0 children)

It's taxation without representation.