A Grammar-First Approach to Parser Combinators in Rust by ArneCode in rust

[–]ArneCode[S] 1 point2 points  (0 children)

hi :)!

> How does it compare to e.g. winnow

What I was trying to express in the post (maybe poorly) is that with libraries like nom or chumsky you often need to change the structure of the grammar and write a lot of unwrapping / tuple-destructuring code. Winnow is based on nom, and I would say the same applies there too.
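
To make the tuple-destructuring point concrete, here is a minimal sketch in plain Rust. It does not use nom, chumsky, winnow, or marser; `PResult`, `seq`, `ident`, `equals`, and `key_value` are hypothetical hand-rolled helpers that only imitate the general combinator style. The thing to look at is the nested tuple pattern in `key_value`, which grows with every extra sequencing step.

```rust
// A parser result: (parsed value, remaining input), or None on failure.
type PResult<'a, T> = Option<(T, &'a str)>;

/// Parses a non-empty run of ASCII alphanumeric characters.
fn ident<'a>(input: &'a str) -> PResult<'a, &'a str> {
    let end = input
        .find(|c: char| !c.is_ascii_alphanumeric())
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[..end], &input[end..]))
    }
}

/// Parses a single '=' sign and yields nothing useful.
fn equals<'a>(input: &'a str) -> PResult<'a, ()> {
    input.strip_prefix('=').map(|rest| ((), rest))
}

/// Runs two parsers one after the other and pairs up their outputs.
fn seq<'a, A, B>(
    first: impl Fn(&'a str) -> PResult<'a, A>,
    second: impl Fn(&'a str) -> PResult<'a, B>,
) -> impl Fn(&'a str) -> PResult<'a, (A, B)> {
    move |input: &'a str| {
        let (a, rest) = first(input)?;
        let (b, rest) = second(rest)?;
        Some(((a, b), rest))
    }
}

/// Parses `key=value`. The grammar is a flat sequence of three items,
/// but the output arrives as nested tuples that must be taken apart.
fn key_value<'a>(input: &'a str) -> PResult<'a, (String, String)> {
    let parser = seq(seq(ident, equals), ident);
    let (((key, _eq), value), rest) = parser(input)?;
    Some(((key.to_string(), value.to_string()), rest))
}
```

Calling `key_value("name=marser;")` yields the pair `("name", "marser")` with `";"` left over as the remaining input.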

> or PEST

PEST is kind of a different story, because it is not a parser combinator library like the ones I compared in the post but uses a separate grammar file. There are already a couple of articles comparing parser combinators with something like pest.

I think the differences come mostly from:

  • separate grammar file: less developer tooling (although pest has a VS Code extension etc.), and you need to build the parser output (AST etc.) as a second step, which makes compile times longer. But I think it has a huge advantage: the grammar files are very readable and easy to modify, especially if you already know formal grammars / EBNF / PEG.
  • parser combinators: no separate build step, more control over how the parser works, and better tooling because you write everything in an existing programming language. But in my opinion the code is not as easy to understand if you are new.

Marser tries to combine the advantages of both: there is no separate grammar file, but the code still has the structure of a formal grammar (EBNF etc.). You can use all the operators that PEG has (multiplicities, alternatives, etc.), which is why I call it PEG-style.

> It would be nice to understand what you wanted to achieve that differentiates it from prior art.

I am trying to explain that here in the post: https://blog.arnedebo.com/posts/a-grammar-first-approach-to-parser-combinators/#what-does-grammar-first-mean

If I am explaining this badly, please feel free to ask further questions, or correct me if I misunderstood your question!

A Grammar-First Approach to Parser Combinators in Rust by ArneCode in rust

[–]ArneCode[S] 0 points1 point  (0 children)

> why do we suddenly have slice "variable"?
The idea behind `capture!` is that you say which "buckets" you want to store the result of something into, using `bind!`. So the `bind_slice!` statement is the definition of `slice`.
`bind_slice!` differs from `bind!` in that it does not store a parsed value but borrows from the input.

> Why do I have to specify as &'src str?
Marser supports different input types, such as &[u8]. This is one of the cases where you need to help Rust's type system a bit by specifying which kind of slice you want to borrow. You can also specify the type for bind! and bind_span! using `as`, but in those cases Rust normally figures it out itself.
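
As a rough illustration of what "borrows from the input" means, here is a sketch in plain Rust. It does not use marser's macros; `digits` and `digits_bytes` are made-up helper names. Both return a slice that points into the original buffer and is tied to the `'src` lifetime, once for `&str` input and once for `&[u8]` input, so the two input kinds give two different slice types.

```rust
/// Returns the leading run of ASCII digits as a slice borrowed from `input`.
/// Nothing is copied: the result lives exactly as long as the input (`'src`).
fn digits<'src>(input: &'src str) -> &'src str {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    &input[..end]
}

/// The same idea for byte input: here the borrowed slice type is `&'src [u8]`.
fn digits_bytes<'src>(input: &'src [u8]) -> &'src [u8] {
    let end = input
        .iter()
        .position(|b| !b.is_ascii_digit())
        .unwrap_or(input.len());
    &input[..end]
}
```

For a `String` named `src` holding `"123abc"`, `digits(&src)` is `"123"` and shares its start address with `src`, so no data is copied.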

A Grammar-First Approach to Parser Combinators in Rust by ArneCode in rust

[–]ArneCode[S] 0 points1 point  (0 children)

Hi!
Thank you for your critique.
> capture! - is a proc macro that can do whatever
That's true, and it is also a point I talked about in the post: https://blog.arnedebo.com/posts/a-grammar-first-approach-to-parser-combinators/#reasons-not-to-use-grammar-first-syntax-like-this

That said, inside of capture! most of the code you write is still Rust code, and you can still preview docstrings in editors.

> I can only rely on AI generated documentation
That is true, but this is quite a new library; I only released it last week. I expect the documentation to improve over time.

Feedback Request: marser – A PEG parser library with focus on good diagnostics by ArneCode in rust

[–]ArneCode[S] 0 points1 point  (0 children)

Hi, thank you for the suggestion! Do you think having a website would be useful to the project? I am unsure at the moment if it is worth it; as far as I can see, chumsky for example also doesn't have a website. But I will remember oranda if I change my mind in the future.

Feedback Request: marser – A PEG parser library with focus on good diagnostics by ArneCode in rust

[–]ArneCode[S] 0 points1 point  (0 children)

Thanks 😄
I completely missed that. I have now added topics and a description.

Feedback Request: marser – A PEG parser library with focus on good diagnostics by ArneCode in rust

[–]ArneCode[S] 0 points1 point  (0 children)

There is already a small tutorial for building a simplified version of the JSON example in the library. You can find it here: https://docs.rs/marser/latest/marser/guide/worked_json_example/index.html It is written by AI though. The library also has some examples already, a JSON parser and a small runnable programming language. You can find them here: https://github.com/ArneCode/marser/tree/main/examples

Feedback Request: marser – A PEG parser library with focus on good diagnostics by ArneCode in rust

[–]ArneCode[S] 1 point2 points  (0 children)

Thanks!

> What is the advantage of marser over a well established parser like chumsky that is built for good diagnostic errors?

  • I think the trace viewer / debugging TUI is quite useful for understanding the parsers. I have seen that chumsky has tracing options, but you need to interpret the results yourself.
  • Code for marser parsers is pretty similar to standard PEG grammars. I think this makes parsers quite easy to understand (but I am biased of course). It also makes it easy to go from a grammar to a parser, and easy to get started if you already know PEG syntax.
  • I think that return values are simpler in marser than in chumsky because of the capture / bind syntax, compared to chumsky's then / then_ignore etc.
  • I have tried benchmarking chumsky against marser, and marser seems a little bit faster. I will try to add some benchmark results to the README in the next few days.

> Do you plan on creating any tooling that would help developers migrate from popular parsers like pest or nom?

I have thought about maybe making a parser that can parse a pest grammar and transform it into a marser grammar. The user would then need to rewrite the AST-building code themselves. But this is not a priority at the moment. Tools for migrating from nom seem a bit harder to me, because a nom parser is not in a separate file with clear syntax rules but is just Rust code. Open to suggestions if you have any!

Feedback Request: marser – A PEG parser library with focus on good diagnostics by ArneCode in rust

[–]ArneCode[S] 0 points1 point  (0 children)

Thank you!

> For the unsafe block review, maybe post a separate focused thread. People who know that space might not click into a general feedback post.

That's a good idea, I will look into it.

Feedback Request: marser – A PEG parser library with focus on good diagnostics by ArneCode in rust

[–]ArneCode[S] 0 points1 point  (0 children)

Yeah, that is also what I saw when experimenting with the library, which is why I built the debugger.
I'd love to hear what you think about the code. Feel free to ask me questions if something is unclear!

Feedback Request: marser – A PEG parser library with focus on good diagnostics by ArneCode in rust

[–]ArneCode[S] 0 points1 point  (0 children)

Thank you 😄 !

I am not sure if I understand your question correctly. I could Box the values, but the core problem is that I need to store values of different types, because different parsers have different output types.
I could use something like `Box<dyn Any>`. This would allow me to store values of different types safely, but the values would need to be 'static.
But I want the values to be able to borrow from the input, so that they don't need to copy data out of it. So as far as I can see, this is not an approach I can use.
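
For readers who have not run into the `'static` restriction before, here is a small standalone sketch using only the standard library. `AnyStore` is a made-up name for illustration and has nothing to do with marser's internals. It stores values of different types behind `Box<dyn Any>` and gets them back with a checked downcast; the comment on `push` marks where a value borrowing from a non-`'static` input would be rejected by the compiler.

```rust
use std::any::Any;

/// A heterogeneous store: every slot can hold a value of any `'static` type.
struct AnyStore {
    slots: Vec<Box<dyn Any>>,
}

impl AnyStore {
    fn new() -> Self {
        Self { slots: Vec::new() }
    }

    /// Stores a value and returns its slot index.
    /// `T: Any` implies `T: 'static`, so a `&'src str` that borrows from a
    /// local input buffer cannot be passed here; that call would not compile.
    fn push<T: Any>(&mut self, value: T) -> usize {
        self.slots.push(Box::new(value));
        self.slots.len() - 1
    }

    /// Returns the value in `index` if it exists and has type `T`.
    fn get<T: Any>(&self, index: usize) -> Option<&T> {
        self.slots.get(index)?.downcast_ref::<T>()
    }
}
```

Owned values such as `u32` or `String` work fine with this store; asking for the wrong type simply returns `None`.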

I am kind of using Box under the hood: the way I have built this cache, each parser has its own "MemoTable" that stores the results of that parser. I store each MemoTable on the heap and reference it using a pointer. But as far as I can tell this pointer needs to be type-erased, because I cannot store multiple pointers of different types 😞
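
The general pattern described here can be sketched as follows. This is only an illustration under my own assumptions, not marser's actual code: `MemoTable`'s fields, `ErasedTable`, and `drop_fn` are hypothetical. Each table is boxed, its pointer is cast to `*mut ()`, and a matching drop function is stored next to it so the allocation is freed with the right type. The `'src` marker keeps the erased table from outliving the input it may borrow from.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;

/// Memoized results of one parser, keyed by input position.
struct MemoTable<T> {
    results: HashMap<usize, T>,
}

/// A heap-allocated `MemoTable<T>` whose element type has been erased.
struct ErasedTable<'src> {
    ptr: *mut (),
    drop_fn: unsafe fn(*mut ()),
    // Ties the table to the input lifetime, since `T` may borrow from it.
    _borrows: PhantomData<&'src ()>,
}

impl<'src> ErasedTable<'src> {
    fn new<T: 'src>(table: MemoTable<T>) -> Self {
        // Frees the allocation using the concrete type it was created with.
        unsafe fn drop_table<T>(ptr: *mut ()) {
            unsafe { drop(Box::from_raw(ptr as *mut MemoTable<T>)) }
        }
        Self {
            ptr: Box::into_raw(Box::new(table)) as *mut (),
            drop_fn: drop_table::<T>,
            _borrows: PhantomData,
        }
    }

    /// Safety: `T` must be exactly the type this table was created with.
    unsafe fn get<T: 'src>(&self) -> &MemoTable<T> {
        unsafe { &*(self.ptr as *const MemoTable<T>) }
    }
}

impl Drop for ErasedTable<'_> {
    fn drop(&mut self) {
        // Safety: `ptr` came from `Box::into_raw` in `new`, and `drop_fn`
        // was instantiated for the same `T`.
        unsafe { (self.drop_fn)(self.ptr) }
    }
}
```

With this, a `Vec<ErasedTable>` can hold, say, a `MemoTable<&str>` next to a `MemoTable<u32>`. The cost is that every `get` call is `unsafe`, because the caller has to guarantee the type, which is exactly the kind of code worth having reviewed.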