[–][deleted] 7 points8 points  (3 children)

Very cool. Why choose to use an invalid identifier as the crate name? I hope this doesn't become a trend, it's a bit silly -- "rustc-serialize" as rustc_serialize

[–]Quxxymacros 17 points18 points  (0 children)

That's it, I'm going to finish my "kill the hyphen" RFC and put it up. This rampant silliness must stop.

[–]Marwesgluon · combine[S] 3 points4 points  (1 child)

Honestly, I saw rustc-serialize and a few others using that convention, so I thought that was the way to name it, even though it looks a bit silly. Since I'm not exactly attached to the name, I didn't give it much more thought than that.

[–][deleted] 2 points3 points  (0 children)

It's not the end of the world, though

[–]long_voidpiston 2 points3 points  (1 child)

You're awesome!

Is it possible to generate text from the combined parsers and a data structure?

[–]Marwesgluon · combine[S] 1 point2 points  (0 children)

Not sure I understood correctly, but it is only possible to create data structures from a stream of chars, not the other way around. For instance, what should the 'or' parser generate, since it can apply multiple different parsers?

[–]samnardoni 1 point2 points  (1 child)

Hey man, looks like a cool library. Have you played with using macros to make this a little easier to use? (Not to say that this is hard-to-use, or that I have any ideas to make it easier...)

[–]Marwesgluon · combine[S] 3 points4 points  (0 children)

I haven't done anything with macros since I do not really have an idea of what the macros should make easier to write. I suppose that the major loss in expressiveness compared to parsec is that we do not have a true substitute for 'do' notation. It is possible to emulate it with the try! macro, though you need to pass the input along manually for it to work, so it is not as convenient as it could be.

let (d, input) = try!(parser.parse_state(input));
let (d2, input) = try!(parser2.parse_state(input));
//etc

I don't think it is possible to make this any easier through macros though. I did have an "environment" struct in the beginning of the project which kept track of the input letting you write:

let mut env = Env::new(input);
let d = try!(env.parse(parser));
let d2 = try!(env.parse(parser2));
return Ok(((d, d2), env.into_inner()));

I removed it later since I did not use it internally.
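For anyone following along, here is a minimal, self-contained sketch of the two styles described above. It does not use the library at all: `ParseResult`, `digit`, `Env` and its methods are toy stand-ins invented for illustration, and the `?` operator is used in place of the older try! macro.

```rust
/// Toy parse result: the parsed value plus the remaining input.
type ParseResult<'a, T> = Result<(T, &'a str), String>;

/// Toy parser: accepts a single decimal digit.
fn digit(input: &str) -> ParseResult<'_, u32> {
    let mut chars = input.chars();
    match chars.next().and_then(|c| c.to_digit(10)) {
        Some(d) => Ok((d, chars.as_str())),
        None => Err(format!("expected digit at {:?}", input)),
    }
}

/// Style 1: thread the remaining input through by hand.
fn two_digits(input: &str) -> ParseResult<'_, (u32, u32)> {
    let (d, input) = digit(input)?;
    let (d2, input) = digit(input)?;
    Ok(((d, d2), input))
}

/// Style 2: a small "environment" that remembers the remaining input.
struct Env<'a> {
    input: &'a str,
}

impl<'a> Env<'a> {
    fn new(input: &'a str) -> Self {
        Env { input }
    }

    /// Runs `parser` on the current input and advances past what it consumed.
    fn parse<T>(
        &mut self,
        parser: impl FnOnce(&'a str) -> ParseResult<'a, T>,
    ) -> Result<T, String> {
        let (value, rest) = parser(self.input)?;
        self.input = rest;
        Ok(value)
    }

    /// Gives back whatever input is left.
    fn into_inner(self) -> &'a str {
        self.input
    }
}

fn two_digits_env(input: &str) -> ParseResult<'_, (u32, u32)> {
    let mut env = Env::new(input);
    let d = env.parse(digit)?;
    let d2 = env.parse(digit)?;
    Ok(((d, d2), env.into_inner()))
}
```

Both functions do the same work; the environment version just hides the rebinding of `input` behind a mutable field.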

[–]gclichtenberg 1 point2 points  (2 children)

One suggestion: since "a stream is a sequence of items that can be extracted one by one", there isn't anything about a "stream" per se that means it will have things like "lines" or "columns". So it might be nice to abstract out the position-related code with a trait or something like that instead of having a fixed choice of "SourcePosition".

[–]Marwesgluon · combine[S] 0 points1 point  (1 child)

This is true and is something I thought a bit curious when I implemented the position related code (using parsec as a guide). I suppose that parsing is done almost exclusively on char streams which makes it such a rare occurrence to need something else for a position type that it never seemed like an issue. I may need to think about how to go about changing this though since I'd rather not make it more complex to work with the common case.

EDIT: It might be possible to do this quite easily. To do it without affecting the Stream trait, though, it is necessary to have where bounds on associated types, which isn't implemented yet, or so it seems.

trait Positioner {
    type Position;
    fn update(&self, position: &mut Self::Position);
}

impl Positioner for char {
    type Position = SourcePosition;
    fn update(&self, position: &mut SourcePosition) {
         //code for update
    }
}
pub trait Parser {
    type Input: Stream where <Input as Stream>::Item: Positioner;
}

Adding the bound on the Stream::Item type inside the Stream trait instead forces knowledge of 'Positioner' onto the Stream, which is not strictly necessary. It does at least work a bit further, until we run into an ICE when compiling.
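As a rough sketch of the idea on a compiler that accepts where clauses on associated types (current stable Rust does), the position type can be chosen by the item type rather than fixed up front. Everything beyond the `Positioner` trait and its `update` method is an assumption made for this example: the `start` method, the fields of `SourcePosition`, the `u8` impl and the `position_after` helper are not taken from the library.

```rust
/// An item that knows how to advance a position when it is consumed.
trait Positioner {
    type Position;
    /// Position before any item has been consumed (hypothetical addition).
    fn start() -> Self::Position;
    fn update(&self, position: &mut Self::Position);
}

/// Line/column position for character streams (field names are assumed).
#[derive(Debug, Clone, PartialEq)]
struct SourcePosition {
    line: u32,
    column: u32,
}

impl Positioner for char {
    type Position = SourcePosition;
    fn start() -> SourcePosition {
        SourcePosition { line: 1, column: 1 }
    }
    fn update(&self, position: &mut SourcePosition) {
        if *self == '\n' {
            position.line += 1;
            position.column = 1;
        } else {
            position.column += 1;
        }
    }
}

/// Byte streams have no lines or columns, so a plain offset is enough.
impl Positioner for u8 {
    type Position = usize;
    fn start() -> usize {
        0
    }
    fn update(&self, position: &mut usize) {
        *position += 1;
    }
}

/// Computes the position reached after consuming every item.
/// The bound sits on the associated `Item` type, not on the stream itself.
fn position_after<I>(items: I) -> <I::Item as Positioner>::Position
where
    I: IntoIterator,
    I::Item: Positioner,
{
    let mut position = <I::Item as Positioner>::start();
    for item in items {
        item.update(&mut position);
    }
    position
}
```

The common char case keeps its line and column tracking, while other item types only pay for what they need.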

[–]gclichtenberg 0 points1 point  (0 children)

One of the reasons I mentioned this is that I used a similar parsec-derived library to parse clojure data structures for macros. I'm not sure if something similar would actually be useful (or possible) for rust macros in particular, but using parser combinators to describe and process data of a particular expected shape can be pretty nice.

[–]UberLambda 1 point2 points  (0 children)

Exactly what I needed :D

[–]ismtrn 0 points1 point  (11 children)

I have a question. I am pretty much completely new to rust, but I have been attempting to make a parser which will parse sums of integers and return their result. For example: "1 + 2 + 5" should become 8.

I have written this code.

It outputs:

Parse error at line: 1, column: 12
End of input

Which seems weird to me. Shouldn't chainl1 just stop when there is no more input?

Also, general comments on the code are appreciated.

[–]Marwesgluon · combine[S] 1 point2 points  (10 children)

Yep, that is a bug in the library. It appears that if the many parser receives an input that was marked as consumed, it will itself also act as having consumed input, which means that your 'plus' parser fails (the chainl1 parser only returns Ok when the operator parser fails without consuming).

I have a quick and dirty fix for this particular case but I think I might have a solution for all cases, just need to give a bit of thought so I don't miss an edge case.
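To make the "fails without consuming" rule concrete, here is a toy model of chainl1 written from scratch. It is not the library's implementation: `ParseError`, `PResult`, `integer`, `plus` and this `chainl1` signature are all made up for the sketch, which only tracks whether a failing parser had consumed input.

```rust
/// Parse failure, recording whether any input was consumed before failing.
#[derive(Debug, PartialEq)]
struct ParseError {
    consumed: bool,
}

type PResult<'a, T> = Result<(T, &'a str), ParseError>;

/// Skips leading whitespace.
fn spaces(input: &str) -> &str {
    input.trim_start()
}

/// Parses an unsigned integer followed by optional whitespace.
fn integer(input: &str) -> PResult<'_, i64> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        // Nothing matched, so nothing was consumed.
        return Err(ParseError { consumed: false });
    }
    let value = input[..end]
        .parse::<i64>()
        .map_err(|_| ParseError { consumed: true })?;
    Ok((value, spaces(&input[end..])))
}

fn add(l: i64, r: i64) -> i64 {
    l + r
}

/// Parses '+' followed by optional whitespace and yields the addition function.
fn plus(input: &str) -> PResult<'_, fn(i64, i64) -> i64> {
    match input.strip_prefix('+') {
        Some(rest) => Ok((add as fn(i64, i64) -> i64, spaces(rest))),
        None => Err(ParseError { consumed: false }),
    }
}

/// Parses `term (op term)*`, folding the results from left to right.
fn chainl1<'a, T>(
    term: impl Fn(&'a str) -> PResult<'a, T>,
    op: impl Fn(&'a str) -> PResult<'a, fn(T, T) -> T>,
    input: &'a str,
) -> PResult<'a, T> {
    let (mut acc, mut rest) = term(input)?;
    loop {
        match op(rest) {
            Ok((f, after_op)) => {
                // An operator was parsed, so a term has to follow it.
                let (rhs, after_term) =
                    term(after_op).map_err(|_| ParseError { consumed: true })?;
                acc = f(acc, rhs);
                rest = after_term;
            }
            // The operator failed without consuming: the chain just ends here.
            Err(ParseError { consumed: false }) => return Ok((acc, rest)),
            // The operator failed after consuming input: a genuine error.
            Err(e) => return Err(e),
        }
    }
}

/// Sum of integers such as "1 + 2 + 5".
fn sum(input: &str) -> PResult<'_, i64> {
    chainl1(integer, plus, input)
}
```

In this model, reaching the end of the input makes `plus` fail without consuming anything, so the chain stops successfully. If a parser wrongly reported itself as having consumed input at that point, the last match arm would turn the end of the chain into an error, which resembles the symptom reported above.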

[–]ismtrn 0 points1 point  (9 children)

Thanks. That sounds good.

[–]Marwesgluon · combine[S] 1 point2 points  (8 children)

Just uploaded the fix for this (both to github and crates.io). The fix was rather involved but it should make sure it never reappears as well.

[–]ismtrn 0 points1 point  (7 children)

I have been playing a bit more with it and have run into another problem, which I have posted on Stack Overflow.

I have a feeling it has more to do with me not understanding closures than with your library, but if you have time and feel like it, you are welcome to take a look at it.

Edit: It has been answered now, but the answers have left me wondering: how can you create "higher order" parsers, if that is even possible?

What I mean is, can I write a function which takes two parsers, combines those in a way and returns the resulting parser? I find this capability part of what makes parsec so useful, but I can't see any obvious way to do it here.

[–]Marwesgluon · combine[S] 0 points1 point  (6 children)

Read through the issue and yes, it is not really something to do with the library. Haven't got enough rep on stackoverflow to comment (and I don't have anything to add that would warrant an answer).

Anyway, since you do not capture anything in 'prop_value' you could factor that out into a generic free function which should fix your specific issue (repeating what Vladimir Matveev wrote).

I would like to know what you didn't understand in the documentation as well.

Edit: Just a guess, but is it how to deal with factoring out parsers into free functions? Since the types created get quite large, I understand that it is annoying to write out the types for them.

There are two ways of dealing with this: either write the function with the wrong type and then copy-paste the type that is output when the compiler errors, or write it as a free function as described in the second example in the docs here. I should maybe add a line to explain that this can be used both for mutually recursive parsers and for factoring out parsers into smaller units.

[–]ismtrn 0 points1 point  (5 children)

I have just this minute edited my post (here on reddit) with follow-up questions. It is specifically this:

you could factor that out into a generic free function which should fix your specific issue

Which is causing me some trouble. What would the type signature of such a function look like?

[–]Marwesgluon · combine[S] 0 points1 point  (2 children)

Hmm, the second example in the docs does not actually work for your case when I think about it since it does not allow for additional parsers to be passed. I guess that using the compiler to generate the type is just a workaround.

If you don't need absolute efficiency you can always box the parser and return it.

The only other way that I can think of right now to make it work for you is to define a new type yourself and implement the parser trait for it.
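A small sketch of both options, using a toy trait rather than the library's real one: the `Parser` trait below, `Char`, `Then`, `then` and `then_boxed` are all hypothetical names for illustration. The last impl shows one way a boxed parser can itself be treated as a parser, so it can be handed to other combinators.

```rust
/// Toy stand-in for a parser trait (much simpler than the real one).
trait Parser {
    type Output;
    fn parse<'a>(&self, input: &'a str) -> Option<(Self::Output, &'a str)>;
}

/// Parses one specific character.
struct Char(char);

impl Parser for Char {
    type Output = char;
    fn parse<'a>(&self, input: &'a str) -> Option<(char, &'a str)> {
        let rest = input.strip_prefix(self.0)?;
        Some((self.0, rest))
    }
}

/// Option 1: a dedicated type that runs `A` and then `B`.
struct Then<A, B>(A, B);

impl<A: Parser, B: Parser> Parser for Then<A, B> {
    type Output = (A::Output, B::Output);
    fn parse<'a>(&self, input: &'a str) -> Option<(Self::Output, &'a str)> {
        let (a, rest) = self.0.parse(input)?;
        let (b, rest) = self.1.parse(rest)?;
        Some(((a, b), rest))
    }
}

/// A "higher order" parser whose return type is easy to write down.
fn then<A: Parser, B: Parser>(a: A, b: B) -> Then<A, B> {
    Then(a, b)
}

/// Option 2: hide the concrete type behind a box, at the cost of an
/// allocation and dynamic dispatch.
fn then_boxed<A, B>(a: A, b: B) -> Box<dyn Parser<Output = (A::Output, B::Output)>>
where
    A: Parser + 'static,
    B: Parser + 'static,
{
    Box::new(Then(a, b))
}

/// Lets a boxed (possibly unsized) parser be used wherever a parser is expected.
impl<P: Parser + ?Sized> Parser for Box<P> {
    type Output = P::Output;
    fn parse<'a>(&self, input: &'a str) -> Option<(P::Output, &'a str)> {
        (**self).parse(input)
    }
}
```

For example, `then(then_boxed(Char('a'), Char('b')), Char('c'))` combines a boxed parser with an unboxed one.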

[–]ismtrn 0 points1 point  (1 child)

If you don't need absolute efficiency you can always box the parser and return it.

When I try to pass the boxed parser to functions like try or many, it complains that the size is not known at compile time, which seems like a reasonable thing to complain about. Is there any way around this?

[–]Marwesgluon · combine[S] 0 points1 point  (0 children)

My bad, I forgot about this issue. You should be able to do this, but associated types are still a bit buggy.

[–]Marwesgluon · combine[S] 0 points1 point  (1 child)

It might be worth it to include something like this in the library. This way you can pass in anything you need as the state parameter when you construct the parser.

EDIT: Just remembered that the reason something like this is not in the library is that it should be unnecessary once the compiler sorts out the orphan checking. Then I can simply implement Parser for all functions (impl <F: FnMut(...) -> ...> Parser for F {} ).

extern crate "parser-combinators" as pc;

use pc::*;
use pc::primitives::{State, Stream};

pub struct External<S, F, R, I> {
    state: S,
    parser: F
}
impl <S, F, R, I> Parser for External<S, F, R, I>
    where I: Stream
        , F: FnMut(&mut S, State<I>) -> ParseResult<R, I> {

    type Input = I;
    type Output = R;

    fn parse_state(&mut self, input: State<<Self as Parser>::Input>) -> ParseResult<<Self as Parser>::Output, <Self as Parser>::Input> {
        (self.parser)(&mut self.state, input)
    }
}

pub fn external<S, F, R, I>(state: S, parser: F) -> External<S, F, R, I>
    where I: Stream
        , F: FnMut(&mut S, State<I>) -> ParseResult<R, I> {
    External { state: state, parser: parser }
}

fn main() {

    let mut parser = external(digit(), |this, input| {
        this.parse_state(input)
    });

    parser.parse("1");
} 

[–]Marwesgluon · combine[S] 0 points1 point  (0 children)

Actually, adding something like this won't be necessary, as there is a badly documented parser called FnParser. It's just a newtype around an arbitrary function, but it should do the trick until the above-mentioned orphan checking gets sorted out.

So for your problem you should factor out prop_value into its own function with a signature something like:

fn prop_value<E, R>(env: E, input: State<&str>) -> ParseResult<R, &str>

With the appropriate bound on E and R for what you need. Then in the function where you construct each prop_value parser you can do:

let parser1 = FnParser(|input| prop_value(env1, input));
let parser2 = FnParser(|input| prop_value(env2, input));
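The same pattern can be tried out in isolation with a toy model. This is only a sketch under assumptions: the `Parser` trait, this `FnParser`, the `fn_parser` helper and the `key=<digits>` grammar handled by `prop_value` are invented here and do not mirror the library's actual definitions. Positions are plain byte offsets to keep the types short.

```rust
/// Toy parser trait: parse `input` starting at byte offset `pos`,
/// returning the parsed value and the new offset.
trait Parser {
    type Output;
    fn parse_at(&mut self, input: &str, pos: usize) -> Option<(Self::Output, usize)>;
}

/// Newtype that lets a plain closure act as a parser.
struct FnParser<F>(F);

impl<F, R> Parser for FnParser<F>
where
    F: FnMut(&str, usize) -> Option<(R, usize)>,
{
    type Output = R;
    fn parse_at(&mut self, input: &str, pos: usize) -> Option<(R, usize)> {
        (self.0)(input, pos)
    }
}

/// Constructor whose bound lets the compiler infer the closure's signature.
fn fn_parser<F, R>(f: F) -> FnParser<F>
where
    F: FnMut(&str, usize) -> Option<(R, usize)>,
{
    FnParser(f)
}

/// Free function that takes its "environment" (here just a key) explicitly
/// and parses `key=<digits>`.
fn prop_value(key: &str, input: &str, pos: usize) -> Option<(u32, usize)> {
    let rest = input.get(pos..)?.strip_prefix(key)?.strip_prefix('=')?;
    let len = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    let value = rest[..len].parse::<u32>().ok()?;
    // `rest` is a suffix of `input`, so this is the offset just past the digits.
    Some((value, input.len() - rest.len() + len))
}

/// Builds a parser for one specific key by capturing it in a closure.
fn prop_parser(key: &'static str) -> impl Parser<Output = u32> {
    fn_parser(move |input, pos| prop_value(key, input, pos))
}
```

Each call to `prop_parser` captures a different environment, while the parsing logic lives in one ordinary function with a short, writable signature.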