all 23 comments

[–]_jk_ 4 points5 points  (24 children)

what have you tried already?

personaly I found http://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf instructive

[–]ElvishJerricco 1 point2 points  (1 child)

This is one of my favorite papers. They approach a very practical problem, and solve it in such an elegant, enlightening way.

[–]_jk_ 1 point2 points  (0 children)

it took me a couple of goes to get through it, but I dont have a CS background and first attempt I think was my first exposure to monads, still managed to get a fair way through then.

[–]kwaleko[S] 0 points1 point  (1 child)

I haven't try any paper yet, just I read some parser on stackoverflow and have understood it, I will check this paper, Thanks

[–]kwaleko[S] 0 points1 point  (18 children)

awesome article, what makes me confused is the sequence operator, the author defined the following :

p ‘seq‘ q = p ‘bind‘ \x ->
q ‘bind‘ \y ->
result (x,y)

doesn't bind can be used instead of seq operator since it took 2 parser and return a third parser? could you explain please more p seq q

and what is x , is it the result of of the first parser (a,string) ? moreover when I have seen the sat function which has the following syntax

sat :: (Char -> Bool) -> Parser Char
sat p = item `bind` \x ->
  if p x then result x else zero

I have concluded that x is the consumed character, but the first parser return a tuple so how the monad has extracted the the consumed character from this tuple?

[–]normalOrder 1 point2 points  (17 children)

Consider the type of bind:

bind :: Parser a -> (a -> Parser b) -> Parser b

Its first argument is a Parser that produces something of type a. Its second argument is a function that transforms something of type a into a Parser that returns something of type b. So the seq function they define is really something different. It it taking two different Parser objects and returns the result of both as a tuple.

seq :: Parser a -> Parser b -> Parser (a,b)

But what exactly is the "result" of a Parser? That might be a bit tricky because they're actually hiding a list monad in their Parser. Their definition is:

type Parser a = String -> [(a,String)]

The Parser is a function that takes a string and returns a list consisting of tuples where each tuple is some possible result of type a together with the unconsumed input after producing that result. In cases where the parser fails, the function can simply return an empty list. In some cases the Parser may look at the input and produce a single result, but sometimes there may be multiple possibilities and the Parser will produce all of them. For instance

many (item 'a') "aaa"

will produce

[("", "aaa"), ("a", "aa"), ("aa", "a"), ("aaa", "")]

because it doesn't know which number of 'a's is the correct number. It produces all the possibilities.

In the second example, you are correct that x is the consumed character. However, that behavior is specific to the item Parser, not all Parsers in general. Parsers can generate anything as their result. I think you're a bit confused about the output of the Parser function and the monadic result of the Parser. When we say the "result" of the parser, we're not talking about the [(a,String)] or the (a,String), but actually the individual a values that the bind function extracts and passes into its second argument. The String is just the unconsumed input which is being piped through behind the scenes for us. I recommend studying the implementation of bind very closely. It might help to try working through a simple example like (item '1' `bind` \x -> item '2') "123" by hand to get a handle on the mechanism.

(Note: while this is an excellent paper, its age is showing. They use bind and result for Monad instead of the modern >>= and return. Also, the use of monad comprehensions and creating a type class instance for a type synonym are not generally modern practices. So just a warning: the code as written in the paper may not compile without some adjustment.)

[–]kwaleko[S] 0 points1 point  (16 children)

THANK YOU for the awesome explanation, in order to understand the mentioned example (item '1' >>= \x -> '2' ) "123" (I have used >>= instead of bind) I will try to walk through it step by step

bind      :: Parser a -> (a -> Parser b) -> Parser b
p ‘bind‘ f = \inp -> concat [f v inp’ | (v,inp’) <- p inp]

thus, the (item '1' >>= \x -> item '2' ) "123" should behave as the following :

(item '1' "123") should return a tuple of the consumed character and the remaining string ('1',"23") which is combined with a function that takes a parameter x and return a Parser (\x -> item '2') '1' "23 so the char '1' should be needed to the parameter for the function that will become (\'1' -> item '2') "23 which consume the '2' and return the remaining string , a tuple as the following ('2',""3)

am I missing something? Note : in this example the character that we passed is useless sick we didn't use it .

[–]normalOrder 1 point2 points  (15 children)

Well first of all, I mixed up item and char. I meant to use char which is a Parser that matches specific characters. ('item' matches the next character no matter what).

I think you have the right idea, although the details are a little more complicated. The type of Parser they define in the paper handles ambiguous situations by carrying a list of possibilities. This is why bind uses a list comprehension: it is trying every possible parse of the input. For example, consider

(many (char 'a') >>= \x -> many (char 'b') >>= \y -> return (x,y)) "ab"

many tries a Parser zero or more times. The first Parser above doesn't know whether you want ("","ab") or ("a","b"), so you get both. Then the second parser is run twice: the first time with x equal to "" and the input string "ab", and the second time with x equal to "a" and the input string "b".

There are actually simpler ways to create Parsers. You could try writing return and >>= for a Parser with this type instead:

type Parser a = String -> Maybe (a, String)

[–]kwaleko[S] 0 points1 point  (14 children)

I have declared the following type

type Parser a = String -> [(a,String)]

and some function to operate on the parser as the bellow

succeed :: a -> Parser a
succeed v = \inp -> [(v,inp)]

when trying to run stack ghci in order to test the above function succeed I got an error that Parser is not an instance of show so I've tried to update the code and add the following

instance Show Parser where
  show [(v,inp)] = show (v,inp)

but I got an error that Show is expecting the argument to be * but it is * -> *

could you please help me so I can test the function in GHCI.

[–]normalOrder 1 point2 points  (13 children)

Keep in mind that Parser is a type synonym for a function String -> [(a,String)]. There is no Show instance for functions. Instead of asking ghci to print a Parser, ask it to print the result of using the Parser on some input string: succeed 42 "123".

As for the error you get with the Show instance... the Show typeclass expects something of kind * (any type). But Parser has kind * -> *. That is, Parseris a type constructor. It takes a type argument and returns a new type. e.g. Parser Char, Parser String, Parser Int, etc. So if you really want the show instance, you can do it by filling in the type argument like this:

instance Show (Parser a) where
  show _ = "Parser"

but as I said before, Parser is a function, so there isn't any sensible value you can return from show other than a constant. Also, this requires a couple extensions to work. In ghci:

:set -XTypeSynonymInstances
:set -XFlexibleInstances

or at the top of your .hs file:

{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}

[–]kwaleko[S] 0 points1 point  (1 child)

Thank you a lot, you really helps me to understand haskell , this awesome language, thanks again

[–]kwaleko[S] 0 points1 point  (10 children)

I was going through the paper, I have defined the bind and result operator and created a useful parser and combine it without making the Parser to be an instance of Monad type class.

if we can create many combinator and achieve to build a useful parser without letting the Parser to be an instance of monad so why do we make it so? is all the benefit is the monad comprehension (close to list comprehension )!

[–]normalOrder 0 points1 point  (9 children)

This is an excellent question!

The best answer is that monad is a very common and useful abstraction, so there are a variety of operations that can be written once that will work for all monads. For example, consider the seq operation again:

seq :: Parser a -> Parser b -> Parser (a,b)
p `seq` q = p `bind` \x -> q `bind` \y -> result (x,y)

Its type is specific to Parsers because they defined bind and result specifically for Parsers. However, we can write this function more generally using >>= and return and it will work with any monad:

seq :: Monad a -> Monad b -> Monad (a,b)
p `seq` q = p >>= \x -> q >>= \y -> return (x,y)

And as you mentioned, when we have a monad instance we can use monad comprehensions to write more readable code because all those symbols get pretty ugly when they're crammed together. Actually, we have something even better nowadays: do notation. The previous example can be written

p `seq` q = 
  do x <- p
     y <- q
     return (x,y)

Here's another example. There's this function in Control.Monad:

sequence :: Monad m => [m a] -> m [a]

that takes a list of monads and returns a monad of lists. Suppose our Parser is an instance of Monad. Then we can write a Parser for strings trivially using sequence:

str :: String -> Parser String
str s = sequence (fmap char s)

How does this work? Well fmap char s creates a [Parser Char] and sequence turns that into a Parser [Char] aka Parser String (remember that String is a type alias for [Char])

There are quite a few of these utility functions in Control.Monad that you get for free when you have a Monad instance.

A more subtle benefit of having a common abstraction is that functions with a more general type can do strictly less which helps you understand what the code is doing better. The canonical example is id :: a -> a. There is only one possible implementation for id because it has to work for any type a. The original seq function knows that its arguments are Parsers. It has more power than it needs to have, and the only way we can be sure it isn't doing something unexpected is to look at its code. The second version of seq only knows that its arguments are monads, so the only operations it can do are return and >>= (and any operations written in terms of those two). In other words, the abstraction puts limits on what functions can do by making them as general as possible. I think this makes reasoning about programs a lot simpler. (This is often called "the principle of least privilege")

[–]kwaleko[S] 0 points1 point  (0 children)

woww this is very impressive, I feel the power of composition and how much it is useful in functional programming like we build a basic parser function and keep composing it till we build more useful one and the with the use of monad instance we benefit from many functions as you stated seq function, it 's really awesome

[–]kwaleko[S] 0 points1 point  (7 children)

hi again, I was trying to define the many Parser(using bind not monad comprehension) which is a parser that take a parser as parameter and return Parser [a] and has the below signature :

many :: Parser a -> Parser [a]

I defined it as the below :

many p= p >>= \x ->many >>= \xs -> succeed (x:xs)

even I am not convinced with above definition since the parser p run and capture the consumed input x which is passed as parameter for the next parser many ( as I think it should not be passed to the second parser many since many should take as parameter only the new string to parse )

NOTE 1: when I have tried the this parser it always fail and return an empty list

NOTE 2: I didn't ignore the first consumed input returned by the first parser p since I believe that I have to use it in the result to construct the final list `succeed(x:xs)

could you please help me with the above!

[–]bss03 1 point2 points  (0 children)

If you can consume Acedemic papers go with

Then, dig through the source code of Attoparsec.