March 2019
Some real world parsers as Parsec, let us choose between applicative and monadic parsing. The applicative style should be sufficient to parse context-free languages and is easier to reason about, but it is not capable of parsing context-sensitive grammars.
Before we discuss Applicative and Monadic Parsing, it is important to understand what Functors, Applicative Functors, Alternatives and Monads have to offer. When in doubt simply look up the type-class definition.
Two well-known data types which are instances of all of the above are Maybe and List. Try it yourself in GHCi.
-- Maybe as Functor
fmap (+20) (Just 10)
fmap (+20) (Nothing)
-- List as Functor
fmap (+20) [1..10]
-- Maybe as Applicative Functor
Nothing <*> Just (22)
Just (+1) <*> Nothing
Just (+15) <*> Just (22)
-- List as Applicative Functor
[]<*>[1..5]
[(+1),(+15)]<*>[]
[(+1),(+15)]<*>[1..5]
-- Maybe as Alternative
Nothing <|> Just 5
Just 5 <|> Nothing
Just 22 <|> Just 10
-- List as Alternative
[1..5] <|> [5..10]
-- Maybe as Monad
Just 5 >>= Just . (+10)
Just 5 >>= return . (+10)
Just 10 >>= return . (+5) >>=return . (+10)
Nothing >>= return . (+5) >>=return . (+10)
Just 10 >>= \x-> return (x*2) >>=return . (+x`div`2)
-- List as Monad
[1..10] >>= (:[]) . (*5)
[1..5] >>= return . (^2)
[[1..5],[5..10],[],[12]]>>=id
[[1..5],[5..10],[],[12]]>>= map (*4)
[1..5] >>= \a -> [100..100+a] >>= \b -> return (a+b)
Our simple toy Parser simply encapsulates a parsing function that returns Nothing if parsing fails. Otherwise a pair consisting of the parsed and unparsed input inside its respective elements.
Notice that we made our Parser instance of all the type-classes already.
data Parser a b = Parser (a->Maybe (b,a))
instance Functor (Parser a) where
fmap f p = Parser g
where g inp = case parse p inp of Just (x,y) -> Just (f x,y)
_ -> Nothing
instance Applicative (Parser a) where
pure v = Parser (\inp -> Just (v,inp))
pg <*> px = Parser g
where g inp = case parse pg inp of Just (f,b) -> parse (fmap f px) b
_ -> Nothing
instance Alternative (Parser a) where
empty = Parser (\_ -> Nothing)
p1 <|> p2 = Parser g
where g inp = case parse p1 inp of Nothing -> parse p2 inp
x -> x
instance Monad (Parser a) where
return = pure
p >>= f = Parser g
where g inp = case parse p inp of Just (a,b) -> parse (f a) b
_ -> Nothing
We also need at least a minimal set of primitives to parse something. Note that the primitives can give you the effects of Applicative or Alternative even if the Parser itself is not an instance of these. (Thanks to dmwit for pointing this out on #haskell).
The num parser makes use of the some combinator which requires our Parser to be an instance of Alternative as well. Applicative alone is of limited use in practice since without it, we also lack the <|> combinator.
satisfy :: (Char -> Bool) -> Parser String Char
satisfy f = Parser g where g (x:xs) | f x = Just (x,xs)
g _ = Nothing
space :: Parser String Char
space = satisfy (==' ')
char :: Char -> Parser String Char
char c = satisfy (==c)
notChar :: Char -> Parser String Char
notChar c = satisfy (/=c)
num :: Parser String Int
num = read <$> some (satisfy isDigit)
word :: String -> Parser String String
word (c:cs) = (:) <$> char c <*> word cs
word _ = pure ""
Now we are able to parse any context free grammar, given our satisfy Parser along with Applicative, Alternative, and recursion.
Let’s parse something!
-- we parse, by simply applying the function encapsulated in our Parser on the input.
parse (Parser f) inp = f inp
-- well-formed parentheses are the canonical example of a context-free grammar
parsePar = concat <$> some parseP where parseP = word "()" <|> (char '(' *> parsePar <* char ')')
parse parsePar "((()())()())"
-- a simple context-free language with matching pairs of a's and b's, which is not regular
parseAB = (\a x b -> a:x++"b") <$> char 'a' <*> parseAB <*> char 'b' <|> word "ab"
parse parseAB "aaaaabbbbb"
-- parse basic mathematical operations and calulate the results.
parseOp op sig = (pure (op) <* many space <*> num <* many space <* char sig <* many space <*> num )
parse (parseOp (+) '+' <|> parseOp (-) '-' <|> parseOp (*) '*' <|> parseOp (div) '/') " 111 * 747 "
parse (parseOp (+) '+' <|> parseOp (-) '-' <|> parseOp (*) '*' <|> parseOp (div) '/') "111+747"
The canonical non-context-free language can not be caputred by applicative parsing anymore:
$$\{a^n b^n c^n : n \geqslant 1\}$$
The following monadic parser will work here:
parseABC = length <$> (many $ char 'a') >>= \la ->
length <$> (many $ char 'b') >>= \lb ->
length <$> (many $ char 'c') >>= \lc ->
if la==lb&&lb==lc then pure ("ABCok") else empty
parse parseABC "aaaaabbbbbccccc"
Here you can find the complete source code of this little toy parser along with the examples presented in this article:
https://gitweb.softwarefools.com/?p=miguel/haskell.git;a=blob;f=parser/main.hs
As usual, use the simplest tool that will suffice 😄