Small combinator parsing library in Haskell

Question 1

I recently wrote code to parse lambda expressions written in a lisp like format, using a small hand-rolled parsing library.

The goal

This code was written as a part ofthis code-golf challenge not as code-golf, but as a demo for the concept. My goals for this were

It should be simple enough that by looking over it one should be able to understand what is being parsed.
It should be self-contained so that if someone wants to dig into the nitty-gritty and understand how it works they can.

I chose Haskell for this because:

In my experience imperative parsers are more difficult to understand than combinator parsers.
In my experience combinator parsing has a smaller footprint since it can assemble complex parsers from simple components.
In my experience it is a joy to write and read parsers in Haskell.

The code then consists of a minimalist "library" of simple helper functions, and the definition of the actual parser. If it is important, you should be able to see exactly how it is presented via the link above.

I'd like to get an idea of how well I achieved these goals and what steps could be taken to better present the idea through code.

Expressions

The goal is to verify that a string is a properly formatted lambda expression, consisting of

abs (abstraction) to introduce a new variable.
var (variable) to use a variable.
app (application) to apply one lambda to another.

formatted as a lisp list.

For example

(abs x (app (var x) (var x)))

Is a valid expression representing\$\lambda x. x x\$.

Additionally we want to verify the expression doesn't use undeclared variables and doesn't shadow any variables.

So

(abs y (var x))

is not valid becausex is not declared. And

(abs y (abs y (abs x (var x))))

is not valid becausey is shadowed.

A precise grammar for the valid expressions is given

\$S\left(C\right) := \left\{\begin{array}[rr] \\color{green}{\texttt{(var }}x\color{green}{\texttt{)}} & \textrm{ where } x\in C \\\color{green}{\texttt{(abs }} x\texttt{ }S(C\cup\{x\})\color{green}{\texttt{)}} & \textrm{ where } x\notin C\\\color{green}{\texttt{(app }} S(C)\texttt{ }S(C)\color{green}{\texttt{)}} &\end{array}\right.\$

Where valid expressions are strings spanned by this grammar starting from\$S(\{\})\$.

There are two extra things to note:

The whitespace is intentionally inflexible in this specification.
Our Parse is not producing a parse tree, only confirming whether an expression meets the specifications.

Library

import Control.Applicative  ( liftA2  , Alternative (..)  )import Control.Monad  ( guard  )newtype Parser a =  Parser (String -> [(String, a)])apply :: Parser a -> String -> [(String, a)]apply (Parser p) s =  p s instance Functor Parser where  fmap func (Parser p) =    Parser $ fmap (map (fmap func)) pinstance Applicative Parser where  pure x =    Parser ( \ string -> [(string, x)] )  liftA2 f (Parser p1) (Parser p2) =    Parser      ( \ string -> do        (rem1, res1) <- p1 string        (rem2, res2) <- p2 rem1        return (rem2, f res1 res2)      )instance Monad Parser where  Parser p >>= f =    Parser      ( \ string -> do        (rem1, res1) <- p string        apply (f res1) rem1      )instance Alternative Parser where  empty =    Parser ( \ _ -> [] )    Parser p1 <|> Parser p2 =    Parser $ ( \ string -> p1 string ++ p2 string )charBy :: (Char -> Bool) -> Parser CharcharBy predicate =  Parser go  where    go (a : b) =      [ (b, a) | predicate a ]    go [] =      []char :: Char -> Parser Charchar x =  charBy (x==)string :: String -> Parser Stringstring =  traverse charchoice :: [Parser a] -> Parser achoice =  foldr (<|>) empty

Parser

lambda :: [String] -> Parser ()lambda variables =  do    char '('    choice      [ do        string "abs "        name <- some $ charBy (`elem` ['a'..'z'])        guard $ notElem name variables        char ' '        lambda (name : variables)      , do        string "var "        choice (map string variables)        return ()      , do        string "app "        lambda variables        char ' '        lambda variables      ]    char ')'    return ()

Description

This uses monadic parsing to get the job done. I build a parser type implement the necessary type classes and a couple of helper functions. The actual parser follows the same structure as the given grammar.

The code does a single pass without any tokenizer. This might be a little odd, but because the domain I think that such would make the code more complex, and we have very rigid white-space rules.

Review

I welcome all feedback on how to make this code more presentable. However the specific questions I would like to prompt are:

Isdo notation a good choice given my goals?
Was it a good choice to use a single pass method? Might it be more presentable to parse into a tree and then verify certain properties of the tree?
Are there any subtle Haskell-isms that I have that could easily be removed to make the more approachable to a general audience?

of course keeping my initial goals in mind.

Question 2

Sorry I'm so late to this!

Isdo notation a good choice given my goals?

Sure. Anyone who knows monads will understanddo. On the other hand, don't assume everyone knows monadicparsers. A simplelink as a comment can go a long way. (Or you could actually comment your code, but that's more work!)

Was it a good choice to use a single pass method? Might it be more presentable to parse into a tree and then verify certain properties of the tree?

It's ok. Building an explicit tree would be more normal, because it's prerequisite to doinganything else besides just checking validity.

Are there any subtle Haskell-isms that I have that could easily be removed to make the more approachable to a general audience?

That's hard to say; I'm probably too familiar with Haskell to count as a general audience.

Add a few comments.
Maybe get rid ofinstance Alternative Parser. (Not that it isn't pulling its weight, it's just a thing you could maybe do without?)
Certainly figure out some other way of writing[ (b, a) | predicate a ]. Probably anyone canguess what it does, but a couple minutes of googling didn't find anything saying it's valid syntax; I had to try it in GHCI to make sure.

Other stuff:

Arerem andres supposed to be "remainder" and "resolution"? Longer variable names are sometimes easier for new readers.
On the flip side, in the>>= definition (and elsewhere),string could probably just bes; the advantage being that it doesn't share a name with the functionstring in the same module.
Your definition offmap is really hard to read. Usingmap to specify which level applies to the[] is ok; why not use. for the function layer? The functor instance of(,) never feels great. If you want to keep it concise, consider importingBifunctor so you can write
fmap f (Parser p) = Parser $ (map (second f)) . p.
If you want it to be easy for a stranger to read it'll probably need to be much more verbose.
The use ofdo inliftA2 is in the[] monad; note that with a comment.
empty = Parser (const [])
Don't name your parserlambda.

ShapeOfMatter 4,4477 silver badges25 bronze badges · Answer 1 · 2021-09-23 03:37:59Z

Sorry I'm so late to this!

Isdo notation a good choice given my goals?

Sure. Anyone who knows monads will understanddo. On the other hand, don't assume everyone knows monadicparsers. A simplelink as a comment can go a long way. (Or you could actually comment your code, but that's more work!)

Was it a good choice to use a single pass method? Might it be more presentable to parse into a tree and then verify certain properties of the tree?

It's ok. Building an explicit tree would be more normal, because it's prerequisite to doinganything else besides just checking validity.

Are there any subtle Haskell-isms that I have that could easily be removed to make the more approachable to a general audience?

That's hard to say; I'm probably too familiar with Haskell to count as a general audience.

Add a few comments.
Maybe get rid ofinstance Alternative Parser. (Not that it isn't pulling its weight, it's just a thing you could maybe do without?)
Certainly figure out some other way of writing[ (b, a) | predicate a ]. Probably anyone canguess what it does, but a couple minutes of googling didn't find anything saying it's valid syntax; I had to try it in GHCI to make sure.

Other stuff:

Arerem andres supposed to be "remainder" and "resolution"? Longer variable names are sometimes easier for new readers.
On the flip side, in the>>= definition (and elsewhere),string could probably just bes; the advantage being that it doesn't share a name with the functionstring in the same module.
Your definition offmap is really hard to read. Usingmap to specify which level applies to the[] is ok; why not use. for the function layer? The functor instance of(,) never feels great. If you want to keep it concise, consider importingBifunctor so you can write
fmap f (Parser p) = Parser $ (map (second f)) . p.
If you want it to be easy for a stranger to read it'll probably need to be much more verbose.
The use ofdo inliftA2 is in the[] monad; note that with a comment.
empty = Parser (const [])
Don't name your parserlambda.

Movatterモバイル変換

Stack Exchange Network

Small combinator parsing library in Haskell

The goal

Expressions

Library

Parser

Description

Review

1 Answer1

Other stuff:

You mustlog in to answer this question.

Related

Hot Network Questions

Subscribe to RSS