I'm very new to Haskell's ReadP library, and the entire concept of parser combinators, so I was wondering whether there are better ways to do some things in this program:
import Text.ParserCombinators.ReadPimport Control.Applicativeimport Data.Listdata Operator = Add | Subtractinstance Show Operator where show Add = "+" show Subtract = "-"instance Read Operator where readsPrec _ "+" = [(Add, "")] readsPrec _ "-" = [(Subtract, "")] readsPrec _ _ = []data Expression = Number Int | Infix { left :: Expression, op :: Operator, right :: Expression }instance Show Expression where show (Number x) = show x show (Infix left op right) = "(" ++ (show left) ++ " " ++ (show op) ++ " " ++ (show right) ++ ")"digit :: ReadP Chardigit = satisfy $ \char -> char >= '0' && char <= '9'number :: ReadP Expressionnumber = fmap (Number . read) (many1 digit)operator :: ReadP Operatoroperator = fmap read (string "+" <|> string "-")expression :: ReadP Expressionexpression = do skipSpaces left <- number skipSpaces op <- Control.Applicative.optional operator case op of Nothing -> return left Just op -> do skipSpaces right <- expression return (Infix left op right)parseExpression :: String -> Maybe ExpressionparseExpression input = case readP_to_S expression input of [] -> Nothing xs -> (Just . fst . last) xsThe main area where I'm looking for improvements is theexpression function, and specifically the things which seem improvable are the repeated calls toskipSpaces and the case expression to check whether an operator was parsed, but of course if you notice anything else that would be helpful too!
1 Answer1
First, a standard approach for dealing with theskipSpaces issue is to define a higher-order parser combinator, traditionally calledlexeme:
lexeme :: ReadP a -> ReadP alexeme p = p <* skipSpacesHere,lexeme takes a space-naive parserp, and converts it into a new parser that parses whateverp was planning to parse, and then reads and discards any trailing spaces. You uselexeme in the definitions of any of your parsers that would reasonably be assumed to read a complete "lexeme" and ignore any trailing space. For example,number should be alexeme parser:
number :: ReadP Expressionnumber = lexeme $ fmap (Number . read) (many1 digit)So shouldoperator, though obviously notdigit!expression won't need to uselexeme, because we'll arrange to end it with a lexeme parser.
It's also helpful to define asymbol parser which is essentiallystring that ignores trailing spaces:
symbol :: String -> ReadP Stringsymbol = lexeme . stringConsistently used,lexeme (andsymbol) will deal with all unwanted spaces, other than any leading spaces at the very start of your parse. If you have a non-recursive "top level" grammar production, like say a parser forprogram :: ReadP Program, then you would probably deal with them there. In your example, you don't have such a production (e.g.,expression is recursive), so you'd stick an extraskipSpaces inparseExpression. This is also a good place to puteof to make sure there isn't any trailing material that you aren't parsing:
parseExpression :: String -> Maybe ExpressionparseExpression input = case readP_to_S (skipSpaces *> expression <* eof) input of [] -> Nothing xs -> (Just . fst . last) xsSecond, your use of aRead instance for parsing your operators is very unusual. It would be more standard to make it a helper in theoperator parser, writing something like:
operator :: ReadP Operatoroperator = readSymbol <$> (symbol "+" <|> symbol "-") where readSymbol "+" = Add readSymbol "-" = Subtract(though an evenmore standard version is given below).
Third, inexpression, you can avoid thecase construct by using alternation(<|>) like so:
expression' :: ReadP Expressionexpression' = do left <- number (do op <- operator right <- expression return (Infix left op right) <|> return left)This would be the standard approach for non-parallel parser libraries (e.g., Parsec or Megaparsec). ForReadP, it's better to replace the(<|>) operator with the ReadP-specific(<++) operator to avoid also following the unwanted second parse in parallel. Beware that(<++) has higher precedence than(<|>), so some extra parentheses might be needed if it's being used in combination with other operators, as in the examples below.
Fourth, you've probably noticed my use of the applicative operators<* and*> and the alias<$> forfmap in the code above. It isvery common to use these -- plus the additional applicative operator<*> and sometimes the operators<**> or<$ -- in parsers. Once you get used to them, they tend to lead to less cluttered code.
For example, a more standard way of writingexpression would be:
expression' :: ReadP Expressionexpression' = Infix <$> number <*> operator <*> expression <|> numberor the slightly more efficient solution:
expression :: ReadP Expressionexpression = do left <- number (Infix left <$> operator <*> expression) <++ return leftNote that, in the context of parsers, an expression likef <$> p <*> q means "try to run the parserp, and then the parserq; assuming they both succeed, pass their return values tof". In other words, thatInfix expression is essentially:
Infix left op rightwhereop is the return value from the parseroperator andright is the return value from the parserexpression.
Similarly, the standard way of writingoperator is actually:
operator :: ReadP Operatoroperator = Add <$ symbol "+" <|> Subtract <$ symbol "-"This one requires an additional word of explanation. The operator<$ is kind of an odd duck. It's type signature is:
(<$) :: a -> f b -> f abut in the context of parsers specifically, the meaning ofx <$ p is "try to run the parserp; if it succeeds, ignore its return value and returnx". Basically, it's used to replace the return value of a parser that's used only for its success or failure and not its return value.
Note that these versions ofexpression, like your original version, treat the operators as right associative. This may be a problem if you're trying to parse "1-2-3" as equivalent to "(1-2)-3" instead of "1-(2-3)".
A few additional minor points:
isDigitis a more readable name for\c -> c >= '0' && c <= '9'munch1is more efficient thanmany1 (satisfy xxx), so I'd redefinenumberto use it- for testing, it's probably a good idea to have a
parseExpressionsfunction that looks atall the parses - for production, it's probably a good idea to check for ambiguous parses and do something about it, rather than (fairly arbitrarily) selecting the last parse in the list
With all of these suggestions implemented, the final version would look something like:
{-# OPTIONS_GHC -Wall #-}import Data.Char (isDigit)import Control.Applicativeimport Text.ParserCombinators.ReadP (eof, munch1, ReadP, readP_to_S, skipSpaces, string, (<++))data Operator = Add | Subtractdata Expression = Number Int | Infix { left :: Expression, op :: Operator, right :: Expression }lexeme :: ReadP a -> ReadP alexeme p = p <* skipSpacessymbol :: String -> ReadP Stringsymbol = lexeme . stringnumber :: ReadP Expressionnumber = lexeme $ Number . read <$> munch1 isDigitoperator :: ReadP Operatoroperator = Add <$ symbol "+" <|> Subtract <$ symbol "-"expression :: ReadP Expressionexpression = do x <- number (Infix x <$> operator <*> expression) <++ return xtop :: ReadP Expressiontop = skipSpaces *> expression <* eofparseExpressions :: String -> [(Expression, String)]parseExpressions = readP_to_S topparseExpression :: String -> Maybe ExpressionparseExpression input = case parseExpressions input of [] -> Nothing [(x,"")] -> Just x _ -> error "ambiguous parse"You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.
