Usually, parser combinators in Haskell based on the parser type like
newtype Parser t r = Parser ([t] -> [(r,[t])])This allows back tracking and multiple results. However I think it would be good to have it in "push" style, means the parser will have a set of result when no more input given, and have a transition function that for a given token, results in a new state of the parser. This results in the defintion
newtype PushParser t r = PushParser ([r], Maybe (t -> PushParser t r))(We could have a data constructor that have two fields, however I prefer to use the more efficientnewtype keyword so have to have only one field as a tuple)
To parse a list of tokens, we define theparse function as following:
parse :: PushParser t r -> [t] -> [r]--No input, rs is the resultparse ~(PushParser (rs, _)) [] = rs--No parsing function, this is an error state. No resultparse (PushParser (_,Nothing)) _ = []--Otherwise, drop the results, apply the pasing function with the input and continueparse (PushParser (_,Just f)) (t:ts) = parse (f t) tsWe can then define afmap to make it aFunctor like the following:
instance Functor (PushParser t) where fmap f ~(PushParser (r,mg)) = PushParser ((map f r), do {g<-mg; return (fmap f.g)})Instance ofApplicative is more tricky, but the easiest way is to make it depend on the not yet implemented instance ofMonad.
instance Applicative (PushParser t) where pure x = PushParser ([x],Nothing) pf <*> pv = do f <- pf v <- pv return (f v)But before giving the definition ofMonad I want to introduce the definition ofAlternative:
instance Alternative (PushParser t) where empty = PushParser ([],Nothing) (PushParser (r1,Nothing)) <|> (PushParser (r2,f)) = PushParser ((r1++r2), f) (PushParser (r1,f)) <|> (PushParser (r2,Nothing)) = PushParser ((r1++r2), f) (PushParser (r1,Just f1)) <|> (PushParser (r2,Just f2)) = PushParser ((r1 ++ r2), Just (\t -> f1 t <|> f2 t))I have tried out different ways for the instance ofMonad, but finally I believe the best way to give definition of(>>=) is to defineunwrap (orjoin in the monad library) first.
instance Monad (PushParser t) where p >>= f = unwrap (fmap f p) where unwrap ~(PushParser (prs, mf)) = foldr (<|>) (PushParser ([], do { f<-mf; return (unwrap.f)})) prsThe above works perfectally fine, and although it does not make sense to make it an instance ofParser (inText.Parser.Combinators) as we don't allow trace back, the other combinators that only requiresAlternative worked perfectally fine.
However, the performance seems to be too bad. It is possibly as it keeps all possible parsing path and so consumes too much memory. For example, a simpleparse (many (some (char '.'))) "...." will return 8 results, with all combinations of dots.
But I think there should be some way to improve the performance. Please advise me how to improve on this.
- \$\begingroup\$Did you mean to write
newtype PushParser t r = PushParser ([r], Maybe (t -> PushParser t r))?\$\endgroup\$gallais– gallais2016-12-16 13:29:23 +00:00CommentedDec 16, 2016 at 13:29 - 1\$\begingroup\$But when the second argument of
parseis[], how can you return the[t]from the first argument as an[r]? Also tuples can't possibly be faster thandata-constructed types:Tuples are defined usingdata\$\endgroup\$Gurkenglas– Gurkenglas2016-12-17 00:05:14 +00:00CommentedDec 17, 2016 at 0:05 - \$\begingroup\$That's correct. Fixed.\$\endgroup\$Earth Engine– Earth Engine2016-12-17 06:31:35 +00:00CommentedDec 17, 2016 at 6:31
1 Answer1
It'll lazily take no more memory than it takes time to find the first result if you do not try to enumerate all possible results.
Formany/some to only try one option, you wantMaybe'sAlternative/MonadPlus behavior, not[]'s. You could parametrize your Parser type in theAlternative used so the user can change it on the fly.
yoctoparsec uses this "push" style.
parseString (hoistFreeT maybeToList $ many $ some $ mfilter (=='.') token) "...."You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.