Here is a non-predictive recursive descent parser I wrote for the following grammar:
E : E1 '*' E | E1 '/' E | E1 '+' E | E1 '-' E | E1 ;E1 : NUM | '(' E ')' ;It can be run against the token stream
func main() { // (m) * (m / (m)) valid := Parse([]Token{ {Value: "("}, {Type: T_NUM}, {Value: ")"}, {Value: "*"}, {Value: "("}, {Type: T_NUM}, {Value: "/"}, {Value: "("}, {Type: T_NUM}, {Value: ")"}, {Value: ")"}, }) println(valid)}It doesn't build an AST, it simply validates a token stream against the language.
// an enumeration of named token types. for// this language, there is only one named// token type (the number). all other token// types are referred to by their string valueconst ( _ = iota T_NUM)// a token type refers to the terminal class of the// token (eg T_NUM), whereas the value refers to// its textual representation (eg "23")type Token struct { Type int Value string}// a slice of tokens that represents the input stack.// the base pointer is passed to the production func// that is matching against the stack.type TokenStack []Token// type definition for a production. given a stack and// a base pointer, return the position at which the// production ends. if no production is// matched return NoMatch, or -1.type Production func(TokenStack, int) int// returned by a production func if no match is found// on the token stack at a given base pointerconst NoMatch = -1// will return true if a given stack is valid for the// language. a parse is valid when the return value from// a production equals the length of the input stack// (meaning the entire stack matches the production)func Parse(s TokenStack) bool { p := E(s, 0) return M(p) && p == len(s)}// a production func for the rule://// E : E1 '*' E// | E1 '/' E// | E1 '+' E// | E1 '-' E// | E1// ;func E(s TokenStack, p int) int { if p1 := Consecutive(s, p, E1, V("*"), E); M(p1) { return p1 } else if p1 := Consecutive(s, p, E1, V("/"), E); M(p1) { return p1 } else if p1 := Consecutive(s, p, E1, V("+"), E); M(p1) { return p1 } else if p1 := Consecutive(s, p, E1, V("-"), E); M(p1) { return p1 } else if p1 = E1(s, p); M(p1) { return p1 } return NoMatch}// a production func for the rule://// E1 : T_NUM// | '(' E ')'// ;func E1(s TokenStack, p int) int { if p1 := T(T_NUM)(s, p); M(p1) { return p1 } else if p1 = Consecutive(s, p, V("("), E, V(")")); M(p1) { return p1 } return NoMatch}// will try to match consecutive productions against// a token stack, keeping track of the length of each// match. if every consecutive rule matches, return// the length of every match added to the base pointer// originally passed to this functionfunc Consecutive(s TokenStack, p int, rules ...Production) int { var match_len = 0 for _, rulei := range rules { if p1 := rulei(s, p+match_len); M(p1) { match_len += p1 - (p + match_len) } else { return NoMatch } } return p + match_len}// return a production func that will// match a given token valuefunc V(v string) Production { return func(s TokenStack, p int) int { if len(s) > p && string(s[p].Value) == v { return p + 1 } return NoMatch }}// return a production func that will// match a given token typefunc T(t int) Production { return func(s TokenStack, p int) int { if len(s) > p && s[p].Type == t { return p + 1 } return NoMatch }}// a function to conveniently check the// return value of a production for a matchfunc M(p int) bool { return p > NoMatch}1 Answer1
Go fmt
First off, alwaysgo fmt your source code. The best thing about Go is that it always looks the same.
Godoc
Follow the godoc standard for comments. When commenting a function, start with the function name, capitalized, and end with a dot. Use proper grammar (like capital letter after punctuation). This makesautomatically generated documentation look great, and further normalizeshow Go code looks.TokenStack represents the input stack. is a good first sentance to start with.
(Actually, maybeTokenStack should be namedInputStack, if that's what it really is?)
Const
Usingiota to assign a single value like you are doing looks wierd. It's not that clear what you are doing.
const T_NUM = 1or
const T_NUM = iota+1or actually use the first value; make the nil value useful:
const ( T_UNKNOWN = iota T_NUM)Usually, there is more than one constant. Since you're only using one, I'm wondering if it would be better to remove it as well. Every valid token is T_NUM, after all, so why have a constant for it in the first place?
Type
If you are going to use T_NUM as an enum, give it a type. Right now,T_NUM is just a constant integer. That means we can do weird things likeT_NUM + T_NUM * 5. If we define a new type, such astype TokenType int and then makeT_NUM a constantTokenType instead, we get a compililation error, since multiplying aTokenType by a number doesn't make sense. In many C-like languages, like C and C++, defining a new type like this wouldn't help, but in Go, types are real types, not type aliases.
Package main
You're starting the file withpackage main but there's nofunc main() in the source code. It looks like this is a library package, and in that case, the package name should be changed. The Go Blog has a great post aboutnaming packages and things in packages.
NoMatch
It feels likeNoMatch should be anerror (for example `var NoMatch = errors.New("could not match TokenStack to known rules"). Right now it looks like C-code.
Exporting everything
Everything in your package is exported. You probably want some of the functions or types to be package-local. I'd aim for only exporting a single function:Parse. That way, it's super simple for someone to get started with the package.V andT should definitely not be exported as they stand, but if they have to be exported, they should probably be renamed.
Packagetesting only gets away with exportingT because that's almost the only thing it's exporting, and everytime you use it, you writetesting.T. The name of the package helps explain whatT is.
In-bound errors
Some of your functions returnNoMatch on error, which is just-1. This is a common way of doing things in C or C++, but not in Go. It's usually better to return multiple values, the last one being anerror, so that the caller can know how to check for errors without having to read the documentation first. Maybe all negative return values are errors, or just-1. What happens if the library updates and adds another possible return value; do we get subtile bugs? Is 0 an error?atoi is a good example. In C, there's no way of knowing if it encountered an error (or if it just parsed the string "0"). In Go, it returns (int, error).
if else if else if else if
The Go switch statement is very powerful and can be used instead of long if/elseif/elseif/else statements. If you omit the argument toswitch it looks at the cases one by one until it finds one that is true, just like a long if/elseif/elseif statement does. In this case, however, since you need semicolons in your if-statements, switch does not work. Just a pointer for future code.
- \$\begingroup\$Thanks for your input, I appreciate it. The main points where I agree with you are doc conventions, unecessary exports, naming conventions, and the use of
gofmt(although this code has already been run throughgofmt). Though the if-else chain is important to make sure production funcs are not run after a match has been found. Also, I've seen plenty of functions from the go standard library return -1 as a cardinal value when searching for indexes in array-like data structures, so I still prefer that approach to multiple return values. Thanks again.\$\endgroup\$Miles Smith– Miles Smith2016-04-30 01:56:58 +00:00CommentedApr 30, 2016 at 1:56
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.