Incomputer science,recursive ascent parsing is a technique for implementing anLR parser which uses mutually-recursive functions rather than tables. Thus, the parser isdirectly encoded in the host language similar torecursive descent. Direct encoding usually yields a parser which is faster than its table-driven equivalent[1] for the same reason that compilation is faster than interpretation. It is also (nominally) possible to hand edit a recursive ascent parser, whereas a tabular implementation is nigh unreadable to the average human.
Recursive ascent was first described by Thomas Pennello in his articlePennello, Thomas J. (1986)."Very fast LR parsing".Proceedings of the 1986 SIGPLAN symposium on Compiler construction - SIGPLAN '86. pp. 145–151.doi:10.1145/12276.13326.ISBN 0897911970.S2CID 17214407. in 1986. He was not intending to create a hand-editable implementation of an LR parser, but rather a maintainable and efficient parser implemented inassembly language. The technique was later expounded upon by G.H. Roberts[2] in 1988 as well as in an article by Leermakers, Augusteijn, Kruseman Aretz[3] in 1992 in the journalTheoretical Computer Science. An extremely readable description of the technique was written by Morell and Middleton[4] in 2003. A good exposition can also be found in a TOPLAS article by Sperber and Thiemann.[5]
Recursive ascent has also been merged with recursive descent, yielding a technique known asrecursive ascent/descent. This implementation technique is arguably easier to hand-edit due to the reduction in states and fact that some of these states are more intuitively top-down rather than bottom up. It can also yield some minimal performance improvements over conventional recursive ascent.[6]
Intuitively, recursive ascent is a literal implementation of theLR parsing concept. Each function in the parser represents a single LRautomaton state. Within each function, a multi-branch statement is used to select the appropriate action based on the current token read from the input stream. Once the token has been identified, action is taken based on the state being encoded. There are two different fundamental actions which may be taken based on the token in question:
There is also a third LR automaton action which may be taken in a given state, but only after a reduce where the shift counter has decremented to zero (indicating that the current state should handle the result). This is thegoto action, which is essentially a special case ofshift designed to handle non-terminals in a production. This action must be handledafter the multi-branch statement, since this is where any reduction results will "resurface" from farther down the call stack.
Consider the following grammar inbison syntax:
expr : expr '+' term { $$ = $1 + $3; } | expr '-' term { $$ = $1 - $3; } | term { $$ = $1; } ;term : '(' expr ')' { $$ = $2; } | num { $$ = $1; } ;num : '0' { $$ = 0; } | '1' { $$ = 1; } ;This grammar isLR(0) in that it is left-recursive (in theexpr non-terminal) but does not require any lookahead. Recursive ascent is also capable of handling grammars which are LALR(1) in much the same way that table-driven parsers handle such cases (by pre-computing conflict resolutions based on possible lookahead).
The following is aScala implementation of a recursive ascent parser based on the above grammar:
objectExprParser{privatetypeResult=(NonTerminal,Int)privatesealedtraitNonTerminal{valv:Int}privatecaseclassNTexpr(v:Int,in:Stream[Char])extendsNonTerminalprivatecaseclassNTterm(v:Int,in:Stream[Char])extendsNonTerminalprivatecaseclassNTnum(v:Int,in:Stream[Char])extendsNonTerminalclassParseException(msg:String)extendsRuntimeException(msg){defthis()=this("")defthis(c:Char)=this(c.toString)}defparse(in:Stream[Char])=state0(in)._1.v/* * 0 $accept: . expr $end * * '(' shift, and go to state 1 * '0' shift, and go to state 2 * '1' shift, and go to state 3 * * expr go to state 4 * term go to state 5 * num go to state 6 */privatedefstate0(in:Stream[Char])=inmatch{casecur#::tail=>{defloop(tuple:Result):Result={val(res,goto)=tupleif(goto==0){loop(resmatch{caseNTexpr(v,in)=>state4(in,v)caseNTterm(v,in)=>state5(in,v)caseNTnum(v,in)=>state6(in,v)})}else(res,goto-1)}loop(curmatch{case'('=>state1(tail)case'0'=>state2(tail)case'1'=>state3(tail)casec=>thrownewParseException(c)})}caseStream()=>thrownewParseException}/* * 4 term: '(' . expr ')' * * '(' shift, and go to state 1 * '0' shift, and go to state 2 * '1' shift, and go to state 3 * * expr go to state 7 * term go to state 5 * num go to state 6 */privatedefstate1(in:Stream[Char]):Result=inmatch{casecur#::tail=>{defloop(tuple:Result):Result={val(res,goto)=tupleif(goto==0){loop(resmatch{caseNTexpr(v,in)=>state7(in,v)caseNTterm(v,in)=>state5(in,v)caseNTnum(v,in)=>state6(in,v)})}else(res,goto-1)}loop(curmatch{case'('=>state1(tail)case'0'=>state2(tail)case'1'=>state3(tail)casec=>thrownewParseException(c)})}caseStream()=>thrownewParseException}/* * 6 num: '0' . * * $default reduce using rule 6 (num) */privatedefstate2(in:Stream[Char])=(NTnum(0,in),0)/* * 7 num: '1' . * * $default reduce using rule 7 (num) */privatedefstate3(in:Stream[Char])=(NTnum(1,in),0)/* * 0 $accept: expr . $end * 1 expr: expr . '+' term * 2 | expr . '-' term * * $end shift, and go to state 8 * '+' shift, and go to state 9 * '-' shift, and go to state 10 */privatedefstate4(in:Stream[Char],arg1:Int):Result=inmatch{casecur#::tail=>{decrement(curmatch{case'+'=>state9(tail,arg1)case'-'=>state10(tail,arg1)casec=>thrownewParseException(c)})}caseStream()=>state8(arg1)}/* * 3 expr: term . * * $default reduce using rule 3 (expr) */privatedefstate5(in:Stream[Char],arg1:Int)=(NTexpr(arg1,in),0)/* * 5 term: num . * * $default reduce using rule 5 (term) */privatedefstate6(in:Stream[Char],arg1:Int)=(NTterm(arg1,in),0)/* * 1 expr: expr . '+' term * 2 | expr . '-' term * 4 term: '(' expr . ')' * * '+' shift, and go to state 9 * '-' shift, and go to state 10 * ')' shift, and go to state 11 */privatedefstate7(in:Stream[Char],arg1:Int):Result=inmatch{casecur#::tail=>{decrement(curmatch{case'+'=>state9(tail,arg1)case'-'=>state10(tail,arg1)case')'=>state11(tail,arg1)casec=>thrownewParseException(c)})}caseStream()=>thrownewParseException}/* * 0 $accept: expr $end . * * $default accept */privatedefstate8(arg1:Int)=(NTexpr(arg1,Stream()),1)/* * 1 expr: expr '+' . term * * '(' shift, and go to state 1 * '0' shift, and go to state 2 * '1' shift, and go to state 3 * * term go to state 12 * num go to state 6 */privatedefstate9(in:Stream[Char],arg1:Int)=inmatch{casecur#::tail=>{defloop(tuple:Result):Result={val(res,goto)=tupleif(goto==0){loop(resmatch{caseNTterm(v,in)=>state12(in,arg1,v)caseNTnum(v,in)=>state6(in,v)case_=>thrownewAssertionError})}else(res,goto-1)}loop(curmatch{case'('=>state1(tail)case'0'=>state2(tail)case'1'=>state3(tail)casec=>thrownewParseException(c)})}caseStream()=>thrownewParseException}/* * 2 expr: expr '-' . term * * '(' shift, and go to state 1 * '0' shift, and go to state 2 * '1' shift, and go to state 3 * * term go to state 13 * num go to state 6 */privatedefstate10(in:Stream[Char],arg1:Int)=inmatch{casecur#::tail=>{defloop(tuple:Result):Result={val(res,goto)=tupleif(goto==0){loop(resmatch{caseNTterm(v,in)=>state13(in,arg1,v)caseNTnum(v,in)=>state6(in,v)case_=>thrownewAssertionError})}else(res,goto-1)}loop(curmatch{case'('=>state1(tail)case'0'=>state2(tail)case'1'=>state3(tail)casec=>thrownewParseException(c)})}caseStream()=>thrownewParseException}/* * 4 term: '(' expr ')' . * * $default reduce using rule 4 (term) */privatedefstate11(in:Stream[Char],arg1:Int)=(NTterm(arg1,in),2)/* * 1 expr: expr '+' term . * * $default reduce using rule 1 (expr) */privatedefstate12(in:Stream[Char],arg1:Int,arg2:Int)=(NTexpr(arg1+arg2,in),2)/* * 2 expr: expr '-' term . * * $default reduce using rule 2 (expr) */privatedefstate13(in:Stream[Char],arg1:Int,arg2:Int)=(NTexpr(arg1-arg2,in),2)privatedefdecrement(tuple:Result)={val(res,goto)=tupleassert(goto!=0)(res,goto-1)}}
The following is aProlog implementation of a recursive ascent parser based on the above grammar:
state(S),[S]-->[S].state(S0,S),[S]-->[S0]./* 0. S --> E$ 1. E --> E + T 2. E --> E - T 3. E --> T 4. T --> (E) 5. T --> N 6. N --> 0 7. N --> 1*/accept-->state(s([],[e(_)])).r(1)-->state(s(Ts,[t(A1),'+',e(A0)|Ss]),s(Ts,[e(A0+A1)|Ss])).r(2)-->state(s(Ts,[t(A1),'-',e(A0)|Ss]),s(Ts,[e(A0-A1)|Ss])).r(3)-->state(s(Ts,[t(A)|Ss]),s(Ts,[e(A)|Ss])).r(4)-->state(s(Ts,[')',e(A),'('|Ss]),s(Ts,[t(A)|Ss])).r(5)-->state(s(Ts,[n(A)|Ss]),s(Ts,[t(A)|Ss])).r(6)-->state(s(Ts,['0'|Ss]),s(Ts,[n(0)|Ss])).r(7)-->state(s(Ts,['1'|Ss]),s(Ts,[n(1)|Ss])).t(T)-->state(s([T|Ts],Ss),s(Ts,[T|Ss]))./* S --> .E$ E --> .E + T E --> .E - T E --> .T T --> .(E) T --> .N N --> .0 N --> .1*/s0-->t('('),s3,s2,s1.s0-->t('0'),s11,s10,s2,s1.s0-->t('1'),s12,s10,s2,s1./* S --> E.$ E --> E. + T E --> E. - T*/s1-->accept.s1-->t('+'),s7,s1.s1-->t('-'),s8,s1./* E --> T.*/s2-->r(3)./* T --> (.E) E --> .E + T E --> .E - T E --> .T T --> .(E) T --> .N N --> .0 N --> .1*/s3-->t('('),s3,s2,s4.s3-->t('0'),s11,s10,s2,s4.s3-->t('1'),s12,s10,s2,s4./* T --> (E.) E --> E .+ T E --> E .- T*/s4-->t(')'),s9.s4-->t('+'),s7,s4.s4-->t('-'),s8,s4./* E --> E + T.*/s5-->r(1)./* E --> E - T.*/s6-->r(2)./* E --> E + .T T --> .(E) T --> .N N --> .0 N --> .1*/s7-->t('('),s3,s5.s7-->t('0'),s11,s10,s5.s7-->t('1'),s12,s10,s5./* E --> E - .T T --> .(E) T --> .N N --> .0 N --> .1*/s8-->t('('),s3,s6.s8-->t('0'),s11,s10,s6.s8-->t('1'),s12,s10,s6./* T --> (E).*/s9-->r(4)./* T --> N.*/s10-->r(5)./* N --> '0'.*/s11-->r(6)./* N --> '1'.*/s12-->r(7).parser(Cs,T):-length(Cs,_),phrase(s0,[s(Cs,[])],[s([],[e(T)])]).% state(S0, S), [S] --> [S0, S].%?- S0 = [s("(1+1)", [])|_], phrase(s0, S0, [S]), maplist(portray_clause, S0).
{{cite journal}}: CS1 maint: multiple names: authors list (link)