Parser Combinators are the amazingly elegant way to write parsers that has evolved over on the functional programming side of the world, but have been less available or accessible for imperative languages. I have attempted somewhat to bridge this gap bystealing copying as many ideas from FP as were needed to make these combinators in C.
I've written anoverview of Parser Combinators and presentations of thebasic lisp-like object system and thehigher order functions which supplement the basic objects. Of coursewikipedia has a nice article on the topic, and my code mostly follows the excellent paper byHutton and Meijer which is also a good introduction.
This code is the result of 16 (or so) re-writes of the code in myprevious question. I anticipate at least one more re-write to add more features like checking the offside rule both for parsing syntaxes lke Python which use this rule, or just checking that indentation in C code is reasonable.
A few remarks about the code overall: I'm trying the "McIllroy convention" which places header guards around the#include lines rather than inside the included files. Theobject type is a pointer to a union. Most of the.c files have a (name-mangled)main() function which does some simple tests; somewhere along the continuum betweenreal unit testing andnot testing.
I'm looking for review of the "weird stuff" if possible. Particularly the handling ofsuspensions, which represent values to be lazily evaluated. But of course anything weird or suspicious is fair game.
ppnarg.h
Written by Laurent Deniau for theC Object System. I have extended the maximum number of arguments a little. This macro supports the variadic combinatorsPLUS andSEQ which combine many subparsers together. It lets you pass a count of the variadic arguments into the function that processes them.
/* * The PP_NARG macro evaluates to the number of arguments that have been * passed to it. * * Laurent Deniau, "__VA_NARG__," 17 January 2006, <comp.std.c> (29 November 2007). */#define PP_NARG(...) PP_NARG_(__VA_ARGS__,PP_RSEQ_N())#define PP_NARG_(...) PP_ARG_N(__VA_ARGS__)#define PP_ARG_N( \ _1, _2, _3, _4, _5, _6, _7, _8, _9,_10, \ _11,_12,_13,_14,_15,_16,_17,_18,_19,_20, \ _21,_22,_23,_24,_25,_26,_27,_28,_29,_30, \ _31,_32,_33,_34,_35,_36,_37,_38,_39,_40, \ _41,_42,_43,_44,_45,_46,_47,_48,_49,_50, \ _51,_52,_53,_54,_55,_56,_57,_58,_59,_60, \ _61,_62,_63,_64,_65,_66,_67,_68,_69,_70, \ _71,N,...) N#define PP_RSEQ_N() \ 71,70, \ 69,68,67,66,65,64,63,62,61,60, \ 59,58,57,56,55,54,53,52,51,50, \ 49,48,47,46,45,44,43,42,41,40, \ 39,38,37,36,35,34,33,32,31,30, \ 29,28,27,26,25,24,23,22,21,20, \ 19,18,17,16,15,14,13,12,11,10, \ 9,8,7,6,5,4,3,2,1,0pc9obj.h
Public interface to the basic objects. All the constructors for various types of objects and lists. Especially,chars_from_string which produces a lazy list of characters. The symbol-typed objects are constructed with a unique numeric identifer and a print string. These unique ids are generated by declaring the names in an enum. The odd nameSYM1 is used to start the next enum in the nextlayer to add more symbols keeping all ids unique.
#define PC9OBJ_H#include <stdlib.h>#include <stdio.h>#define POINTER_TO *typedef union uobject POINTER_TO object;typedef object list;typedef object parser;typedef object oper;typedef oper predicate;typedef object boolean;typedef object fSuspension( object );typedef list fParser( object, list );typedef object fOperator( object, object );typedef boolean fPredicate( object, object );typedef object fBinOper( object, object );enum object_symbols { T, F, X, A, B, SYM1};object T_, NIL_;int valid( object a );object Int( int i );list one( object a );list cons( object a, object b );object Suspension( object v, fSuspension *f );parser Parser( object v, fParser *f );oper Operator( object v, fOperator *f );object String( char *s, int disposable );object Symbol_( int sym, char *pname );#define Symbol(n) Symbol_( n, #n )object Void( void *v );void add_global_root( object a );int garbage_collect( object local_roots );object x_( list a );object xs_( list a );list take( int n, list o );list drop( int n, list o );list chars_from_string( char *v );list chars_from_file( FILE *v );object string_from_chars( list o );void print( object o );void print_list( list a );void print_flat( list a );void print_data( list a );#define PRINT_WRAPPER(_, __, ___) printf( "%s: %s %s= ", __func__, #__, ___ ), _( __ ), puts("")#define PRINT(__) PRINT_WRAPPER( print_list, __, "" )#define PRINT_FLAT(__) PRINT_WRAPPER( print_flat, __, "flat" )#define PRINT_DATA(__) PRINT_WRAPPER( print_data, __, "data" )pc9objpriv.h
Private interface to the basic objects. Objects are represented as a pointer to a tagged union. Theat_ function forces execution of suspensions, but externally this action must be performed by callingtake ordrop.
#define PC9OBJPRIV_H#ifndef PC9OBJ_H #include "pc9obj.h"#endiftypedef enum object_tag { INVALID, INTEGER, LIST, SUSPENSION, PARSER, OPERATOR, SYMBOL, STRING, VOID,} tag;union uobject { tag t; struct { tag t; int i; } Int; struct { tag t; object a, b; } List; struct { tag t; object v; fSuspension *f; } Suspension; struct { tag t; object v; fParser *f; } Parser; struct { tag t; object v; fOperator *f; } Operator; struct { tag t; int symbol; char *pname; object data; } Symbol; struct { tag t; char *string; int disposable; } String; struct { tag t; object next; } Header; struct { tag t; void *v; } Void;};object new_( object a );#define OBJECT(...) new_( (union uobject[]){{ __VA_ARGS__ }} )object at_( object a );object fill_string( char **s, list o );int obj_main( void );pc9obj.c
Implementation of basic objects. Objects are allocated as two structs side by side with the hidden left object used as an allocation record. The allocation records form a singly linked list which is traversed during a sweep of the garbage collector.x_ andxs_ are the famous lispcar andcdr functions, but I like the haskell naming convention so I used those; butx is too useful as a local variable name so they got underscores appended.
#include <stdio.h>#include "pc9objpriv.h"static void mark_objects( list a );static int sweep_objects( list *po );object T_ = (union uobject[]){{ .Symbol = { SYMBOL, T, "T" } }}, NIL_ = (union uobject[]){{ .t = INVALID }};static list global_roots = NULL;static list allocation_list = NULL;objectnew_( object a ){ object p = calloc( 2, sizeof *p ); return p ? p[0] = (union uobject){ .Header = { 0, allocation_list } }, allocation_list = p, p[1] = *a, &p[1] : 0;}intvalid( object a ){ switch( a ? a->t : 0 ){ default: return 0; case INTEGER: case LIST: case SUSPENSION: case PARSER: case OPERATOR: case SYMBOL: case STRING: return 1; }}objectInt( int i ){ return OBJECT( .Int = { INTEGER, i } );}listone( object a ){ return cons( a, NIL_ );}listcons( object a, object b ){ return OBJECT( .List = { LIST, a, b } );}objectSuspension( object v, fSuspension *f ){ return OBJECT( .Suspension = { SUSPENSION, v, f } );}parserParser( object v, fParser *f ){ return OBJECT( .Parser = { PARSER, v, f } );}operOperator( object v, fOperator *f ){ return OBJECT( .Operator = { OPERATOR, v, f } );}objectString( char *s, int disposable ){ return OBJECT( .String = { STRING, s, disposable } );}objectSymbol_( int sym, char *pname ){ return OBJECT( .Symbol = { SYMBOL, sym, pname } );}objectVoid( void *v ){ return OBJECT( .Void = { VOID, v } );}voidadd_global_root( object a ){ global_roots = cons( a, global_roots );}intgarbage_collect( object local_roots ){ mark_objects( local_roots ); mark_objects( global_roots ); return sweep_objects( &allocation_list );}static tag *mark( object a ){ return &a[-1].Header.t;}static voidmark_objects( list a ){ if( !valid(a) || *mark( a ) ) return; *mark( a ) = 1; switch( a->t ){ case LIST: mark_objects( a->List.a ); mark_objects( a->List.b ); break; case PARSER: mark_objects( a->Parser.v ); break; case OPERATOR: mark_objects( a->Operator.v ); break; case SYMBOL: mark_objects( a->Symbol.data ); break; case SUSPENSION: mark_objects( a->Suspension.v ); break; }}static intsweep_objects( list *po ){ int count = 0; while( *po ) if( (*po)->t ){ (*po)->t = 0; po = &(*po)->Header.next; } else { object z = *po; *po = (*po)->Header.next; if( z[1].t == STRING && z[1].String.disposable ) free( z[1].String.string ); free( z ); ++count; } return count;}objectat_( object a ){ return valid( a ) && a->t == SUSPENSION ? at_( a->Suspension.f( a->Suspension.v ) ) : a;}objectpx_( object v ){ list a = v; *a = *at_( a ); return x_( a );}objectx_( list a ){ return valid( a ) ? a->t == LIST ? a->List.a : a->t == SUSPENSION ? Suspension( a, px_ ) : NIL_ : NIL_;}objectpxs_( object v ){ list a = v; *a = *at_( a ); return xs_( a );}objectxs_( list a ){ return valid( a ) ? a->t == LIST ? a->List.b : a->t == SUSPENSION ? Suspension( a, pxs_ ) : NIL_ : NIL_;}listtake( int n, list o ){ if( n == 0 ) return NIL_; *o = *at_( o ); return valid( o ) ? cons( x_( o ), take( n-1, xs_( o ) ) ) : NIL_;}listdrop( int n, list o ){ if( n == 0 ) return o; *o = *at_( o ); return valid( o ) ? drop( n-1, xs_( o ) ) : NIL_;}listpchars_from_string( object v ){ char *p = v->String.string; return *p ? cons( Int( *p ), Suspension( String( p+1, 0 ), pchars_from_string ) ) : Symbol(EOF);}listchars_from_string( char *p ){ return p ? Suspension( String( p, 0 ), pchars_from_string ) : NIL_;}listpchars_from_file( object v ){ FILE *f = v->Void.v; int c = fgetc( f ); return c != EOF ? cons( Int( c ), Suspension( v, pchars_from_file ) ) : Symbol(EOF);}listchars_from_file( FILE *f ){ return f ? Suspension( Void( f ), pchars_from_file ) : NIL_;}static intcount_ints( list o ){ return !o ? 0 : o->t == SUSPENSION ? *o = *at_( o ), count_ints( o ) : o->t == INTEGER ? 1 : o->t == LIST ? count_ints( o->List.a ) + count_ints( o->List.b ) : 0;}objectfill_string( char **s, list o ){ return !o ? NULL : o->t == INTEGER ? *(*s)++ = o->Int.i, NULL : o->t == LIST ? fill_string( s, o->List.a ), fill_string( s, o->List.b ) : NULL;}objectstring_from_chars( list o ){ char *s = calloc( count_ints( o ) + 1, 1 ); object z = String( s, 1 ); return fill_string( &s, o ), z;}voidprint( object o ){ if( !o ){ printf( "() " ); return; } switch( o->t ){ case INTEGER: printf( "%d ", o->Int.i ); break; case LIST: printf( "(" ); print( o->List.a ); print( o->List.b ); printf( ") " ); break; case SUSPENSION: printf( "... " ); break; case PARSER: printf( "Parser " ); break; case OPERATOR: printf( "Oper " ); break; case STRING: printf( "\"%s\"", o->String.string ); break; case SYMBOL: printf( "%s ", o->Symbol.pname ); break; case INVALID: printf( "_ " ); break; default: printf( "INVALID " ); break; }}voidprint_listn( list a ){ switch( a ? a->t : 0 ){ default: print( a ); return; case LIST: print_list( x_( a ) ), print_listn( xs_( a ) ); return; }}voidprint_list( list a ){ switch( a ? a->t : 0 ){ default: print( a ); return; case LIST: printf( "(" ), print_list( x_( a ) ), print_listn( xs_( a ) ), printf( ")" ); return; }}voidprint_flat( list a ){ if( !a ) return; if( a->t != LIST ){ print( a ); return; } print_flat( a->List.a ); print_flat( a->List.b );}voidprint_data( list a ){ if( !a ) return; switch( a->t ){ case LIST: print_data( a->List.a), print_data( a->List.b ); break; case STRING: printf( "%s", a->String.string ); break; case SYMBOL: print_data( a->Symbol.data ); break; }}inttest_basics(){ list ch = chars_from_string( "abcdef" ); PRINT( ch ); PRINT( Int( garbage_collect( ch ) ) ); PRINT( take( 1, ch ) ); PRINT( Int( garbage_collect( ch ) ) ); PRINT( x_( ch ) ); PRINT( Int( garbage_collect( ch ) ) ); PRINT( x_( xs_( ch ) ) ); PRINT( Int( garbage_collect( ch ) ) ); PRINT( take( 1, x_( xs_( ch ) ) ) ); PRINT( Int( garbage_collect( ch ) ) ); PRINT( take( 5, ch ) ); PRINT( Int( garbage_collect( ch ) ) ); PRINT( ch ); PRINT( Int( garbage_collect( ch ) ) ); PRINT( take( 6, ch ) ); PRINT( Int( garbage_collect( ch ) ) ); PRINT( take( 1, ch ) ); PRINT( Int( garbage_collect( ch ) ) ); PRINT( take( 2, ch ) ); PRINT( Int( garbage_collect( ch ) ) ); PRINT( take( 2, ch ) ); PRINT( Int( garbage_collect( ch ) ) ); PRINT( take( 2, ch ) ); PRINT( Int( garbage_collect( ch ) ) ); PRINT( take( 2, ch ) ); PRINT( Int( garbage_collect( ch ) ) ); return 0;}int obj_main(){ return test_basics(); }pc9fp.h
Interface to the "functional programming" functions. These have all been tweaked to deal nicely (?) with suspensions and force only enough execution to make progress on a computation.
#define PC9FP_H#ifndef PC9OBJ_H #include "pc9obj.h"#endifboolean eq( object a, object b );list env( list tail, int n, ... );object assoc( object a, list b );list copy( list a );list append( list a, list b );object apply( oper f, object o );list map( oper f, list o );list join( list o );object collapse( fBinOper *f, list o );object reduce( fBinOper *f, int n, object *po );pc9fp.c
Implementation of the "functional programming" functions.
#include <stdarg.h>#include <string.h>#include "pc9fp.h"#include "pc9objpriv.h"booleaneq( object a, object b ){ return ( !valid( a ) && !valid( b ) ? 1 : !valid( a ) || !valid( b ) ? 0 : a->t != b->t ? 0 : a->t == SYMBOL ? a->Symbol.symbol == b->Symbol.symbol : !memcmp( a, b, sizeof *a ) ? 1 : 0 ) ? T_ : NIL_;}listcopy( list a ){ return !valid( a ) ? NIL_ : a->t == LIST ? cons( copy( x_( a ) ), copy( xs_( a ) ) ) : a;}listenv( list tail, int n, ... ){ va_list v; va_start( v, n ); list r = tail; while( n-- ){ object a = va_arg( v, object ); object b = va_arg( v, object ); r = cons( cons( a, b ), r ); } va_end( v ); return r;}objectassoc( object a, list b ){ return !valid( b ) ? NIL_ : valid( eq( a, x_( x_( b ) ) ) ) ? xs_( x_( b ) ) : assoc( a, xs_( b ) );}static listpappend( object v ){ list a = assoc( Symbol(A), v ); list b = assoc( Symbol(B), v ); *a = *at_( a ); return append( a, b );}listappend( list a, list b ){ return !valid( a ) ? b : a->t == SUSPENSION ? Suspension( env( 0, 2, Symbol(A), a, Symbol(B), b ), pappend ) : cons( x_( a ), append( xs_( a ), b ) );}static objectpapply( object v ){ oper f = assoc( Symbol(F), v ); object o = assoc( Symbol(X), v ); *o = *at_( o ); return valid( o ) ? f->Operator.f( f->Operator.v, o ) : NIL_;}objectapply( oper f, object o ){ return f->t == OPERATOR ? valid( o ) ? o->t == SUSPENSION ? Suspension( env( 0, 2, Symbol(F), f, Symbol(X), o ), papply ) : f->Operator.f( f->Operator.v, o ) : f->Operator.f( f->Operator.v, o ) // for using( maybe(), ... ) : NIL_; //return f->t == OPERATOR ? f->Operator.f( f->Operator.v, o ) : NIL_;}static listpmap( object v ){ oper f = assoc( Symbol(F), v ); list o = assoc( Symbol(X), v ); *o = *at_( o ); return valid( o ) ? cons( apply( f, x_( o ) ), map( f, xs_( o ) ) ) : NIL_;}listmap( oper f, list o ){ return valid( o ) ? o->t == SUSPENSION ? Suspension( env( 0, 2, Symbol(F), f, Symbol(X), o ), pmap ) : cons( apply( f, x_( o ) ), Suspension( env( 0, 2, Symbol(F), f, Symbol(X), xs_( o ) ), pmap ) ) : NIL_; //return valid( o ) ? cons( apply( f, x_( o ) ), map( f, xs_( o ) ) ) : NIL_;}static listpjoin( object v ){ list o = assoc( Symbol(X), v ); *o = *at_( o ); return append( x_( take( 1, o ) ), join( xs_( o ) ) );}listjoin( list o ){ return valid( o ) ? o->t == SUSPENSION ? Suspension( env( 0, 1, Symbol(X), o ), pjoin ) : append( x_( o ), Suspension( env( 0, 1, Symbol(X), xs_( o ) ), pjoin ) ) : NIL_; //return valid( o ) ? append( x_( o ), join( xs_( o ) ) ) : NIL_;}static objectdo_collapse( fBinOper *f, object a, object b ){ return valid( b ) ? f( a, b ) : a;}objectcollapse( fBinOper *f, list o ){ return valid( o ) ? o->t == LIST ? do_collapse( f, collapse( f, x_( o ) ), collapse( f, xs_( o ) ) ) : o : NIL_;}objectreduce( fBinOper *f, int n, object *po ){ return n==1 ? *po : f( *po, reduce( f, n-1, po+1 ) );}pc9par.h
Interface to the Parser Combinators. Constructing parsers for single charactersalphadigitsat or combining parsers togetherseqplus. Construct a parser using aregex.
#define PC9PAR_H#ifndef PC9FP_H #include "pc9fp.h"#endif#include "ppnarg.h"enum parser_symbols { VALUE = SYM1, PRED, P, PP, NN, Q, R, FF, XX, AA, ID, USE, ATOM, SYM2};list parse( parser p, list input );parser result( object a );parser zero( void );parser item( void );parser bind( parser p, oper f );parser plus( parser p, parser q );#define PLUS(...) reduce( plus, PP_NARG(__VA_ARGS__), (object[]){ __VA_ARGS__ } )parser sat( predicate pred );parser alpha( void );parser digit( void );parser lit( object a );parser chr( int c );parser str( char *s );parser anyof( char *s );parser noneof( char *s );parser seq( parser p, parser q );#define SEQ(...) reduce( seq, PP_NARG(__VA_ARGS__), (object[]){ __VA_ARGS__ } )parser xthen( parser p, parser q );parser thenx( parser p, parser q );parser into( parser p, object id, parser q );parser maybe( parser p );parser forward( void );parser many( parser p );parser some( parser p );parser trim( parser p );parser using( parser p, fOperator *f );parser regex( char *re );int par_main( void );pc9par.c
Implementation of the Parser Combinators. Includes 3 "internal DSL" examples with theregex() function, andpprintf() andpscanf() functions.
#include <ctype.h>#include <stdarg.h>#include <string.h>#include "pc9par.h"#include "pc9objpriv.h"listparse( parser p, list input ){ return valid( p ) && p->t == PARSER && valid( input ) ? p->Parser.f( p->Parser.v, input ) : NIL_;}static listpresult( object v, list input ){ return one( cons( assoc( Symbol(VALUE), v ), input ) );}parserresult( object a ){ return Parser( env( 0, 1, Symbol(VALUE), a ), presult );}static listpzero( object v, list input ){ return NIL_;}parserzero( void ){ return Parser( 0, pzero );}static listpitem( object v, list input ){ drop( 1, input ); return valid( input ) ? one( cons( x_( input ), xs_( input ) ) ) : NIL_; //return valid( input ) ? one( cons( x_( take( 1, input ) ), xs_( input ) ) ) : NIL_; //strict //return valid( input ) ? one( cons( x_( input ), xs_( input ) ) ) : NIL_; //lazy}parseritem( void ){ return Parser( 0, pitem );}static listpbind( object v, list input ){ parser p = assoc( Symbol(P), v ); oper f = assoc( Symbol(FF), v ); list r = parse( p, input ); return valid( r ) ? join( map( Operator( valid( f->Operator.v ) ? append( copy( f->Operator.v ), v ) : v, f->Operator.f ), r ) ) : NIL_;}parserbind( parser p, oper f ){ return Parser( env( 0, 2, Symbol(P), p, Symbol(FF), f ), pbind );}static listbplus( object v ){ list r = assoc( Symbol(R), v ); object qq = assoc( Symbol(Q), v ); *r = *at_( r ); return valid( r ) ? append( r, qq ) : qq;}static listcplus( object v ){ parser q = assoc( Symbol(Q), v ); list input = assoc( Symbol(X), v ); return parse( q, input );}static listpplus( object v, list input ){ parser p = assoc( Symbol(P), v ); parser q = assoc( Symbol(Q), v ); list r = parse( p, input ); object qq = Suspension( env( 0, 2, Symbol(Q), q, Symbol(X), input ), cplus ); return valid( r ) ? r->t == SUSPENSION ? Suspension( env( 0, 2, Symbol(R), r, Symbol(Q), qq ), bplus ) : append( r, qq ) : qq;}parserplus( parser p, parser q ){ if( !q ) return p; return Parser( env( 0, 2, Symbol(P), p, Symbol(Q), q ), pplus );}static listpsat( object v, list input ){ predicate pred = assoc( Symbol(PRED), v ); object r = apply( pred, x_( input ) ); return valid( r ) ? one( cons( x_( input ), xs_( input ) ) ) : NIL_;}parsersat( predicate pred ){ return bind( item(), Operator( env( 0, 1, Symbol(PRED), pred ), psat ) );}static booleanpalpha( object v, object o ){ return isalpha( o->Int.i ) ? T_ : NIL_;}parseralpha( void ){ return sat( Operator( 0, palpha ) );}static booleanpdigit( object v, object o ){ return isdigit( o->Int.i ) ? T_ : NIL_;}parserdigit( void ){ return sat( Operator( 0, pdigit ) );}static booleanplit( object v, object o ){ object a = assoc( Symbol(X), v ); return eq( a, o );}parserlit( object a ){ return sat( Operator( env( 0, 1, Symbol(X), a ), plit ) );}parserchr( int c ){ return lit( Int( c ) );}parserstr( char *s ){ return *s ? seq( chr( *s ), str( s+1 ) ) : result(0);}parseranyof( char *s ){ return *s ? plus( chr( *s ), anyof( s+1 ) ) : zero();}static listpnone( object v, list input ){ parser p = assoc( Symbol(NN), v ); object r = parse( p, input ); *r = *at_( r ); return valid( r ) ? NIL_ : pitem( 0, input );}parsernoneof( char *s ){ return Parser( env( 0, 1, Symbol(NN), anyof( s ) ), pnone );}static listpprepend( object v, list o ){ object a = assoc( Symbol(AA), v ); return valid( a ) ? cons( cons( a, x_( o ) ), xs_( o ) ) : o;}static listprepend( list a, list b ){ return map( Operator( env( 0, 1, Symbol(AA), a ), pprepend ), b );}static listpseq( object v, list output ){ parser q = assoc( Symbol(Q), v ); return prepend( x_( output ), parse( q, xs_( output ) ) );}parserseq( parser p, parser q ){ if( !q ) return p; return bind( p, Operator( env( 0, 1, Symbol(Q), q ), pseq ) );}static listpxthen( object v, list o ){ return one( cons( xs_( x_( o ) ), xs_( o ) ) );}parserxthen( parser p, parser q ){ return bind( seq( p, q ), Operator( 0, pxthen ) );}static listpthenx( object v, list o ){ return one( cons( x_( x_( o ) ), xs_( o ) ) );}parserthenx( parser p, parser q ){ return bind( seq( p, q ), Operator( 0, pthenx ) );}static listpinto( object v, list o ){ object id = assoc( Symbol(ID), v ); parser q = assoc( Symbol(Q), v ); return parse( Parser( env( q->Parser.v, 1, id, x_( o ) ), q->Parser.f ), xs_( o ) );}parserinto( parser p, object id, parser q ){ return bind( p, Operator( env( 0, 2, Symbol(ID), id, Symbol(Q), q ), pinto ) );}parsermaybe( parser p ){ return plus( p, result(0) );}parserforward( void ){ return Parser( 0, 0 );}parsermany( parser p ){ parser q = forward(); parser r = maybe( seq( p, q ) ); *q = *r; return r;}parsersome( parser p ){ return seq( p, many( p ) );}static listptrim( object v, list input ){ parser p = assoc( Symbol(PP), v ); list r = parse( p, input ); return valid( r ) ? one( x_( take( 1, r ) ) ) : r;}parsertrim( parser p ){ return Parser( env( 0, 1, Symbol(PP), p ), ptrim );}static listpusing( object v, list o ){ oper f = assoc( Symbol(USE), v ); return one( cons( apply( f, x_( o ) ), xs_( o ) ) );}parserusing( parser p, fOperator *f ){ return bind( p, Operator( env( 0, 1, Symbol(USE), Operator( 0, f ) ), pusing ) );}static parserdo_meta( parser a, object o ){ switch( o->Int.i ){ case '*': return many( a ); break; case '+': return some( a ); break; case '?': return maybe( a ); break; } return a;}static parseron_meta( object v, object o ){ parser atom = assoc( Symbol(ATOM), v ); return valid( o ) ? do_meta( atom, o ) : atom;}static parser on_dot( object v, object o ){ return item(); }static parser on_chr( object v, object o ){ return lit( o ); }static parser on_term( object v, object o ){ return collapse( seq, o ); }static parser on_expr( object v, object o ){ return collapse( plus, o ); }#define META "*+?"#define SPECIAL META ".|()"parserregex( char *re ){ static parser p; if( !p ){ parser dot = using( chr('.'), on_dot ); parser meta = anyof( META ); parser escape = xthen( chr('\\'), anyof( SPECIAL "\\" ) ); parser chr_ = using( plus( escape, noneof( SPECIAL ) ), on_chr ); parser expr_ = forward(); parser atom = PLUS( dot, xthen( chr('('), thenx( expr_, chr(')') ) ), chr_ ); parser factor = into( atom, Symbol(ATOM), using( maybe( meta ), on_meta ) ); parser term = using( some( factor ), on_term ); parser expr = using( seq( term, many( xthen( chr('|'), term ) ) ), on_expr ); *expr_ = *expr; p = trim( expr ); add_global_root( p ); } list r = parse( p, chars_from_string( re ) ); return valid( r ) ? ( x_( x_( r ) ) ) : r;}parservusing( parser p, object v, fOperator *f ){ return bind( p, Operator( env( 0, 1, Symbol(USE), Operator( v, f ) ), pusing ) );}object sum( object a, object b ){ return Int( a->Int.i + b->Int.i ); }boolean nz( object v, object o ){ return o->Int.i ? T_ : NIL_; }static object p_char( object v, list o ){ va_list *p = (void *)v; return putchar(va_arg( *p, int )), Int(1);}static object p_string( object v, list o ){ va_list *p = (void *)v; char *s = va_arg( *p, char* ); return fputs( s, stdout ), Int(strlen( s ));}static object p_lit( object v, list o ){ return putchar( o->Int.i ), Int(1);}static object on_fmt( object v, list o ){ return collapse( sum, o ); }intpprintf( char const *fmt, ... ){ if( !fmt ) return 0; static va_list v; va_start( v, fmt ); static parser p; if( !p ){ parser directive = PLUS( using( chr('%'), p_lit ), vusing( chr('c'), (void *)&v, p_char ), vusing( chr('s'), (void *)&v, p_string ) ); parser term = PLUS( xthen( chr('%'), directive ), using( sat( Operator( 0, nz ) ), p_lit ) ); parser format = many( term ); p = using( format, on_fmt ); add_global_root( p ); } object r = parse( p, chars_from_string( (char*)fmt ) ); drop( 1, r ); va_end( v ); return x_( x_( r ) )->Int.i;}static object convert_char( object v, list o ){ va_list *p = (void *)v; char *cp = va_arg( *p, char* ); *cp = o->Int.i; return Int(1);}static object convert_string( object v, list o ){ va_list *p = (void *)v; char *sp = va_arg( *p, char* ); fill_string( &sp, o ); return Int(1);}static parser on_char( object v, list o ){ return vusing( item(), v, convert_char );}static parser on_string( object v, list o ){ return vusing( xthen( many( anyof( " \t\n" ) ), many( noneof( " \t\n" ) ) ), v, convert_string );}static object r_zero( object v, list o ){ return Int(0); }static parser pass( parser p ){ return using( p, r_zero ); }static parser on_space( object v, list o ){ return valid( o ) ? pass( many( anyof( " \t\n" ) ) ) : o; }static parser on_percent( object v, list o ){ return pass( chr('%') ); }static parser on_lit( object v, list o ){ return pass( lit( o ) ); }static object sum_up( object v, list o ){ return collapse( sum, o ); }static parser on_terms( object v, list o ){ return using( collapse( seq, o ), sum_up ); }intpscanf( char const *fmt, ... ){ if( !fmt ) return 0; static va_list v; va_start( v, fmt ); static parser p; if( !p ){ parser space = using( many( anyof( " \t\n" ) ), on_space ); parser directive = PLUS( using( chr('%'), on_percent ), vusing( chr('c'), (void *)&v, on_char ), vusing( chr('s'), (void *)&v, on_string ) ); parser term = PLUS( xthen( chr('%'), directive ), using( sat( Operator( 0, nz ) ), on_lit ) ); parser format = many( seq( space, term ) ); p = using( format, on_terms ); add_global_root( p ); } list fp = parse( p, chars_from_string( (char*)fmt ) ); drop( 1, fp ); parser f = x_( x_( fp ) ); if( !valid( f ) ) return 0; list r = parse( f, chars_from_file( stdin ) ); drop( 1, r ); va_end( v ); return valid( r ) ? x_( x_( r ) )->Int.i : 0;}int test_pscanf(){ char c; PRINT( Int( pscanf( "" ) ) ); PRINT( Int( pscanf( "abc" ) ) ); PRINT( Int( pscanf( " %c", &c ) ) ); PRINT( string_from_chars( Int( c ) ) ); char buf[100]; PRINT( Int( pscanf( "%s", buf ) ) ); PRINT( String( buf, 0 ) ); return 0;}int test_pprintf(){ PRINT( Int( pprintf( "%% abc %c %s\n", 'x', "123" ) ) ); return 0;}int test_regex(){ parser a; PRINT( a = regex( "\\." ) ); PRINT( parse( a, chars_from_string( "a" ) ) ); PRINT( parse( a, chars_from_string( "." ) ) ); PRINT( parse( a, chars_from_string( "\\." ) ) ); parser b; PRINT( b = regex( "\\\\\\." ) ); PRINT( parse( b, chars_from_string( "\\." ) ) ); PRINT( take( 3, parse( b, chars_from_string( "\\." ) ) ) ); parser r; PRINT( r = regex( "a?b+(c).|def" ) ); PRINT( parse( r, chars_from_string( "abc" ) ) ); PRINT( parse( r, chars_from_string( "abbcc" ) ) ); PRINT( Int( garbage_collect( r ) ) ); list s; PRINT( s = parse( r, chars_from_string( "def" ) ) ); PRINT( take( 3, s ) ); PRINT( parse( r, chars_from_string( "deff" ) ) ); PRINT( parse( r, chars_from_string( "adef" ) ) ); PRINT( parse( r, chars_from_string( "bcdef" ) ) ); PRINT( Int( garbage_collect( cons( r, s ) ) ) ); parser t; PRINT( t = regex( "ac|bd" ) ); PRINT( parse( t, chars_from_string( "ac" ) ) ); PRINT( take( 1, parse( t, chars_from_string( "bd" ) ) ) ); PRINT( Int( garbage_collect( t ) ) ); parser u; PRINT( u = regex( "ab|cd|ef" ) ); PRINT( parse( u, chars_from_string( "ab" ) ) ); PRINT( parse( u, chars_from_string( "cd" ) ) ); PRINT( take( 1, parse( u, chars_from_string( "cd" ) ) ) ); PRINT( parse( u, chars_from_string( "ef" ) ) ); PRINT( take( 1, parse( u, chars_from_string( "ef" ) ) ) ); PRINT( Int( garbage_collect( u ) ) ); parser v; PRINT( v = regex( "ab+(c).|def" ) ); PRINT( parse( v, chars_from_string( "def" ) ) ); PRINT( take( 2, parse( v, chars_from_string( "def" ) ) ) ); parser w; PRINT( w = regex( "a?b|c" ) ); PRINT( parse( w, chars_from_string( "a" ) ) ); PRINT( parse( w, chars_from_string( "b" ) ) ); PRINT( take( 3, parse( w, chars_from_string( "c" ) ) ) ); PRINT( Int( garbage_collect( w ) ) ); return 0;}int test_env(){ object e = env( 0, 2, Symbol(F), Int(2), Symbol(X), Int(4) ); PRINT( e ); PRINT( assoc( Symbol(F), e ) ); PRINT( assoc( Symbol(X), e ) ); return 0;}object b( object v, object o ){ return one( cons( Int( - x_( o )->Int.i ), xs_( o ) ) );}int test_parsers(){ list ch = chars_from_string( "a b c 1 2 3 d e f 4 5 6" ); { parser p = result( Int(42) ); PRINT( parse( p, ch ) ); PRINT( Int( garbage_collect( ch ) ) ); } { parser q = zero(); PRINT( parse( q, ch ) ); PRINT( Int( garbage_collect( ch ) ) ); } { parser r = item(); PRINT( r ); PRINT( parse( r, ch ) ); PRINT( x_( parse( r, ch ) ) ); PRINT( take( 1, x_( parse( r, ch ) ) ) ); PRINT( x_( take( 1, x_( parse( r, ch ) ) ) ) ); PRINT( take( 1, x_( take( 1, x_( parse( r, ch ) ) ) ) ) ); PRINT( parse( bind( r, Operator( 0, b ) ), ch ) ); PRINT( Int( garbage_collect( cons( ch, r ) ) ) ); } { parser s = plus( item(), alpha() ); PRINT( s ); PRINT( parse( s, ch ) ); PRINT( take( 2, parse( s, ch ) ) ); PRINT( Int( garbage_collect( ch ) ) ); } { parser t = lit( Int( 'a' ) ); PRINT( parse( t, ch ) ); parser u = str( "a b c" ); PRINT( parse( u, ch ) ); PRINT( Int( garbage_collect( cons( ch, cons( t, u ) ) ) ) ); } return 0;}int par_main(){ return obj_main(), test_env(), test_parsers(), test_regex(), test_pprintf(), test_pscanf(), 0;}pc9tok.h
Interface to the example tokenizer forcirca 1975 pre-K&R C. Perhaps too much macro stuff? All the keywords and operators and punctuation which can be matched as exact strings are defined in an X-macro table which associates each string with an identifier. The identifiers are all defined in the enum for use with the symbol typed objects.
#define PC9TOK_H#ifndef PC9PAR_H #include "pc9par.h"#endif#define Each_Symbolic(_) \ _("int", k_int) _("char", k_char) _("float", k_float) _("double", k_double) _("struct", k_struct) \ _("auto", k_auto) _("extern", k_extern) _("register", k_register) _("static", k_static) \ _("goto", k_goto) _("return", k_return) _("sizeof", k_sizeof) \ _("break", k_break) _("continue", k_continue) \ _("if", k_if) _("else", k_else) \ _("for", k_for) _("do", k_do) _("while", k_while) \ _("switch", k_switch) _("case", k_case) _("default", k_default) \ /*_("entry", k_entry)*/ \ _("*", o_star) _("++", o_plusplus) _("+", o_plus) _(".", o_dot) \ _("->", o_arrow) _("--", o_minusminus) _("-", o_minus) _("!=", o_ne) _("!", o_bang) _("~", o_tilde) \ _("&&", o_ampamp) _("&", o_amp) _("==", o_equalequal) _("=", o_equal) \ _("^", o_caret) _("||", o_pipepipe) _("|", o_pipe) \ _("/", o_slant) _("%", o_percent) \ _("<<", o_ltlt) _("<=", o_le) _("<", o_lt) _(">>", o_gtgt) _(">=", o_ge) _(">", o_gt) \ _("=+", o_eplus) _("=-", o_eminus) _("=*", o_estar) _("=/", o_eslant) _("=%", o_epercent) \ _("=>>", o_egtgt) _("=<<", o_eltlt) _("=&", o_eamp) _("=^", o_ecaret) _("=|", o_epipe) \ _("(", lparen) _(")", rparen) _(",", comma) _(";", semi) _(":", colon) _("?", quest) \ _("{", lbrace) _("}", rbrace) _("[", lbrack) _("]", rbrack) \//End Symbolic#define Enum_name(x,y) y ,enum token_symbols { t_id = SYM2, c_int, c_float, c_char, c_string, Each_Symbolic( Enum_name ) SYM3};list tokens_from_chars( object v );int tok_main( void );pc9tok.c
Implementation of the tokenizer for pre-K&R C. All of the identifiers from the table in the header file are converted into parsers which match the associated string and yield the symbol as output. The next layer can easily match against these symbols. The symbol type object also has an extradata pointer to hold extra stuff. The token functions pack the actual input string and any preliminary whitespace in this pointer in the symbol object. So this data isn't lost, but it's hidden from the parser layer which just deals with token symbols.
#include "pc9tok.h"#include "pc9objpriv.h"static object on_spaces( object v, list o ){ return string_from_chars( o ); }static object on_integer( object v, list o ){ return cons( Symbol(c_int), string_from_chars( o ) ); }static object on_floating( object v, list o ){ return cons( Symbol(c_float), string_from_chars( o ) ); }static object on_character( object v, list o ){ return cons( Symbol(c_char), string_from_chars( o ) ); }static object on_string( object v, list o ){ return cons( Symbol(c_string), string_from_chars( o ) ); }static object on_identifier( object s, list o ){ return cons( Symbol(t_id), string_from_chars( o ) ); }#define On_Symbolic(a,b) \ static object on_##b( object v, list o ){ return cons( Symbol(b), string_from_chars( o ) ); }Each_Symbolic( On_Symbolic )static parsertoken_parser( void ){ parser space = using( many( anyof( " \t\n" ) ), on_spaces ); parser alpha_ = plus( alpha(), chr('_') ); parser integer = using( some( digit() ), on_integer ); parser floating = using( SEQ( plus( SEQ( some( digit() ), chr('.'), many( digit() ) ), seq( chr('.'), some( digit() ) ) ), maybe( SEQ( anyof("eE"), maybe( anyof("+-") ), some( digit() ) ) ) ), on_floating ); parser escape = seq( chr('\\'), plus( seq( digit(), maybe( seq( digit(), maybe( digit() ) ) ) ), anyof( "'\"bnrt\\" ) ) ); parser char_ = plus( escape, noneof( "'\n" ) ); parser schar_ = plus( escape, noneof( "\"\n" ) ); parser character = using( SEQ( chr('\''), char_, chr('\'') ), on_character ); parser string = using( SEQ( chr('"'), many( schar_ ), chr('"') ), on_string ); parser constant = PLUS( floating, integer, character, string );# define Handle_Symbolic(a,b) using( str( a ), on_##b ), parser symbolic = PLUS( Each_Symbolic( Handle_Symbolic ) zero() ); parser identifier = using( seq( alpha_, many( plus( alpha_, digit() ) ) ), on_identifier ); return seq( space, PLUS( constant, symbolic, identifier ) );}static object on_token( object v, list o ){ object space = x_( o ); object symbol = x_( xs_( o ) ); object string = xs_( xs_( o ) ); return symbol->Symbol.data = cons( space, string ), symbol; return cons( symbol, cons( space, string ) );}listptokens_from_chars( object s ){ if( !valid( s ) ) return Symbol(EOF); static parser p; if( !p ){ p = using( token_parser(), on_token ); add_global_root( p ); } list r = parse( p, s ); take( 1, r ); r = x_( r ); return cons( x_( r ), Suspension( xs_( r ), ptokens_from_chars ) );}listtokens_from_chars( object s ){ return valid( s ) ? Suspension( s, ptokens_from_chars ) : Symbol(EOF);}int test_tokens(){ list tokens = tokens_from_chars( chars_from_string( "'x' auto \"abc\" 12 ;*++'\\42' '\\n' 123 if" ) ); PRINT( tokens ); PRINT( take( 1, tokens ) ); PRINT( take( 2, tokens ) ); PRINT( drop( 1, tokens ) ); PRINT( take( 2, drop( 1, tokens ) ) ); drop( 7, tokens ); PRINT( tokens ); PRINT( Int( garbage_collect( tokens ) ) ); return 0;}int tok_main(){ return par_main(), test_tokens(), 0;}pc9syn.h
Interface to the Syntax Analyzer for pre-K&R C. Pretty simple this time, just extending the symbol ids and declaring the main parser function.
#define PC9SYN_H#ifndef PC9TOK_H #include "pc9tok.h"#endifenum syntax_analysis_symbols { func_def = SYM3, data_def, SYM4};list tree_from_tokens( object s );pc9syn.c
Implementation of the Syntax Analyzer for pre-K&R C. All of the symbols from the tokenizer are converted into parsers which match those symbols and are named with an extra underscore appended. Soc_float is an enum,Symbol(c_float) is a symbol object, andc_float_ (with extra underscore) is a parser which matches that token symbol. So all the names in here with underscores, likecomma_semi_k_if_, are parsers which match against the tokens coming from the input list.
#include "pc9syn.h"#include "pc9objpriv.h"#define Extra_Symbols(_) \ _(t_id) _(c_int) _(c_float) _(c_char) _(c_string)#define Parser_for_symbolic_(a,b) parser b##_ = lit( Symbol(b) );#define Parser_for_symbol_(b) parser b##_ = lit( Symbol(b) );static object on_func_def( object v, list o ){ object s = Symbol(func_def); return s->Symbol.data = o, s; return cons( Symbol(func_def), o ); }static object on_data_def( object v, list o ){ object s = Symbol(data_def); return s->Symbol.data = o, s;}parserparser_for_grammar( void ){ Each_Symbolic( Parser_for_symbolic_ ) Extra_Symbols( Parser_for_symbol_ ) parser identifier = t_id_; parser asgnop = PLUS( o_equal_, o_eplus_, o_eminus_, o_estar_, o_eslant_, o_epercent_, o_egtgt_, o_eltlt_, o_eamp_, o_ecaret_, o_epipe_ ); parser constant = PLUS( c_int_, c_float_, c_char_, c_string_ ); parser lvalue = forward(); parser expression = forward(); *lvalue = *PLUS( identifier, seq( o_star_, expression ), //SEQ( primary, o_arrow_, identifier ), // introduces a left-recursion indirectly SEQ( lparen_, lvalue, rparen_ ) ); parser expression_list = seq( expression, many( seq( comma_, expression ) ) ); parser primary = seq( PLUS( identifier, constant, SEQ( lparen_, expression, rparen_ ), SEQ( lvalue, o_dot_, identifier ) ), maybe( PLUS( SEQ( lparen_, expression_list, rparen_ ), SEQ( lbrack_, expression, rbrack_ ), seq( o_arrow_, identifier ) ) ) ); *expression = *seq( PLUS( primary, seq( o_star_, expression ), seq( o_amp_, expression ), seq( o_minus_, expression ), seq( o_bang_, expression ), seq( o_tilde_, expression ), seq( o_plusplus_, lvalue ), seq( o_minusminus_, lvalue ), seq( lvalue, o_plusplus_ ), seq( lvalue, o_minusminus_ ), seq( k_sizeof_, expression ), SEQ( lvalue, asgnop, expression ) ), maybe( PLUS( seq( PLUS( o_star_, o_slant_, o_percent_ ), expression ), seq( PLUS( o_plus_, o_minus_ ), expression ), seq( PLUS( o_ltlt_, o_gtgt_ ), expression ), seq( PLUS( o_lt_, o_le_, o_gt_, o_ge_ ), expression ), seq( PLUS( o_equalequal_, o_ne_ ), expression ), seq( o_amp_, expression ), seq( o_caret_, expression ), seq( o_pipe_, expression ), seq( o_ampamp_, expression ), seq( o_pipepipe_, expression ), SEQ( quest_, expression, colon_, expression ), seq( comma_, expression ) ) ) ); parser constant_expression = expression; parser statement = forward(); parser statement_list = many( statement ); *statement = *PLUS( seq( expression, semi_ ), SEQ( lbrace_, statement_list, rbrace_ ), SEQ( k_if_, lparen_, expression, rparen_, statement ), SEQ( k_if_, lparen_, expression, rparen_, statement, k_else_, statement ), SEQ( k_do_, statement, k_while_, lparen_, expression, rparen_, semi_ ), SEQ( k_while_, lparen_, expression, rparen_, statement ), SEQ( k_for_, lparen_, maybe( expression ), semi_, maybe( expression ), semi_, maybe( expression ), rparen_, statement ), SEQ( k_switch_, lparen_, expression, rparen_, statement ), SEQ( k_case_, constant_expression, colon_, statement ), SEQ( k_default_, colon_, statement ), seq( k_break_, semi_ ), seq( k_continue_, semi_ ), seq( k_return_, semi_ ), SEQ( k_return_, expression, semi_ ), SEQ( k_goto_, expression, semi_ ), SEQ( identifier, colon_, statement ), semi_ ); parser constant_expression_list = seq( constant_expression, many( seq( comma_, constant_expression ) ) ); parser initializer = plus( constant, constant_expression_list ); parser type_specifier = forward(); parser declarator_list = forward(); parser type_declaration = SEQ( type_specifier, declarator_list, semi_ ); parser type_decl_list = some( type_declaration ); parser sc_specifier = PLUS( k_auto_, k_static_, k_extern_, k_register_ ); *type_specifier = *PLUS( k_int_, k_char_, k_float_, k_double_, SEQ( k_struct_, lbrace_, type_decl_list, rbrace_ ), SEQ( k_struct_, identifier, lbrace_, type_decl_list, rbrace_ ), SEQ( k_struct_, identifier ) ); parser declarator = forward(); *declarator = *seq( PLUS( identifier, seq( o_star_, declarator ), SEQ( lparen_, declarator, rparen_ ) ), maybe( PLUS( seq( lparen_, rparen_ ), SEQ( lbrack_, constant_expression, rbrack_ ) ) ) ); *declarator_list = *seq( declarator, many( seq( comma_, declarator ) ) ); parser decl_specifiers = PLUS( type_specifier, sc_specifier, seq( type_specifier, sc_specifier ), seq( sc_specifier, type_specifier ) ); parser declaration = seq( decl_specifiers, maybe( declarator_list ) ); parser declaration_list = seq( declaration, many( seq( comma_, declaration ) ) ); parser init_declarator = seq( declarator, maybe( initializer ) ); parser init_declarator_list = seq( init_declarator, many( seq( comma_, init_declarator ) ) ); parser data_def = using( SEQ( maybe( k_extern_ ), maybe( type_specifier ), maybe( init_declarator_list ), semi_ ), on_data_def ); parser parameter_list = maybe( seq( expression, many( seq( comma_, expression ) ) ) ); parser function_declarator = SEQ( declarator, lparen_, parameter_list, rparen_ ); parser function_statement = SEQ( lbrace_, maybe( declaration_list ), many( statement ), rbrace_ ); parser function_body = seq( maybe( type_decl_list ), function_statement ); parser function_def = using( SEQ( maybe( type_specifier ), function_declarator, function_body ), on_func_def ); parser external_def = plus( function_def, data_def ); parser program = some( external_def ); return program;}listtree_from_tokens( object s ){ if( !s ) return NIL_; static parser p; if( !p ){ p = parser_for_grammar(); add_global_root( p ); } return parse( p, s );}int test_syntax(){ char *source ="\n""int i,j,k 5;\n""float d 3.4;\n""int max(a, b, c)\n""int a, b, c;\n""{\n"" int m;\n"" m = (a>b)? a:b;\n"" return(m>c? m:c);\n""}\n""main( ) {\n""\tprintf(\"Hello, world\");\n""}\n""\t if( 2 ){\n\t x = 5;\n\t } int auto"; object tokens = tokens_from_chars( chars_from_string( source ) ); add_global_root( tokens ); PRINT( take( 4, tokens ) ); object program = tree_from_tokens( tokens ); PRINT( program ); PRINT( x_( x_( ( drop( 1, program ), program ) ) ) ); PRINT_FLAT( x_( x_( program ) ) ); PRINT_DATA( x_( x_( program ) ) ); PRINT( xs_( x_( program ) ) ); PRINT( Int( garbage_collect( program ) ) ); return 0;}int main(){ return tok_main(), test_syntax(), 0;}The input to the test is for thepscanf() calls which looks for the literal"abc" then a%c then a%s. Andtest is the first rule, so a simplemake command will compile and then run the test rule/script.
CFLAGS= -std=c99 -g -Wall -Wpedantic -Wno-switch -Wreturn-type -Wunused-variableCFLAGS+= $(cflags)test : pc9 echo abc j string | ./$<clean : rm *.opc9 : pc9obj.o pc9fp.o pc9par.o pc9tok.o pc9syn.o $(CC) $(CFLAGS) -o $@ $^ $(LDLIBS)All told, it's just under 1500 lines.
$ wc -l *[ch] 128 pc9fp.c 19 pc9fp.h 316 pc9obj.c 54 pc9obj.h 24 pc9objpriv.h 529 pc9par.c 48 pc9par.h 208 pc9syn.c 12 pc9syn.h 85 pc9tok.c 38 pc9tok.h 28 ppnarg.h 1489 totalOutput from simple tests. At the very end, theprint_data function is used to recover all the strings hidden inside the token symbols in the syntax tree, reconstructing the source code.
$ make -kcc -std=c99 -g -Wall -Wpedantic -Wno-switch -Wreturn-type -Wunused-variable -c -o pc9obj.o pc9obj.ccc -std=c99 -g -Wall -Wpedantic -Wno-switch -Wreturn-type -Wunused-variable -c -o pc9fp.o pc9fp.ccc -std=c99 -g -Wall -Wpedantic -Wno-switch -Wreturn-type -Wunused-variable -c -o pc9par.o pc9par.ccc -std=c99 -g -Wall -Wpedantic -Wno-switch -Wreturn-type -Wunused-variable -c -o pc9tok.o pc9tok.ccc -std=c99 -g -Wall -Wpedantic -Wno-switch -Wreturn-type -Wunused-variable -c -o pc9syn.o pc9syn.ccc -std=c99 -g -Wall -Wpedantic -Wno-switch -Wreturn-type -Wunused-variable -o pc9 pc9obj.o pc9fp.o pc9par.o pc9tok.o pc9syn.o echo abc j string | ./pc9test_basics: ch = ... test_basics: Int( garbage_collect( ch ) ) = 0 test_basics: take( 1, ch ) = (97 _ )test_basics: Int( garbage_collect( ch ) ) = 4 test_basics: x_( ch ) = 97 test_basics: Int( garbage_collect( ch ) ) = 1 test_basics: x_( xs_( ch ) ) = ... test_basics: Int( garbage_collect( ch ) ) = 2 test_basics: take( 1, x_( xs_( ch ) ) ) = (_ _ )test_basics: Int( garbage_collect( ch ) ) = 5 test_basics: take( 5, ch ) = (97 98 99 100 101 _ )test_basics: Int( garbage_collect( ch ) ) = 12 test_basics: ch = (97 98 99 100 101 ... )test_basics: Int( garbage_collect( ch ) ) = 1 test_basics: take( 6, ch ) = (97 98 99 100 101 102 _ )test_basics: Int( garbage_collect( ch ) ) = 9 test_basics: take( 1, ch ) = (97 _ )test_basics: Int( garbage_collect( ch ) ) = 2 test_basics: take( 2, ch ) = (97 98 _ )test_basics: Int( garbage_collect( ch ) ) = 3 test_basics: take( 2, ch ) = (97 98 _ )test_basics: Int( garbage_collect( ch ) ) = 3 test_basics: take( 2, ch ) = (97 98 _ )test_basics: Int( garbage_collect( ch ) ) = 3 test_basics: take( 2, ch ) = (97 98 _ )test_basics: Int( garbage_collect( ch ) ) = 3 test_env: e = ((X 4 )(F 2 )() )test_env: assoc( Symbol_( F, "F" ), e ) = 2 test_env: assoc( Symbol_( X, "X" ), e ) = 4 test_parsers: parse( p, ch ) = ((42 ... )_ )test_parsers: Int( garbage_collect( ch ) ) = 33 test_parsers: parse( q, ch ) = _ test_parsers: Int( garbage_collect( ch ) ) = 2 test_parsers: r = Parser test_parsers: parse( r, ch ) = ((97 ... )_ )test_parsers: x_( parse( r, ch ) ) = (97 ... )test_parsers: take( 1, x_( parse( r, ch ) ) ) = (97 _ )test_parsers: x_( take( 1, x_( parse( r, ch ) ) ) ) = 97 test_parsers: take( 1, x_( take( 1, x_( parse( r, ch ) ) ) ) ) = (_ _ )test_parsers: parse( bind( r, Operator( 0, b ) ), ch ) = ((-97 ... )... )test_parsers: Int( garbage_collect( cons( ch, r ) ) ) = 46 test_parsers: s = Parser test_parsers: parse( s, ch ) = ((97 ... )... )test_parsers: take( 2, parse( s, ch ) ) = ((97 ... )(97 ... )_ )test_parsers: Int( garbage_collect( ch ) ) = 76 test_parsers: parse( t, ch ) = ((97 ... )... )test_parsers: parse( u, ch ) = (((97 32 98 32 99 () )... )... )test_parsers: Int( garbage_collect( cons( ch, cons( t, u ) ) ) ) = 372 test_regex: a = regex( "\\." ) = _ test_regex: parse( a, chars_from_string( "a" ) ) = _ test_regex: parse( a, chars_from_string( "." ) ) = _ test_regex: parse( a, chars_from_string( "\\." ) ) = _ test_regex: b = regex( "\\\\\\." ) = _ test_regex: parse( b, chars_from_string( "\\." ) ) = _ test_regex: take( 3, parse( b, chars_from_string( "\\." ) ) ) = _ test_regex: r = regex( "a?b+(c).|def" ) = Parser test_regex: parse( r, chars_from_string( "abc" ) ) = ... test_regex: parse( r, chars_from_string( "abbcc" ) ) = ... test_regex: Int( garbage_collect( r ) ) = 13660 test_regex: s = parse( r, chars_from_string( "def" ) ) = ... test_regex: take( 3, s ) = _ test_regex: parse( r, chars_from_string( "deff" ) ) = ... test_regex: parse( r, chars_from_string( "adef" ) ) = ... test_regex: parse( r, chars_from_string( "bcdef" ) ) = ... test_regex: Int( garbage_collect( cons( r, s ) ) ) = 130 test_regex: t = regex( "ac|bd" ) = _ test_regex: parse( t, chars_from_string( "ac" ) ) = _ test_regex: take( 1, parse( t, chars_from_string( "bd" ) ) ) = _ test_regex: Int( garbage_collect( t ) ) = 5294 test_regex: u = regex( "ab|cd|ef" ) = _ test_regex: parse( u, chars_from_string( "ab" ) ) = _ test_regex: parse( u, chars_from_string( "cd" ) ) = _ test_regex: take( 1, parse( u, chars_from_string( "cd" ) ) ) = _ test_regex: parse( u, chars_from_string( "ef" ) ) = _ test_regex: take( 1, parse( u, chars_from_string( "ef" ) ) ) = _ test_regex: Int( garbage_collect( u ) ) = 7804 test_regex: v = regex( "ab+(c).|def" ) = Parser test_regex: parse( v, chars_from_string( "def" ) ) = _ test_regex: take( 2, parse( v, chars_from_string( "def" ) ) ) = _ test_regex: w = regex( "a?b|c" ) = Parser test_regex: parse( w, chars_from_string( "a" ) ) = ... test_regex: parse( w, chars_from_string( "b" ) ) = ... test_regex: take( 3, parse( w, chars_from_string( "c" ) ) ) = ((() ... )_ )test_regex: Int( garbage_collect( w ) ) = 13306 test_pprintf: Int( pprintf( "%% abc %c %s\n", 'x', "123" ) ) = % abc x 12312 test_pscanf: Int( pscanf( "" ) ) = 0 test_pscanf: Int( pscanf( "abc" ) ) = 0 test_pscanf: Int( pscanf( " %c", &c ) ) = 1 test_pscanf: string_from_chars( Int( c ) ) = "j"test_pscanf: Int( pscanf( "%s", buf ) ) = 1 test_pscanf: String( buf, 0 ) = "string"test_tokens: tokens = ... test_tokens: take( 1, tokens ) = (c_char _ )test_tokens: take( 2, tokens ) = (c_char k_auto _ )test_tokens: drop( 1, tokens ) = (k_auto ... )test_tokens: take( 2, drop( 1, tokens ) ) = (k_auto c_string _ )test_tokens: tokens = (c_char k_auto c_string c_int semi o_star o_plusplus ... )test_tokens: Int( garbage_collect( tokens ) ) = 28834 test_syntax: take( 4, tokens ) = (k_int t_id comma t_id _ )test_syntax: program = ... test_syntax: x_( x_( ( drop( 1, program ), program ) ) ) = (data_def data_def func_def func_def () )test_syntax: x_( x_( program ) ) flat= data_def data_def func_def func_def test_syntax: x_( x_( program ) ) data= int i,j,k 5;float d 3.4;int max(a, b, c)int a, b, c;{ int m; m = (a>b)? a:b; return(m>c? m:c);}main( ) { printf("Hello, world");}test_syntax: xs_( x_( program ) ) = (k_if ... )test_syntax: Int( garbage_collect( program ) ) = 434910- \$\begingroup\$An earlier, less lazy version of this code was posted incomp.lang.c and received helpful comments which have been applied in the code here.\$\endgroup\$luser droog– luser droog2019-05-14 05:33:10 +00:00CommentedMay 14, 2019 at 5:33
- \$\begingroup\$Have you considered
CFLAGS += -Wextra?\$\endgroup\$Toby Speight– Toby Speight2019-06-19 07:36:01 +00:00CommentedJun 19, 2019 at 7:36 - \$\begingroup\$@TobySpeight I hadn't before, but trying it now it just tells me about unused parameters. They're all in
fOperfunctions which don't use the environment parameter. It's a good point that I should have checked this, but I think it doesn't tell me anything interesting.\$\endgroup\$luser droog– luser droog2019-06-19 18:04:07 +00:00CommentedJun 19, 2019 at 18:04
1 Answer1
Bug: singleton objects have no allocation record
Since the garbage collector will try to set themark() in aSYMBOL object, theT_ object needs a dummy allocation record.NIL_ doesn't need one since anINVALID object will not get marked.
pc9obj.c:
object T_ = &(1[(union uobject[]){{ .t = 0 },{ .Symbol = { SYMBOL, T, "T" } }}]), NIL_ = (union uobject[]){{ .t = INVALID }};Bug: using object fields for non-object data
In thepprintf() andpscanf() functions, theobject field inOPERATOR objects sometimes contains ava_list *! The garbage collector might fiddle with the memory around this address if it tries to set the (non-existant)mark(). The copious(void *) casts are a code smell. Better to use theVOID type object to hold this pointer.
Missing functions
There'ssome for 1 or more,many for 0 or more,maybe for 0 or 1.But there's no function to matchn times, orn or more, orn up to m times — these kind of quantifiers.
Poor namespacing for internal symbols
enum parser_symbols { VALUE = SYM1, PRED, P, PP, NN, Q, R, FF, XX, AA, ID, USE, ATOM, SYM2};What areP, PP, NN, Q, R, FF, XX, AA?VALUEPRED andATOM are better but still kinda vague.
Short-circuit tests (and maybe actually test stuff)
int par_main(){ return obj_main(), test_env(), test_parsers(), test_regex(), test_pprintf(), test_pscanf(), 0;}Bonus formatting error. Better to short-circuit the tests based on the return values.
int par_main(){ return 0 || obj_main() || test_env() || test_parsers() || test_regex() || test_pprintf() || test_pscanf() || 0;}Then the testing functions can return non-zero to stop producing output.
No error reporting
A syntax error during parsing will result in an empty list being returned.Graham Hutton's paper describes how to rewrite the basic parser combinators so that meaningful error messages can be produced -- without using Monad Transformers which is the more typical way in functional languages.
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.