10
\$\begingroup\$

Parser Combinators are the amazingly elegant way to write parsers that has evolved over on the functional programming side of the world, but have been less available or accessible for imperative languages. I have attempted somewhat to bridge this gap bystealing copying as many ideas from FP as were needed to make these combinators in C.

I've written anoverview of Parser Combinators and presentations of thebasic lisp-like object system and thehigher order functions which supplement the basic objects. Of coursewikipedia has a nice article on the topic, and my code mostly follows the excellent paper byHutton and Meijer which is also a good introduction.

This code is the result of 16 (or so) re-writes of the code in myprevious question. I anticipate at least one more re-write to add more features like checking the offside rule both for parsing syntaxes lke Python which use this rule, or just checking that indentation in C code is reasonable.

A few remarks about the code overall: I'm trying the "McIllroy convention" which places header guards around the#include lines rather than inside the included files. Theobject type is a pointer to a union. Most of the.c files have a (name-mangled)main() function which does some simple tests; somewhere along the continuum betweenreal unit testing andnot testing.

I'm looking for review of the "weird stuff" if possible. Particularly the handling ofsuspensions, which represent values to be lazily evaluated. But of course anything weird or suspicious is fair game.

ppnarg.h
Written by Laurent Deniau for theC Object System. I have extended the maximum number of arguments a little. This macro supports the variadic combinatorsPLUS andSEQ which combine many subparsers together. It lets you pass a count of the variadic arguments into the function that processes them.

/* * The PP_NARG macro evaluates to the number of arguments that have been * passed to it. * * Laurent Deniau, "__VA_NARG__," 17 January 2006, <comp.std.c> (29 November 2007). */#define PP_NARG(...)    PP_NARG_(__VA_ARGS__,PP_RSEQ_N())#define PP_NARG_(...)   PP_ARG_N(__VA_ARGS__)#define PP_ARG_N( \        _1, _2, _3, _4, _5, _6, _7, _8, _9,_10,  \        _11,_12,_13,_14,_15,_16,_17,_18,_19,_20, \        _21,_22,_23,_24,_25,_26,_27,_28,_29,_30, \        _31,_32,_33,_34,_35,_36,_37,_38,_39,_40, \        _41,_42,_43,_44,_45,_46,_47,_48,_49,_50, \        _51,_52,_53,_54,_55,_56,_57,_58,_59,_60, \        _61,_62,_63,_64,_65,_66,_67,_68,_69,_70, \        _71,N,...) N#define PP_RSEQ_N() \        71,70, \        69,68,67,66,65,64,63,62,61,60, \        59,58,57,56,55,54,53,52,51,50, \        49,48,47,46,45,44,43,42,41,40, \        39,38,37,36,35,34,33,32,31,30, \        29,28,27,26,25,24,23,22,21,20, \        19,18,17,16,15,14,13,12,11,10, \        9,8,7,6,5,4,3,2,1,0

pc9obj.h
Public interface to the basic objects. All the constructors for various types of objects and lists. Especially,chars_from_string which produces a lazy list of characters. The symbol-typed objects are constructed with a unique numeric identifer and a print string. These unique ids are generated by declaring the names in an enum. The odd nameSYM1 is used to start the next enum in the nextlayer to add more symbols keeping all ids unique.

#define PC9OBJ_H#include <stdlib.h>#include <stdio.h>#define POINTER_TO  *typedef union uobject  POINTER_TO object;typedef object list;typedef object parser;typedef object oper;typedef oper   predicate;typedef object boolean;typedef object fSuspension( object );typedef list fParser( object, list );typedef object fOperator( object, object );typedef boolean fPredicate( object, object );typedef object fBinOper( object, object );enum object_symbols {  T, F, X, A, B,  SYM1};object T_, NIL_;int valid( object a );object Int( int i );list one( object a );list cons( object a, object b );object Suspension( object v, fSuspension *f );parser Parser( object v, fParser *f );oper Operator( object v, fOperator *f );object String( char *s, int disposable );object Symbol_( int sym, char *pname );#define Symbol(n) Symbol_( n, #n )object Void( void *v );void add_global_root( object a );int garbage_collect( object local_roots );object x_( list a );object xs_( list a );list take( int n, list o );list drop( int n, list o );list chars_from_string( char *v );list chars_from_file( FILE *v );object string_from_chars( list o );void print( object o );void print_list( list a );void print_flat( list a );void print_data( list a );#define PRINT_WRAPPER(_, __, ___) printf( "%s: %s %s= ", __func__, #__, ___ ), _( __ ), puts("")#define PRINT(__)      PRINT_WRAPPER( print_list, __, "" )#define PRINT_FLAT(__) PRINT_WRAPPER( print_flat, __, "flat" )#define PRINT_DATA(__) PRINT_WRAPPER( print_data, __, "data" )

pc9objpriv.h
Private interface to the basic objects. Objects are represented as a pointer to a tagged union. Theat_ function forces execution of suspensions, but externally this action must be performed by callingtake ordrop.

#define PC9OBJPRIV_H#ifndef PC9OBJ_H  #include "pc9obj.h"#endiftypedef enum object_tag {  INVALID, INTEGER, LIST, SUSPENSION, PARSER, OPERATOR, SYMBOL, STRING, VOID,} tag;union uobject { tag t;       struct { tag t; int i;                                } Int;       struct { tag t; object a, b;                          } List;       struct { tag t; object v; fSuspension *f;             } Suspension;       struct { tag t; object v; fParser *f;                 } Parser;       struct { tag t; object v; fOperator *f;               } Operator;       struct { tag t; int symbol; char *pname; object data; } Symbol;       struct { tag t; char *string; int disposable;         } String;       struct { tag t; object next;                          } Header;       struct { tag t; void *v;                              } Void;};object new_( object a );#define OBJECT(...) new_( (union uobject[]){{ __VA_ARGS__ }} )object at_( object a );object fill_string( char **s, list o );int obj_main( void );

pc9obj.c
Implementation of basic objects. Objects are allocated as two structs side by side with the hidden left object used as an allocation record. The allocation records form a singly linked list which is traversed during a sweep of the garbage collector.x_ andxs_ are the famous lispcar andcdr functions, but I like the haskell naming convention so I used those; butx is too useful as a local variable name so they got underscores appended.

#include <stdio.h>#include "pc9objpriv.h"static void mark_objects( list a );static int sweep_objects( list *po );object T_ = (union uobject[]){{ .Symbol = { SYMBOL, T, "T" } }},       NIL_ = (union uobject[]){{ .t = INVALID }};static list global_roots = NULL;static list allocation_list = NULL;objectnew_( object a ){  object p = calloc( 2, sizeof *p );  return  p  ? p[0] = (union uobject){ .Header = { 0, allocation_list } },               allocation_list = p,               p[1] = *a,               &p[1]             : 0;}intvalid( object a ){  switch( a  ? a->t  : 0 ){  default:    return 0;  case INTEGER:  case LIST:  case SUSPENSION:  case PARSER:  case OPERATOR:  case SYMBOL:  case STRING:    return 1;  }}objectInt( int i ){  return  OBJECT( .Int = { INTEGER, i } );}listone( object a ){  return  cons( a, NIL_ );}listcons( object a, object b ){  return  OBJECT( .List = { LIST, a, b } );}objectSuspension( object v, fSuspension *f ){  return  OBJECT( .Suspension = { SUSPENSION, v, f } );}parserParser( object v, fParser *f ){  return  OBJECT( .Parser = { PARSER, v, f } );}operOperator( object v, fOperator *f ){  return  OBJECT( .Operator = { OPERATOR, v, f } );}objectString( char *s, int disposable ){  return  OBJECT( .String = { STRING, s, disposable } );}objectSymbol_( int sym, char *pname ){  return  OBJECT( .Symbol = { SYMBOL, sym, pname } );}objectVoid( void *v ){  return  OBJECT( .Void = { VOID, v } );}voidadd_global_root( object a ){  global_roots = cons( a, global_roots );}intgarbage_collect( object local_roots ){  mark_objects( local_roots );  mark_objects( global_roots );  return  sweep_objects( &allocation_list );}static tag *mark( object a ){  return  &a[-1].Header.t;}static voidmark_objects( list a ){  if(  !valid(a) || *mark( a )  ) return;  *mark( a ) = 1;  switch(  a->t  ){  case LIST:       mark_objects( a->List.a );                    mark_objects( a->List.b );       break;  case PARSER:     mark_objects( a->Parser.v );     break;  case OPERATOR:   mark_objects( a->Operator.v );   break;  case SYMBOL:     mark_objects( a->Symbol.data );  break;  case SUSPENSION: mark_objects( a->Suspension.v ); break;  }}static intsweep_objects( list *po ){  int count = 0;  while(  *po  )    if(  (*po)->t  ){      (*po)->t = 0;      po = &(*po)->Header.next;    } else {      object z = *po;      *po = (*po)->Header.next;      if(  z[1].t == STRING && z[1].String.disposable  )        free( z[1].String.string );      free( z );      ++count;    }  return  count;}objectat_( object a ){  return  valid( a ) && a->t == SUSPENSION  ? at_( a->Suspension.f( a->Suspension.v ) )  : a;}objectpx_( object v ){  list a = v;  *a = *at_( a );  return  x_( a );}objectx_( list a ){  return  valid( a )  ?              a->t == LIST        ? a->List.a             :              a->t == SUSPENSION  ? Suspension( a, px_ )  : NIL_          : NIL_;}objectpxs_( object v ){  list a = v;  *a = *at_( a );  return  xs_( a );}objectxs_( list a ){  return  valid( a )  ?              a->t == LIST        ? a->List.b              :              a->t == SUSPENSION  ? Suspension( a, pxs_ )  : NIL_          : NIL_;}listtake( int n, list o ){  if(  n == 0  ) return NIL_;  *o = *at_( o );  return  valid( o )  ? cons( x_( o ), take( n-1, xs_( o ) ) )  : NIL_;}listdrop( int n, list o ){  if(  n == 0  ) return o;  *o = *at_( o );  return  valid( o )  ? drop( n-1, xs_( o ) )  : NIL_;}listpchars_from_string( object v ){  char *p = v->String.string;  return  *p  ?  cons( Int( *p ), Suspension( String( p+1, 0 ), pchars_from_string ) )  : Symbol(EOF);}listchars_from_string( char *p ){  return  p  ?  Suspension( String( p, 0 ), pchars_from_string )  : NIL_;}listpchars_from_file( object v ){  FILE *f = v->Void.v;  int c = fgetc( f );  return  c != EOF  ? cons( Int( c ), Suspension( v, pchars_from_file ) )  : Symbol(EOF);}listchars_from_file( FILE *f ){  return  f  ? Suspension( Void( f ), pchars_from_file ) : NIL_;}static intcount_ints( list o ){  return  !o               ? 0 :          o->t == SUSPENSION ? *o = *at_( o ), count_ints( o ) :          o->t == INTEGER  ? 1 :          o->t == LIST     ? count_ints( o->List.a ) + count_ints( o->List.b ) :          0;}objectfill_string( char **s, list o ){  return  !o    ? NULL :          o->t == INTEGER  ? *(*s)++ = o->Int.i, NULL :          o->t == LIST     ? fill_string( s, o->List.a ), fill_string( s, o->List.b ) :          NULL;}objectstring_from_chars( list o ){  char *s = calloc( count_ints( o ) + 1, 1 );  object z = String( s, 1 );  return  fill_string( &s, o ), z;}voidprint( object o ){  if(  !o  ){ printf( "() " ); return; }  switch( o->t ){  case INTEGER:    printf( "%d ", o->Int.i );            break;  case LIST:       printf( "(" );                     print( o->List.a );                     print( o->List.b );                   printf( ") " );                       break;  case SUSPENSION: printf( "... " );              break;  case PARSER:     printf( "Parser " );                  break;  case OPERATOR:   printf( "Oper " );                break;  case STRING:     printf( "\"%s\"", o->String.string ); break;  case SYMBOL:     printf( "%s ", o->Symbol.pname );     break;  case INVALID:    printf( "_ " );                       break;  default:         printf( "INVALID " );                 break;  }}voidprint_listn( list a ){  switch(  a  ? a->t  : 0  ){  default: print( a ); return;  case LIST: print_list( x_( a ) ), print_listn( xs_( a ) ); return;  }}voidprint_list( list a ){  switch(  a  ? a->t  : 0  ){  default: print( a ); return;  case LIST: printf( "(" ), print_list( x_( a ) ), print_listn( xs_( a ) ), printf( ")" ); return;  }}voidprint_flat( list a ){  if(  !a  ) return;  if(  a->t != LIST  ){ print( a ); return; }  print_flat( a->List.a );  print_flat( a->List.b );}voidprint_data( list a ){  if(  !a  ) return;  switch(  a->t  ){  case LIST:  print_data( a->List.a), print_data( a->List.b );  break;  case STRING: printf( "%s", a->String.string ); break;  case SYMBOL: print_data( a->Symbol.data );  break;  }}inttest_basics(){  list ch = chars_from_string( "abcdef" );  PRINT( ch );  PRINT( Int( garbage_collect( ch ) ) );  PRINT( take( 1, ch ) );  PRINT( Int( garbage_collect( ch ) ) );  PRINT( x_( ch ) );  PRINT( Int( garbage_collect( ch ) ) );  PRINT( x_( xs_( ch ) ) );  PRINT( Int( garbage_collect( ch ) ) );  PRINT( take( 1, x_( xs_( ch ) ) ) );  PRINT( Int( garbage_collect( ch ) ) );  PRINT( take( 5, ch ) );  PRINT( Int( garbage_collect( ch ) ) );  PRINT( ch );  PRINT( Int( garbage_collect( ch ) ) );  PRINT( take( 6, ch ) );  PRINT( Int( garbage_collect( ch ) ) );  PRINT( take( 1, ch ) );  PRINT( Int( garbage_collect( ch ) ) );  PRINT( take( 2, ch ) );  PRINT( Int( garbage_collect( ch ) ) );  PRINT( take( 2, ch ) );  PRINT( Int( garbage_collect( ch ) ) );  PRINT( take( 2, ch ) );  PRINT( Int( garbage_collect( ch ) ) );  PRINT( take( 2, ch ) );  PRINT( Int( garbage_collect( ch ) ) );  return 0;}int obj_main(){ return test_basics(); }

pc9fp.h
Interface to the "functional programming" functions. These have all been tweaked to deal nicely (?) with suspensions and force only enough execution to make progress on a computation.

#define PC9FP_H#ifndef PC9OBJ_H  #include "pc9obj.h"#endifboolean eq( object a, object b );list env( list tail, int n, ... );object assoc( object a, list b );list copy( list a );list append( list a, list b );object apply( oper f, object o );list map( oper f, list o );list join( list o );object collapse( fBinOper *f, list o );object reduce( fBinOper *f, int n, object *po );

pc9fp.c
Implementation of the "functional programming" functions.

#include <stdarg.h>#include <string.h>#include "pc9fp.h"#include "pc9objpriv.h"booleaneq( object a, object b ){  return (           !valid( a ) && !valid( b )  ? 1  :           !valid( a ) || !valid( b )  ? 0  :           a->t != b->t                ? 0  :           a->t == SYMBOL              ? a->Symbol.symbol == b->Symbol.symbol  :           !memcmp( a, b, sizeof *a )  ? 1  : 0         )  ? T_  : NIL_;}listcopy( list a ){  return  !valid( a )   ? NIL_                                       :          a->t == LIST  ? cons( copy( x_( a ) ), copy( xs_( a ) ) )  :                      a;}listenv( list tail, int n, ... ){  va_list v;  va_start( v, n );  list r = tail;  while( n-- ){    object a = va_arg( v, object );    object b = va_arg( v, object );    r = cons( cons( a, b ), r );  }  va_end( v );  return  r;}objectassoc( object a, list b ){  return  !valid( b )                      ? NIL_            :          valid( eq( a, x_( x_( b ) ) ) )  ? xs_( x_( b ) )  :                                             assoc( a, xs_( b ) );}static listpappend( object v ){  list a = assoc( Symbol(A), v );  list b = assoc( Symbol(B), v );  *a = *at_( a );  return  append( a, b );}listappend( list a, list b ){  return  !valid( a )         ? b                                                               :          a->t == SUSPENSION  ? Suspension( env( 0, 2, Symbol(A), a, Symbol(B), b ), pappend )  :                                cons( x_( a ), append( xs_( a ), b ) );}static objectpapply( object v ){  oper f = assoc( Symbol(F), v );  object o = assoc( Symbol(X), v );  *o = *at_( o );  return  valid( o )  ? f->Operator.f( f->Operator.v, o )  : NIL_;}objectapply( oper f, object o ){  return  f->t == OPERATOR  ?               valid( o )  ?                  o->t == SUSPENSION  ? Suspension( env( 0, 2, Symbol(F), f, Symbol(X), o ), papply )                        : f->Operator.f( f->Operator.v, o )                  : f->Operator.f( f->Operator.v, o )    // for using( maybe(), ... )              : NIL_;  //return  f->t == OPERATOR  ? f->Operator.f( f->Operator.v, o )  : NIL_;}static listpmap( object v ){  oper f = assoc( Symbol(F), v );  list o = assoc( Symbol(X), v );  *o = *at_( o );  return  valid( o )  ? cons( apply( f, x_( o ) ), map( f, xs_( o ) ) ) : NIL_;}listmap( oper f, list o ){  return  valid( o )  ?              o->t == SUSPENSION  ? Suspension( env( 0, 2, Symbol(F), f, Symbol(X), o ), pmap ) :              cons( apply( f, x_( o ) ),                    Suspension( env( 0, 2, Symbol(F), f, Symbol(X), xs_( o ) ), pmap ) )                      : NIL_;  //return  valid( o )  ? cons( apply( f, x_( o ) ), map( f, xs_( o ) ) )  : NIL_;}static listpjoin( object v ){  list o = assoc( Symbol(X), v );  *o = *at_( o );  return  append( x_( take( 1, o ) ), join( xs_( o ) ) );}listjoin( list o ){  return  valid( o )  ?               o->t == SUSPENSION  ? Suspension( env( 0, 1, Symbol(X), o ), pjoin )  :                  append( x_( o ), Suspension( env( 0, 1, Symbol(X), xs_( o ) ), pjoin ) )                      : NIL_;  //return  valid( o )  ? append( x_( o ), join( xs_( o ) ) )  : NIL_;}static objectdo_collapse( fBinOper *f, object a, object b ){  return  valid( b )  ? f( a, b )  : a;}objectcollapse( fBinOper *f, list o ){  return  valid( o )  ?              o->t == LIST  ? do_collapse( f, collapse( f, x_( o ) ), collapse( f, xs_( o ) ) )                            : o                      : NIL_;}objectreduce( fBinOper *f, int n, object *po ){  return  n==1  ? *po  : f( *po, reduce( f, n-1, po+1 ) );}

pc9par.h
Interface to the Parser Combinators. Constructing parsers for single charactersalphadigitsat or combining parsers togetherseqplus. Construct a parser using aregex.

#define PC9PAR_H#ifndef PC9FP_H  #include "pc9fp.h"#endif#include "ppnarg.h"enum parser_symbols {  VALUE = SYM1, PRED, P, PP, NN, Q, R, FF, XX, AA, ID, USE, ATOM,  SYM2};list parse( parser p, list input );parser result( object a );parser zero( void );parser item( void );parser bind( parser p, oper f );parser plus( parser p, parser q );#define PLUS(...)  reduce( plus, PP_NARG(__VA_ARGS__), (object[]){ __VA_ARGS__ } )parser sat( predicate pred );parser alpha( void );parser digit( void );parser lit( object a );parser chr( int c );parser str( char *s );parser anyof( char *s );parser noneof( char *s );parser seq( parser p, parser q );#define SEQ(...)  reduce( seq, PP_NARG(__VA_ARGS__), (object[]){ __VA_ARGS__ } )parser xthen( parser p, parser q );parser thenx( parser p, parser q );parser into( parser p, object id, parser q );parser maybe( parser p );parser forward( void );parser many( parser p );parser some( parser p );parser trim( parser p );parser using( parser p, fOperator *f );parser regex( char *re );int par_main( void );

pc9par.c
Implementation of the Parser Combinators. Includes 3 "internal DSL" examples with theregex() function, andpprintf() andpscanf() functions.

#include <ctype.h>#include <stdarg.h>#include <string.h>#include "pc9par.h"#include "pc9objpriv.h"listparse( parser p, list input ){  return  valid( p ) && p->t == PARSER && valid( input ) ? p->Parser.f( p->Parser.v, input )  : NIL_;}static listpresult( object v, list input ){  return  one( cons( assoc( Symbol(VALUE), v ), input ) );}parserresult( object a ){  return  Parser( env( 0, 1, Symbol(VALUE), a ), presult );}static listpzero( object v, list input ){  return  NIL_;}parserzero( void ){  return  Parser( 0, pzero );}static listpitem( object v, list input ){  drop( 1, input );  return  valid( input ) ? one( cons( x_( input ), xs_( input ) ) ) : NIL_;  //return  valid( input ) ? one( cons( x_( take( 1, input ) ), xs_( input ) ) )  : NIL_;  //strict  //return  valid( input ) ? one( cons( x_( input ), xs_( input ) ) )  : NIL_;             //lazy}parseritem( void ){  return  Parser( 0, pitem );}static listpbind( object v, list input ){  parser p = assoc( Symbol(P), v );  oper f = assoc( Symbol(FF), v );  list r = parse( p, input );  return  valid( r )  ? join( map( Operator( valid( f->Operator.v ) ?                                               append( copy( f->Operator.v ), v )  : v,                                             f->Operator.f ),                                   r ) )                      : NIL_;}parserbind( parser p, oper f ){  return  Parser( env( 0, 2, Symbol(P), p, Symbol(FF), f ), pbind );}static listbplus( object v ){  list r = assoc( Symbol(R), v );  object qq = assoc( Symbol(Q), v );  *r = *at_( r );  return  valid( r )  ? append( r, qq ) : qq;}static listcplus( object v ){  parser q = assoc( Symbol(Q), v );  list input = assoc( Symbol(X), v );  return  parse( q, input );}static listpplus( object v, list input ){  parser p = assoc( Symbol(P), v );  parser q = assoc( Symbol(Q), v );  list r = parse( p, input );  object qq = Suspension( env( 0, 2, Symbol(Q), q, Symbol(X), input ), cplus );  return  valid( r )  ?               r->t == SUSPENSION  ? Suspension( env( 0, 2, Symbol(R), r, Symbol(Q), qq ), bplus )                                  : append( r, qq )                      : qq;}parserplus( parser p, parser q ){  if(  !q  ) return  p;  return  Parser( env( 0, 2, Symbol(P), p, Symbol(Q), q ), pplus );}static listpsat( object v, list input ){  predicate pred = assoc( Symbol(PRED), v );  object r = apply( pred, x_( input ) );  return  valid( r )  ? one( cons( x_( input ), xs_( input ) ) )  : NIL_;}parsersat( predicate pred ){  return  bind( item(), Operator( env( 0, 1, Symbol(PRED), pred ), psat ) );}static booleanpalpha( object v, object o ){  return  isalpha( o->Int.i )  ? T_  : NIL_;}parseralpha( void ){  return  sat( Operator( 0, palpha ) );}static booleanpdigit( object v, object o ){  return  isdigit( o->Int.i )  ? T_  : NIL_;}parserdigit( void ){  return  sat( Operator( 0, pdigit ) );}static booleanplit( object v, object o ){  object a = assoc( Symbol(X), v );  return  eq( a, o );}parserlit( object a ){  return  sat( Operator( env( 0, 1, Symbol(X), a ), plit ) );}parserchr( int c ){  return  lit( Int( c ) );}parserstr( char *s ){  return  *s  ? seq( chr( *s ), str( s+1 ) )  : result(0);}parseranyof( char *s ){  return  *s  ? plus( chr( *s ), anyof( s+1 ) )  : zero();}static listpnone( object v, list input ){  parser p = assoc( Symbol(NN), v );  object r = parse( p, input );  *r = *at_( r );  return  valid( r )  ? NIL_  : pitem( 0, input );}parsernoneof( char *s ){  return  Parser( env( 0, 1, Symbol(NN), anyof( s ) ), pnone );}static listpprepend( object v, list o ){  object a = assoc( Symbol(AA), v );  return  valid( a )  ? cons( cons( a, x_( o ) ), xs_( o ) )  : o;}static listprepend( list a, list b ){  return  map( Operator( env( 0, 1, Symbol(AA), a ), pprepend ), b );}static listpseq( object v, list output ){  parser q = assoc( Symbol(Q), v );  return  prepend( x_( output ), parse( q, xs_( output ) ) );}parserseq( parser p, parser q ){  if(  !q  ) return  p;  return  bind( p, Operator( env( 0, 1, Symbol(Q), q ), pseq ) );}static listpxthen( object v, list o ){  return  one( cons( xs_( x_( o ) ), xs_( o ) ) );}parserxthen( parser p, parser q ){  return  bind( seq( p, q ), Operator( 0, pxthen ) );}static listpthenx( object v, list o ){  return  one( cons( x_( x_( o ) ), xs_( o ) ) );}parserthenx( parser p, parser q ){  return  bind( seq( p, q ), Operator( 0, pthenx ) );}static listpinto( object v, list o ){  object id = assoc( Symbol(ID), v );  parser q = assoc( Symbol(Q), v );  return  parse( Parser( env( q->Parser.v, 1, id, x_( o ) ), q->Parser.f ), xs_( o ) );}parserinto( parser p, object id, parser q ){  return  bind( p, Operator( env( 0, 2, Symbol(ID), id, Symbol(Q), q ), pinto ) );}parsermaybe( parser p ){  return  plus( p, result(0) );}parserforward( void ){  return  Parser( 0, 0 );}parsermany( parser p ){  parser q = forward();  parser r = maybe( seq( p, q ) );  *q = *r;  return  r;}parsersome( parser p ){  return  seq( p, many( p ) );}static listptrim( object v, list input ){  parser p = assoc( Symbol(PP), v );  list r = parse( p, input );  return  valid( r )  ? one( x_( take( 1, r ) ) )  : r;}parsertrim( parser p ){  return  Parser( env( 0, 1, Symbol(PP), p ), ptrim );}static listpusing( object v, list o ){  oper f = assoc( Symbol(USE), v );  return  one( cons( apply( f, x_( o ) ), xs_( o ) ) );}parserusing( parser p, fOperator *f ){  return  bind( p, Operator( env( 0, 1, Symbol(USE), Operator( 0, f ) ), pusing ) );}static parserdo_meta( parser a, object o ){  switch( o->Int.i ){  case '*':  return  many( a ); break;  case '+':  return  some( a ); break;  case '?':  return  maybe( a ); break;  } return  a;}static parseron_meta( object v, object o ){  parser atom = assoc( Symbol(ATOM), v );  return  valid( o ) ? do_meta( atom, o )  : atom;}static parser  on_dot( object v, object o ){ return  item(); }static parser  on_chr( object v, object o ){ return  lit( o ); }static parser  on_term( object v, object o ){ return  collapse( seq, o ); }static parser  on_expr( object v, object o ){ return  collapse( plus, o ); }#define META     "*+?"#define SPECIAL  META ".|()"parserregex( char *re ){  static parser p;  if(  !p  ){    parser dot   = using( chr('.'), on_dot );    parser meta  = anyof( META );    parser escape = xthen( chr('\\'), anyof( SPECIAL "\\" ) );    parser chr_  = using( plus( escape,  noneof( SPECIAL ) ), on_chr );    parser expr_ = forward();    parser atom  = PLUS( dot,                         xthen( chr('('), thenx( expr_, chr(')') ) ),                         chr_ );    parser factor = into( atom, Symbol(ATOM), using( maybe( meta ), on_meta ) );    parser term   = using( some( factor ), on_term );    parser expr   = using( seq( term, many( xthen( chr('|'), term ) ) ), on_expr );    *expr_ = *expr;    p = trim( expr );    add_global_root( p );  }  list r = parse( p, chars_from_string( re ) );  return  valid( r )  ? ( x_( x_( r ) ) )  : r;}parservusing( parser p, object v, fOperator *f ){  return  bind( p, Operator( env( 0, 1, Symbol(USE), Operator( v, f ) ), pusing ) );}object sum( object a, object b ){ return  Int( a->Int.i + b->Int.i ); }boolean nz( object v, object o ){ return  o->Int.i ? T_ : NIL_; }static object p_char( object v, list o ){  va_list *p = (void *)v; return  putchar(va_arg( *p, int )), Int(1);}static object p_string( object v, list o ){  va_list *p = (void *)v;  char *s = va_arg( *p, char* );  return  fputs( s, stdout ), Int(strlen( s ));}static object p_lit( object v, list o ){  return  putchar( o->Int.i ), Int(1);}static object on_fmt( object v, list o ){ return  collapse( sum, o ); }intpprintf( char const *fmt, ... ){  if(  !fmt  ) return  0;  static va_list v;  va_start( v, fmt );  static parser p;  if(  !p  ){    parser directive = PLUS( using( chr('%'), p_lit ),                             vusing( chr('c'), (void *)&v, p_char ),                             vusing( chr('s'), (void *)&v, p_string ) );    parser term = PLUS( xthen( chr('%'), directive ),                        using( sat( Operator( 0, nz ) ), p_lit ) );    parser format = many( term );    p = using( format, on_fmt );    add_global_root( p );  }  object r = parse( p, chars_from_string( (char*)fmt ) );  drop( 1, r );  va_end( v );  return  x_( x_( r ) )->Int.i;}static object  convert_char( object v, list o ){  va_list *p = (void *)v;  char *cp = va_arg( *p, char* );  *cp = o->Int.i;  return  Int(1);}static object  convert_string( object v, list o ){  va_list *p = (void *)v;  char *sp = va_arg( *p, char* );  fill_string( &sp, o );  return  Int(1);}static parser  on_char( object v, list o ){  return  vusing( item(), v, convert_char );}static parser  on_string( object v, list o ){  return  vusing( xthen( many( anyof( " \t\n" ) ),  many( noneof( " \t\n" ) ) ), v, convert_string );}static object  r_zero( object v, list o ){ return  Int(0); }static parser  pass( parser p ){ return  using( p, r_zero ); }static parser  on_space( object v, list o ){ return  valid( o )  ? pass( many( anyof( " \t\n" ) ) )  : o; }static parser  on_percent( object v, list o ){ return  pass( chr('%') ); }static parser  on_lit( object v, list o ){ return  pass( lit( o ) ); }static object  sum_up( object v, list o ){ return  collapse( sum, o ); }static parser  on_terms( object v, list o ){ return  using( collapse( seq, o ), sum_up ); }intpscanf( char const *fmt, ... ){  if(  !fmt  ) return  0;  static va_list v;  va_start( v, fmt );  static parser p;  if(  !p  ){    parser space = using( many( anyof( " \t\n" ) ), on_space );    parser directive = PLUS( using( chr('%'), on_percent ),                             vusing( chr('c'), (void *)&v, on_char ),                             vusing( chr('s'), (void *)&v, on_string ) );    parser term  = PLUS( xthen( chr('%'), directive ),                         using( sat( Operator( 0, nz ) ), on_lit ) );    parser format = many( seq( space, term ) );    p = using( format, on_terms );    add_global_root( p );  }  list fp = parse( p, chars_from_string( (char*)fmt ) );  drop( 1, fp );  parser f = x_( x_( fp ) );  if(  !valid( f )  ) return 0;  list r = parse( f, chars_from_file( stdin ) );  drop( 1, r );  va_end( v );  return  valid( r ) ? x_( x_( r ) )->Int.i : 0;}int test_pscanf(){  char c;  PRINT( Int( pscanf( "" ) ) );  PRINT( Int( pscanf( "abc" ) ) );  PRINT( Int( pscanf( "  %c", &c ) ) );  PRINT( string_from_chars( Int( c ) ) );  char buf[100];  PRINT( Int( pscanf( "%s", buf ) ) );  PRINT( String( buf, 0 ) );  return 0;}int test_pprintf(){  PRINT( Int( pprintf( "%% abc %c %s\n", 'x', "123" ) ) );  return  0;}int test_regex(){  parser a;  PRINT( a = regex( "\\." ) );  PRINT( parse( a, chars_from_string( "a" ) ) );  PRINT( parse( a, chars_from_string( "." ) ) );  PRINT( parse( a, chars_from_string( "\\." ) ) );  parser b;  PRINT( b = regex( "\\\\\\." ) );  PRINT( parse( b, chars_from_string( "\\." ) ) );  PRINT( take( 3, parse( b, chars_from_string( "\\." ) ) ) );  parser r;  PRINT( r = regex( "a?b+(c).|def" ) );  PRINT( parse( r, chars_from_string( "abc" ) ) );  PRINT( parse( r, chars_from_string( "abbcc" ) ) );  PRINT( Int( garbage_collect( r ) ) );  list s;  PRINT( s = parse( r, chars_from_string( "def" ) ) );  PRINT( take( 3, s ) );  PRINT( parse( r, chars_from_string( "deff" ) ) );  PRINT( parse( r, chars_from_string( "adef" ) ) );  PRINT( parse( r, chars_from_string( "bcdef" ) ) );  PRINT( Int( garbage_collect( cons( r, s ) ) ) );  parser t;  PRINT( t = regex( "ac|bd" ) );  PRINT( parse( t, chars_from_string( "ac" ) ) );  PRINT( take( 1, parse( t, chars_from_string( "bd" ) ) ) );  PRINT( Int( garbage_collect( t ) ) );  parser u;  PRINT( u = regex( "ab|cd|ef" ) );  PRINT( parse( u, chars_from_string( "ab" ) ) );  PRINT( parse( u, chars_from_string( "cd" ) ) );  PRINT( take( 1, parse( u, chars_from_string( "cd" ) ) ) );  PRINT( parse( u, chars_from_string( "ef" ) ) );  PRINT( take( 1, parse( u, chars_from_string( "ef" ) ) ) );  PRINT( Int( garbage_collect( u ) ) );  parser v;  PRINT( v = regex( "ab+(c).|def" ) );  PRINT( parse( v, chars_from_string( "def" ) ) );  PRINT( take( 2, parse( v, chars_from_string( "def" ) ) ) );  parser w;  PRINT( w = regex( "a?b|c" ) );  PRINT( parse( w, chars_from_string( "a" ) ) );  PRINT( parse( w, chars_from_string( "b" ) ) );  PRINT( take( 3, parse( w, chars_from_string( "c" ) ) ) );  PRINT( Int( garbage_collect( w ) ) );  return 0;}int test_env(){  object e = env( 0, 2, Symbol(F), Int(2), Symbol(X), Int(4) );  PRINT( e );  PRINT( assoc( Symbol(F), e ) );  PRINT( assoc( Symbol(X), e ) );  return 0;}object b( object v, object o ){  return  one( cons( Int( - x_( o )->Int.i ), xs_( o ) ) );}int test_parsers(){  list ch = chars_from_string( "a b c 1 2 3 d e f 4 5 6" );  {    parser p = result( Int(42) );    PRINT( parse( p, ch ) );    PRINT( Int( garbage_collect( ch ) ) );  }  {    parser q = zero();    PRINT( parse( q, ch ) );    PRINT( Int( garbage_collect( ch ) ) );  }  {    parser r = item();    PRINT( r );    PRINT( parse( r, ch ) );    PRINT( x_( parse( r, ch ) ) );    PRINT( take( 1, x_( parse( r, ch ) ) ) );    PRINT( x_( take( 1, x_( parse( r, ch ) ) ) ) );    PRINT( take( 1, x_( take( 1, x_( parse( r, ch ) ) ) ) ) );    PRINT( parse( bind( r, Operator( 0, b ) ), ch ) );    PRINT( Int( garbage_collect( cons( ch, r ) ) ) );  }  {    parser s = plus( item(), alpha() );    PRINT( s );    PRINT( parse( s, ch ) );    PRINT( take( 2, parse( s, ch ) ) );    PRINT( Int( garbage_collect( ch ) ) );  }  {    parser t = lit( Int( 'a' ) );    PRINT( parse( t, ch ) );    parser u = str( "a b c" );    PRINT( parse( u, ch ) );    PRINT( Int( garbage_collect( cons( ch, cons( t, u ) ) ) ) );  }  return 0;}int par_main(){  return       obj_main(),      test_env(), test_parsers(),      test_regex(),          test_pprintf(),          test_pscanf(),          0;}

pc9tok.h
Interface to the example tokenizer forcirca 1975 pre-K&R C. Perhaps too much macro stuff? All the keywords and operators and punctuation which can be matched as exact strings are defined in an X-macro table which associates each string with an identifier. The identifiers are all defined in the enum for use with the symbol typed objects.

#define PC9TOK_H#ifndef PC9PAR_H  #include "pc9par.h"#endif#define Each_Symbolic(_) \  _("int", k_int) _("char", k_char) _("float", k_float) _("double", k_double) _("struct", k_struct) \  _("auto", k_auto) _("extern", k_extern) _("register", k_register) _("static", k_static) \  _("goto", k_goto) _("return", k_return) _("sizeof", k_sizeof) \  _("break", k_break) _("continue", k_continue) \  _("if", k_if) _("else", k_else) \  _("for", k_for) _("do", k_do) _("while", k_while) \  _("switch", k_switch) _("case", k_case) _("default", k_default) \  /*_("entry", k_entry)*/ \  _("*", o_star) _("++", o_plusplus) _("+", o_plus) _(".", o_dot) \  _("->", o_arrow) _("--", o_minusminus) _("-", o_minus) _("!=", o_ne) _("!", o_bang) _("~", o_tilde) \  _("&&", o_ampamp) _("&", o_amp) _("==", o_equalequal) _("=", o_equal) \  _("^", o_caret) _("||", o_pipepipe) _("|", o_pipe) \  _("/", o_slant) _("%", o_percent) \  _("<<", o_ltlt) _("<=", o_le) _("<", o_lt) _(">>", o_gtgt) _(">=", o_ge) _(">", o_gt) \  _("=+", o_eplus) _("=-", o_eminus) _("=*", o_estar) _("=/", o_eslant) _("=%", o_epercent) \  _("=>>", o_egtgt) _("=<<", o_eltlt) _("=&", o_eamp) _("=^", o_ecaret) _("=|", o_epipe) \  _("(", lparen) _(")", rparen) _(",", comma) _(";", semi) _(":", colon) _("?", quest) \  _("{", lbrace) _("}", rbrace) _("[", lbrack) _("]", rbrack) \//End Symbolic#define Enum_name(x,y) y ,enum token_symbols {  t_id = SYM2,  c_int, c_float, c_char, c_string,  Each_Symbolic( Enum_name )  SYM3};list tokens_from_chars( object v );int tok_main( void );

pc9tok.c
Implementation of the tokenizer for pre-K&R C. All of the identifiers from the table in the header file are converted into parsers which match the associated string and yield the symbol as output. The next layer can easily match against these symbols. The symbol type object also has an extradata pointer to hold extra stuff. The token functions pack the actual input string and any preliminary whitespace in this pointer in the symbol object. So this data isn't lost, but it's hidden from the parser layer which just deals with token symbols.

#include "pc9tok.h"#include "pc9objpriv.h"static object  on_spaces( object v, list o ){ return  string_from_chars( o ); }static object  on_integer( object v, list o ){ return  cons( Symbol(c_int), string_from_chars( o ) ); }static object  on_floating( object v, list o ){ return  cons( Symbol(c_float), string_from_chars( o ) ); }static object  on_character( object v, list o ){ return  cons( Symbol(c_char), string_from_chars( o ) ); }static object  on_string( object v, list o ){ return  cons( Symbol(c_string), string_from_chars( o ) ); }static object  on_identifier( object s, list o ){ return  cons( Symbol(t_id), string_from_chars( o ) ); }#define On_Symbolic(a,b) \  static object  on_##b( object v, list o ){ return  cons( Symbol(b), string_from_chars( o ) ); }Each_Symbolic( On_Symbolic )static parsertoken_parser( void ){  parser space      = using( many( anyof( " \t\n" ) ), on_spaces );  parser alpha_     = plus( alpha(), chr('_') );  parser integer    = using( some( digit() ), on_integer );  parser floating   = using( SEQ( plus( SEQ( some( digit() ), chr('.'), many( digit() ) ),                                 seq( chr('.'), some( digit() ) ) ),                                 maybe( SEQ( anyof("eE"), maybe( anyof("+-") ), some( digit() ) ) ) ),                             on_floating );  parser escape     = seq( chr('\\'),                           plus( seq( digit(), maybe( seq( digit(), maybe( digit() ) ) ) ),                                 anyof( "'\"bnrt\\" ) ) );  parser char_      = plus( escape, noneof( "'\n" ) );  parser schar_     = plus( escape, noneof( "\"\n" ) );  parser character  = using( SEQ( chr('\''), char_, chr('\'') ), on_character );  parser string     = using( SEQ( chr('"'), many( schar_ ), chr('"') ), on_string );  parser constant   = PLUS( floating, integer, character, string );# define Handle_Symbolic(a,b)  using( str( a ), on_##b ),  parser symbolic   = PLUS( Each_Symbolic( Handle_Symbolic ) zero() );  parser identifier = using( seq( alpha_, many( plus( alpha_, digit() ) ) ), on_identifier );  return  seq( space, PLUS( constant, symbolic, identifier ) );}static object  on_token( object v, list o ){  object space = x_( o );  object symbol = x_( xs_( o ) );  object string = xs_( xs_( o ) );  return  symbol->Symbol.data = cons( space, string ),  symbol;  return  cons( symbol, cons( space, string ) );}listptokens_from_chars( object s ){  if(  !valid( s )  ) return  Symbol(EOF);  static parser p;  if(  !p  ){    p = using( token_parser(), on_token );    add_global_root( p );  }  list r = parse( p, s );  take( 1, r );  r = x_( r );  return  cons( x_( r ), Suspension( xs_( r ), ptokens_from_chars ) );}listtokens_from_chars( object s ){  return  valid( s )  ? Suspension( s, ptokens_from_chars )  : Symbol(EOF);}int test_tokens(){  list tokens = tokens_from_chars( chars_from_string( "'x' auto \"abc\" 12 ;*++'\\42' '\\n' 123 if" ) );  PRINT( tokens );  PRINT( take( 1, tokens ) );  PRINT( take( 2, tokens ) );  PRINT( drop( 1, tokens ) );  PRINT( take( 2, drop( 1, tokens ) ) );  drop( 7, tokens );  PRINT( tokens );  PRINT( Int( garbage_collect( tokens ) ) );  return 0;}int tok_main(){  return          par_main(),          test_tokens(),          0;}

pc9syn.h
Interface to the Syntax Analyzer for pre-K&R C. Pretty simple this time, just extending the symbol ids and declaring the main parser function.

#define PC9SYN_H#ifndef PC9TOK_H  #include "pc9tok.h"#endifenum syntax_analysis_symbols {  func_def = SYM3,  data_def,  SYM4};list tree_from_tokens( object s );

pc9syn.c
Implementation of the Syntax Analyzer for pre-K&R C. All of the symbols from the tokenizer are converted into parsers which match those symbols and are named with an extra underscore appended. Soc_float is an enum,Symbol(c_float) is a symbol object, andc_float_ (with extra underscore) is a parser which matches that token symbol. So all the names in here with underscores, likecomma_semi_k_if_, are parsers which match against the tokens coming from the input list.

#include "pc9syn.h"#include "pc9objpriv.h"#define Extra_Symbols(_) \  _(t_id) _(c_int) _(c_float) _(c_char) _(c_string)#define Parser_for_symbolic_(a,b)  parser b##_ = lit( Symbol(b) );#define Parser_for_symbol_(b)      parser b##_ = lit( Symbol(b) );static object on_func_def( object v, list o ){   object s = Symbol(func_def); return  s->Symbol.data = o, s;  return  cons( Symbol(func_def), o ); }static object on_data_def( object v, list o ){  object s = Symbol(data_def); return  s->Symbol.data = o, s;}parserparser_for_grammar( void ){  Each_Symbolic( Parser_for_symbolic_ )  Extra_Symbols( Parser_for_symbol_ )  parser identifier = t_id_;  parser asgnop     = PLUS( o_equal_, o_eplus_, o_eminus_, o_estar_, o_eslant_, o_epercent_,                        o_egtgt_, o_eltlt_, o_eamp_, o_ecaret_, o_epipe_ );  parser constant   = PLUS( c_int_, c_float_, c_char_, c_string_ );  parser lvalue     = forward();  parser expression = forward();  *lvalue =      *PLUS(      identifier,      seq( o_star_, expression ),      //SEQ( primary, o_arrow_, identifier ),  // introduces a left-recursion indirectly      SEQ( lparen_, lvalue, rparen_ )      );  parser expression_list = seq( expression, many( seq( comma_, expression ) ) );  parser primary =      seq(       PLUS(         identifier,         constant,         SEQ( lparen_, expression, rparen_ ),         SEQ( lvalue, o_dot_, identifier )       ),       maybe( PLUS(         SEQ( lparen_, expression_list, rparen_ ),         SEQ( lbrack_, expression, rbrack_ ),         seq( o_arrow_, identifier )       ) )      );  *expression =      *seq(       PLUS(         primary,         seq( o_star_, expression ),         seq( o_amp_, expression ),         seq( o_minus_, expression ),         seq( o_bang_, expression ),         seq( o_tilde_, expression ),         seq( o_plusplus_, lvalue ),         seq( o_minusminus_, lvalue ),         seq( lvalue, o_plusplus_ ),         seq( lvalue, o_minusminus_ ),         seq( k_sizeof_, expression ),         SEQ( lvalue, asgnop, expression )       ),       maybe( PLUS(                    seq( PLUS( o_star_, o_slant_, o_percent_ ), expression ),                    seq( PLUS( o_plus_, o_minus_ ), expression ),                    seq( PLUS( o_ltlt_, o_gtgt_ ), expression ),                    seq( PLUS( o_lt_, o_le_, o_gt_, o_ge_ ), expression ),                    seq( PLUS( o_equalequal_, o_ne_ ), expression ),                    seq( o_amp_, expression ),                    seq( o_caret_, expression ),                    seq( o_pipe_, expression ),                    seq( o_ampamp_, expression ),                    seq( o_pipepipe_, expression ),                    SEQ( quest_, expression, colon_, expression ),                    seq( comma_, expression )       ) )      );  parser constant_expression = expression;  parser statement  = forward();  parser statement_list = many( statement );        *statement  =        *PLUS(              seq( expression, semi_ ),              SEQ( lbrace_, statement_list, rbrace_ ),              SEQ( k_if_, lparen_, expression, rparen_, statement ),              SEQ( k_if_, lparen_, expression, rparen_, statement, k_else_, statement ),              SEQ( k_do_, statement, k_while_, lparen_, expression, rparen_, semi_ ),              SEQ( k_while_, lparen_, expression, rparen_, statement ),              SEQ( k_for_, lparen_,                     maybe( expression ), semi_, maybe( expression ), semi_, maybe( expression ),                   rparen_, statement ),              SEQ( k_switch_, lparen_, expression, rparen_, statement ),              SEQ( k_case_, constant_expression, colon_, statement ),              SEQ( k_default_, colon_, statement ),              seq( k_break_, semi_ ),              seq( k_continue_, semi_ ),              seq( k_return_, semi_ ),              SEQ( k_return_, expression, semi_ ),              SEQ( k_goto_, expression, semi_ ),              SEQ( identifier, colon_, statement ),              semi_        );  parser constant_expression_list = seq( constant_expression, many( seq( comma_, constant_expression ) ) );  parser initializer = plus( constant, constant_expression_list );  parser type_specifier = forward();  parser declarator_list = forward();  parser type_declaration = SEQ( type_specifier, declarator_list, semi_ );  parser type_decl_list = some( type_declaration );  parser sc_specifier = PLUS( k_auto_, k_static_, k_extern_, k_register_ );    *type_specifier = *PLUS(                            k_int_, k_char_, k_float_, k_double_,                            SEQ( k_struct_, lbrace_, type_decl_list, rbrace_ ),                            SEQ( k_struct_, identifier, lbrace_, type_decl_list, rbrace_ ),                            SEQ( k_struct_, identifier )                           );  parser declarator = forward();    *declarator = *seq( PLUS(                          identifier,                          seq( o_star_, declarator ),                          SEQ( lparen_, declarator, rparen_ )                        ), maybe( PLUS(                                       seq( lparen_, rparen_ ),                                       SEQ( lbrack_, constant_expression, rbrack_ )                        ) )                      );    *declarator_list = *seq( declarator, many( seq( comma_, declarator ) ) );  parser decl_specifiers = PLUS( type_specifier, sc_specifier,                             seq( type_specifier, sc_specifier ),                             seq( sc_specifier, type_specifier ) );  parser declaration = seq( decl_specifiers, maybe( declarator_list ) );  parser declaration_list = seq( declaration, many( seq( comma_, declaration ) ) );  parser init_declarator = seq( declarator, maybe( initializer ) );  parser init_declarator_list = seq( init_declarator, many( seq( comma_, init_declarator ) ) );  parser data_def = using( SEQ( maybe( k_extern_ ),                                maybe( type_specifier ),                                maybe( init_declarator_list ), semi_ ),                           on_data_def );  parser parameter_list = maybe( seq( expression, many( seq( comma_, expression ) ) ) );  parser function_declarator = SEQ( declarator, lparen_, parameter_list, rparen_ );  parser function_statement = SEQ( lbrace_, maybe( declaration_list ), many( statement ), rbrace_ );  parser function_body = seq( maybe( type_decl_list ), function_statement );  parser function_def = using( SEQ( maybe( type_specifier ), function_declarator, function_body ),                               on_func_def );  parser external_def = plus( function_def, data_def );  parser program = some( external_def );  return  program;}listtree_from_tokens( object s ){  if(  !s  ) return  NIL_;  static parser p;  if(  !p  ){    p = parser_for_grammar();    add_global_root( p );  }  return  parse( p, s );}int test_syntax(){  char *source ="\n""int i,j,k 5;\n""float d 3.4;\n""int max(a, b, c)\n""int a, b, c;\n""{\n""      int m;\n""      m = (a>b)? a:b;\n""      return(m>c? m:c);\n""}\n""main( ) {\n""\tprintf(\"Hello, world\");\n""}\n""\t if(  2  ){\n\t   x = 5;\n\t   } int auto";  object tokens = tokens_from_chars( chars_from_string( source ) );  add_global_root( tokens );  PRINT( take( 4, tokens ) );  object program = tree_from_tokens( tokens );  PRINT( program );  PRINT( x_( x_(  ( drop( 1, program ), program ) ) ) );  PRINT_FLAT( x_( x_( program ) ) );  PRINT_DATA( x_( x_( program ) ) );  PRINT( xs_( x_( program ) ) );  PRINT( Int( garbage_collect( program ) ) );  return  0;}int main(){  return  tok_main(),          test_syntax(),          0;}

Makefile

The input to the test is for thepscanf() calls which looks for the literal"abc" then a%c then a%s. Andtest is the first rule, so a simplemake command will compile and then run the test rule/script.

CFLAGS= -std=c99 -g -Wall -Wpedantic -Wno-switch -Wreturn-type -Wunused-variableCFLAGS+= $(cflags)test : pc9        echo abc j string | ./$<clean :        rm *.opc9 : pc9obj.o pc9fp.o pc9par.o pc9tok.o pc9syn.o        $(CC) $(CFLAGS) -o $@ $^ $(LDLIBS)

All told, it's just under 1500 lines.

$ wc -l *[ch]  128 pc9fp.c   19 pc9fp.h  316 pc9obj.c   54 pc9obj.h   24 pc9objpriv.h  529 pc9par.c   48 pc9par.h  208 pc9syn.c   12 pc9syn.h   85 pc9tok.c   38 pc9tok.h   28 ppnarg.h 1489 total

Output from simple tests. At the very end, theprint_data function is used to recover all the strings hidden inside the token symbols in the syntax tree, reconstructing the source code.

$ make -kcc -std=c99 -g -Wall -Wpedantic -Wno-switch -Wreturn-type -Wunused-variable    -c -o pc9obj.o pc9obj.ccc -std=c99 -g -Wall -Wpedantic -Wno-switch -Wreturn-type -Wunused-variable    -c -o pc9fp.o pc9fp.ccc -std=c99 -g -Wall -Wpedantic -Wno-switch -Wreturn-type -Wunused-variable    -c -o pc9par.o pc9par.ccc -std=c99 -g -Wall -Wpedantic -Wno-switch -Wreturn-type -Wunused-variable    -c -o pc9tok.o pc9tok.ccc -std=c99 -g -Wall -Wpedantic -Wno-switch -Wreturn-type -Wunused-variable    -c -o pc9syn.o pc9syn.ccc -std=c99 -g -Wall -Wpedantic -Wno-switch -Wreturn-type -Wunused-variable  -o pc9 pc9obj.o pc9fp.o pc9par.o pc9tok.o pc9syn.o echo abc j string | ./pc9test_basics: ch = ... test_basics: Int( garbage_collect( ch ) ) = 0 test_basics: take( 1, ch ) = (97 _ )test_basics: Int( garbage_collect( ch ) ) = 4 test_basics: x_( ch ) = 97 test_basics: Int( garbage_collect( ch ) ) = 1 test_basics: x_( xs_( ch ) ) = ... test_basics: Int( garbage_collect( ch ) ) = 2 test_basics: take( 1, x_( xs_( ch ) ) ) = (_ _ )test_basics: Int( garbage_collect( ch ) ) = 5 test_basics: take( 5, ch ) = (97 98 99 100 101 _ )test_basics: Int( garbage_collect( ch ) ) = 12 test_basics: ch = (97 98 99 100 101 ... )test_basics: Int( garbage_collect( ch ) ) = 1 test_basics: take( 6, ch ) = (97 98 99 100 101 102 _ )test_basics: Int( garbage_collect( ch ) ) = 9 test_basics: take( 1, ch ) = (97 _ )test_basics: Int( garbage_collect( ch ) ) = 2 test_basics: take( 2, ch ) = (97 98 _ )test_basics: Int( garbage_collect( ch ) ) = 3 test_basics: take( 2, ch ) = (97 98 _ )test_basics: Int( garbage_collect( ch ) ) = 3 test_basics: take( 2, ch ) = (97 98 _ )test_basics: Int( garbage_collect( ch ) ) = 3 test_basics: take( 2, ch ) = (97 98 _ )test_basics: Int( garbage_collect( ch ) ) = 3 test_env: e = ((X 4 )(F 2 )() )test_env: assoc( Symbol_( F, "F" ), e ) = 2 test_env: assoc( Symbol_( X, "X" ), e ) = 4 test_parsers: parse( p, ch ) = ((42 ... )_ )test_parsers: Int( garbage_collect( ch ) ) = 33 test_parsers: parse( q, ch ) = _ test_parsers: Int( garbage_collect( ch ) ) = 2 test_parsers: r = Parser test_parsers: parse( r, ch ) = ((97 ... )_ )test_parsers: x_( parse( r, ch ) ) = (97 ... )test_parsers: take( 1, x_( parse( r, ch ) ) ) = (97 _ )test_parsers: x_( take( 1, x_( parse( r, ch ) ) ) ) = 97 test_parsers: take( 1, x_( take( 1, x_( parse( r, ch ) ) ) ) ) = (_ _ )test_parsers: parse( bind( r, Operator( 0, b ) ), ch ) = ((-97 ... )... )test_parsers: Int( garbage_collect( cons( ch, r ) ) ) = 46 test_parsers: s = Parser test_parsers: parse( s, ch ) = ((97 ... )... )test_parsers: take( 2, parse( s, ch ) ) = ((97 ... )(97 ... )_ )test_parsers: Int( garbage_collect( ch ) ) = 76 test_parsers: parse( t, ch ) = ((97 ... )... )test_parsers: parse( u, ch ) = (((97 32 98 32 99 () )... )... )test_parsers: Int( garbage_collect( cons( ch, cons( t, u ) ) ) ) = 372 test_regex: a = regex( "\\." ) = _ test_regex: parse( a, chars_from_string( "a" ) ) = _ test_regex: parse( a, chars_from_string( "." ) ) = _ test_regex: parse( a, chars_from_string( "\\." ) ) = _ test_regex: b = regex( "\\\\\\." ) = _ test_regex: parse( b, chars_from_string( "\\." ) ) = _ test_regex: take( 3, parse( b, chars_from_string( "\\." ) ) ) = _ test_regex: r = regex( "a?b+(c).|def" ) = Parser test_regex: parse( r, chars_from_string( "abc" ) ) = ... test_regex: parse( r, chars_from_string( "abbcc" ) ) = ... test_regex: Int( garbage_collect( r ) ) = 13660 test_regex: s = parse( r, chars_from_string( "def" ) ) = ... test_regex: take( 3, s ) = _ test_regex: parse( r, chars_from_string( "deff" ) ) = ... test_regex: parse( r, chars_from_string( "adef" ) ) = ... test_regex: parse( r, chars_from_string( "bcdef" ) ) = ... test_regex: Int( garbage_collect( cons( r, s ) ) ) = 130 test_regex: t = regex( "ac|bd" ) = _ test_regex: parse( t, chars_from_string( "ac" ) ) = _ test_regex: take( 1, parse( t, chars_from_string( "bd" ) ) ) = _ test_regex: Int( garbage_collect( t ) ) = 5294 test_regex: u = regex( "ab|cd|ef" ) = _ test_regex: parse( u, chars_from_string( "ab" ) ) = _ test_regex: parse( u, chars_from_string( "cd" ) ) = _ test_regex: take( 1, parse( u, chars_from_string( "cd" ) ) ) = _ test_regex: parse( u, chars_from_string( "ef" ) ) = _ test_regex: take( 1, parse( u, chars_from_string( "ef" ) ) ) = _ test_regex: Int( garbage_collect( u ) ) = 7804 test_regex: v = regex( "ab+(c).|def" ) = Parser test_regex: parse( v, chars_from_string( "def" ) ) = _ test_regex: take( 2, parse( v, chars_from_string( "def" ) ) ) = _ test_regex: w = regex( "a?b|c" ) = Parser test_regex: parse( w, chars_from_string( "a" ) ) = ... test_regex: parse( w, chars_from_string( "b" ) ) = ... test_regex: take( 3, parse( w, chars_from_string( "c" ) ) ) = ((() ... )_ )test_regex: Int( garbage_collect( w ) ) = 13306 test_pprintf: Int( pprintf( "%% abc %c %s\n", 'x', "123" ) ) = % abc x 12312 test_pscanf: Int( pscanf( "" ) ) = 0 test_pscanf: Int( pscanf( "abc" ) ) = 0 test_pscanf: Int( pscanf( "  %c", &c ) ) = 1 test_pscanf: string_from_chars( Int( c ) ) = "j"test_pscanf: Int( pscanf( "%s", buf ) ) = 1 test_pscanf: String( buf, 0 ) = "string"test_tokens: tokens = ... test_tokens: take( 1, tokens ) = (c_char _ )test_tokens: take( 2, tokens ) = (c_char k_auto _ )test_tokens: drop( 1, tokens ) = (k_auto ... )test_tokens: take( 2, drop( 1, tokens ) ) = (k_auto c_string _ )test_tokens: tokens = (c_char k_auto c_string c_int semi o_star o_plusplus ... )test_tokens: Int( garbage_collect( tokens ) ) = 28834 test_syntax: take( 4, tokens ) = (k_int t_id comma t_id _ )test_syntax: program = ... test_syntax: x_( x_( ( drop( 1, program ), program ) ) ) = (data_def data_def func_def func_def () )test_syntax: x_( x_( program ) ) flat= data_def data_def func_def func_def test_syntax: x_( x_( program ) ) data= int i,j,k 5;float d 3.4;int max(a, b, c)int a, b, c;{      int m;      m = (a>b)? a:b;      return(m>c? m:c);}main( ) {        printf("Hello, world");}test_syntax: xs_( x_( program ) ) = (k_if ... )test_syntax: Int( garbage_collect( program ) ) = 434910
askedMay 14, 2019 at 4:26
luser droog's user avatar
\$\endgroup\$
3
  • \$\begingroup\$An earlier, less lazy version of this code was posted incomp.lang.c and received helpful comments which have been applied in the code here.\$\endgroup\$CommentedMay 14, 2019 at 5:33
  • \$\begingroup\$Have you consideredCFLAGS += -Wextra ?\$\endgroup\$CommentedJun 19, 2019 at 7:36
  • \$\begingroup\$@TobySpeight I hadn't before, but trying it now it just tells me about unused parameters. They're all infOper functions which don't use the environment parameter. It's a good point that I should have checked this, but I think it doesn't tell me anything interesting.\$\endgroup\$CommentedJun 19, 2019 at 18:04

1 Answer1

4
\$\begingroup\$

Bug: singleton objects have no allocation record

Since the garbage collector will try to set themark() in aSYMBOL object, theT_ object needs a dummy allocation record.NIL_ doesn't need one since anINVALID object will not get marked.

pc9obj.c:

object T_ = &(1[(union uobject[]){{ .t = 0 },{ .Symbol = { SYMBOL, T, "T" } }}]),       NIL_ = (union uobject[]){{ .t = INVALID }};

Bug: using object fields for non-object data

In thepprintf() andpscanf() functions, theobject field inOPERATOR objects sometimes contains ava_list *! The garbage collector might fiddle with the memory around this address if it tries to set the (non-existant)mark(). The copious(void *) casts are a code smell. Better to use theVOID type object to hold this pointer.

Missing functions

There'ssome for 1 or more,many for 0 or more,maybe for 0 or 1.But there's no function to matchn times, orn or more, orn up to m times — these kind of quantifiers.

Poor namespacing for internal symbols

enum parser_symbols {  VALUE = SYM1, PRED, P, PP, NN, Q, R, FF, XX, AA, ID, USE, ATOM,  SYM2};

What areP, PP, NN, Q, R, FF, XX, AA?VALUEPRED andATOM are better but still kinda vague.

Short-circuit tests (and maybe actually test stuff)

int par_main(){  return       obj_main(),      test_env(), test_parsers(),      test_regex(),          test_pprintf(),          test_pscanf(),          0;}

Bonus formatting error. Better to short-circuit the tests based on the return values.

int par_main(){  return  0      ||  obj_main()      ||  test_env()      ||  test_parsers()      ||  test_regex()      ||  test_pprintf()      ||  test_pscanf()      ||  0;}

Then the testing functions can return non-zero to stop producing output.

No error reporting

A syntax error during parsing will result in an empty list being returned.Graham Hutton's paper describes how to rewrite the basic parser combinators so that meaningful error messages can be produced -- without using Monad Transformers which is the more typical way in functional languages.

answeredMay 15, 2019 at 3:34
luser droog's user avatar
\$\endgroup\$

You mustlog in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.