
Here you can find the text of a working LL(1) grammar that is able to parse the Sinclair BASIC used in the original ZX Spectrum 16K/48K. An LL(1) grammar is not the best in terms of structural efficiency of the generated ASTs (LALR grammars are better), but it excels in terms of simplicity of parser execution.
This is another item in the software related to the ZX Spectrum that I have developed over the years.
The grammar covers the language for the mentioned models, with two exceptions: the statements concerning external devices (Microdrive, etc.), since those devices can provide varying syntax for the corresponding statements, and the additional statements available in more modern models (e.g., AY sound in the 128K). Neither should be difficult to add.
This grammar has been tested successfully with many programs from the 80s, and also with programs taken from the BASIC Jam 2017 contest, some of the Spanish "bytemaniacos" contests, and several books preserved at proyectoBasicZX. In some of the most modern programs, it has served to detect deviations from Sinclair BASIC, such as the use of special commands intended for BASIC compilers.
It has also served as the core of the ZX-Basicus utility to synthesize and analyze ZX Spectrum BASIC programs.
The format of the file is very simple: plain ASCII text with capitalized terminals and lower-case non-terminals. It does not follow the syntax of Bison or any other parser generator, but it should be straightforward to adapt.
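As a rough illustration of how this rule notation might be read mechanically before adapting it to another tool, the following Python sketch splits rules of the form `name: alternative | alternative ;` into a dictionary. The names `parse_rules` and `is_terminal` are hypothetical and not part of the author's software, and the sketch assumes that symbols are separated by whitespace and that only the rules section is passed in.

```python
import re

def parse_rules(text):
    """Return {nonterminal: [alternative, ...]}; each alternative is a list of symbols."""
    # Remove /* ... */ and // ... comments first (they may contain ':' or ';').
    text = re.sub(r"/\*.*?\*/", " ", text, flags=re.S)
    text = re.sub(r"//[^\n]*", " ", text)
    rules = {}
    for chunk in text.split(";"):          # every rule ends with ';'
        if ":" not in chunk:
            continue
        head, body = chunk.split(":", 1)   # 'name: alt | alt'
        rules[head.strip()] = [alt.split() for alt in body.split("|")]
    return rules

def is_terminal(symbol):
    # Convention stated in the grammar notes: capitalized symbols are terminals.
    return symbol.isupper()

# Two rules copied from the grammar, used here only as sample input.
sample = """
// ---------------- STATEMENT: FOR
optionalstep: STEP toplevelexpr | /* empty */ ;
restofdims: COMMA dims | /* empty */ ;
"""
rules = parse_rules(sample)
print(rules["optionalstep"])
```

An empty alternative comes out as an empty list, which is one possible way to represent the `/* empty */` productions when translating them to another generator's syntax.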
// ZX SPECTRUM BASIC - ORIGINAL 48 SINCLAIR BASIC (FOR ZX 16K/48K ZX SPECTRUM)
// LL(1) Grammar v2.1.2 by (c) Juan-Antonio Fernandez-Madrigal, 2021
// https://jafma.net
// NOTES:
// -Capitalized keywords: terminals ; non-capitalized keywords: non-terminals
// -Like in the original BASIC, this grammar accepts empty statements (e.g.,
//  '10 ::'. Unlike in the original BASIC after tokenized, this grammar accepts
//  empty lines (only the line number).
// -External devices are needed for the ZX ROM to recognize these commands,
//  but if the devices are connected, they can provide their own syntax,
//  thus they are not included in the grammar: CAT, FORMAT, MOVE, ERASE.
//  OPEN# and CLOSE# are allowed, but only for dealing with
//  the ROM standard channels, i.e., with syntax "OPEN# number,string" and
//  "CLOSE# number".
// -Unlike the original ZX BASIC, LIST and INKEY$ are not allowed to use '#'
//  (stream redirection).
// -Other models of the ZX Spectrum have more commands, that are not included
//  in this grammar.
// -We use semantic versioning for this grammar (https://semver.org/).

Terminals:
{
    // cat 0: miscellaneous terminals
    // (same order as in Lex.h :: BaseTerminal::KindID)
    {
        LINENUMBER ENDOFLINE ENDOFSTAT BEGINPAR ENDPAR COMMA SEMICOLON
        STRINGLIT NUMLIT VARNAME
    }
    // The next categories must go in this very order:
    // cat 1: statements (same order as in Lex.h :: CommandID)
    {
        DEFFN MERGE VERIFY BEEP CIRCLE OUT LPRINT LLIST STOP READ RESTORE NEW
        BORDER CONTINUE DIM REM FOR GOTO GOSUB INPUT LOAD LIST LET PAUSE NEXT
        POKE PRINT PLOT RUN SAVE RANDOMIZE IF CLS DRAW CLEAR RETURN COPY OPEN
        CLOSE
    }
    // cat 2: substatements (SubcommandID)
    {
        LINE THEN STEP AT TAB STREAM APOSTROPHE
    }
    // cat 3: functions (FunctionID)
    {
        RND INKEY PI FN POINT ATTR VAL_STR VAL LEN SIN COS TAN ASN ACS ATN LN
        EXP INT SQR SGN ABS PEEK IN USR STR CHR NOT BIN
    }
    // cat 4: operators (OperatorID)
    {
        MULT DIV LESS GREAT EXPON OR AND LESSEQ GREATEQ NOTEQ
    }
    // cat 5: multiple (MultipleID)
    {
        EQUAL PLUS MINUS TO SCREEN DATA CODE INK PAPER FLASH BRIGHT INVERSE
        OVER
    }
}

Non-terminals:
{
    // cat 0: program
    {
        program listoflines line statements statementwithnormsep
        statementwiththensep restofstatementsnorm restofstatementsthen
        statement0 statement01 statement1 statement2 statementcons
        statementgraph statementcass statementdeclv statementflownext
        statementflowif statementdefs
    }
    // cat 1: statements
    {
        definitionfn filespecextra filespeccode restoffilespeccode
        optionalstep
    }
    // cat 2: I/O
    {
        plotseq printseq inputseq moreinputseq optionaldraw optionalinputitem
        printitem printsep restofprintseq printcontrol attrcontrol loccontrol
        streamcontrol inputitem restofinputseq inputseqorempty plotitems
        plotitem restofplotitems
    }
    // cat 3: variables
    {
        funcparms restofparms listofreadvars restoflistofreadvars topvar
        vardims dims restofdims var optionalseqofindexes firstparindexes
        tailofindexes optionalmoreseqofindexes moreparindexes index indexes
        indexorslice indexorslice1 restofindexofslice1 indexorslice2
        restofrestofindexofslice
    }
    // cat 4: expressions
    {
        toplevelexpr topleveloptexpr toplevelexprnonvar expression restofexpr
        listofexprs restoflistofexprs fnoptionallistofexprs fnlistofexprs
        fnrestoflistofexprs expressionnonvar highprecexpr highprecexprnonvar
        usrfunctioncall sysfunctioncall op
    }
}

Rules:
{
    // ---------------- HIGH LEVEL: program, lines and statements

    program:
        listoflines
        ;
    listoflines:
        line ENDOFLINE listoflines
        | /* empty */
        ;
    line:
        LINENUMBER statements
        ;
    statements:
        statementwithnormsep restofstatementsnorm
        | statementwiththensep restofstatementsthen
        | restofstatementsnorm /* to allow for ": :" */
        ;
    statementwithnormsep:
        statement0
        | statement01
        | statement1
        | statement2
        | statementcons
        | statementgraph
        | statementcass
        | statementflownext
        | statementdeclv
        | statementdefs
        ;
    statementwiththensep:
        statementflowif
        ;
    restofstatementsnorm:
        ENDOFSTAT statements
        | /* empty */
        ;
    restofstatementsthen:
        THEN statements
        ;
    /* In BASIC, THEN can be followed by stat or by ':' or by nothing. */
    statement0:
        COPY
        | STOP
        | NEW
        | CONTINUE
        | CLS
        | RETURN
        | REM /* This terminal embeds the comment inside */
        ;
    statement01:
        LLIST topleveloptexpr
        | LIST topleveloptexpr
        | RUN topleveloptexpr
        | RANDOMIZE topleveloptexpr
        | RESTORE topleveloptexpr
        | CLEAR topleveloptexpr
        ;
    statement1:
        MERGE toplevelexpr
        | INK toplevelexpr
        | PAPER toplevelexpr
        | FLASH toplevelexpr
        | BRIGHT toplevelexpr
        | INVERSE toplevelexpr
        | OVER toplevelexpr
        | BORDER toplevelexpr
        | GOTO toplevelexpr
        | GOSUB toplevelexpr
        | PAUSE toplevelexpr
        | CLOSE toplevelexpr
        ;
    statement2:
        BEEP toplevelexpr COMMA toplevelexpr
        | OUT toplevelexpr COMMA toplevelexpr
        | POKE toplevelexpr COMMA toplevelexpr
        | OPEN toplevelexpr COMMA toplevelexpr
        ;
    statementcons:
        LPRINT printseq
        | INPUT inputseq
        | PRINT printseq
        ;
    statementgraph:
        PLOT plotseq toplevelexpr COMMA toplevelexpr
        | CIRCLE plotseq toplevelexpr COMMA toplevelexpr COMMA toplevelexpr
        | DRAW plotseq toplevelexpr COMMA toplevelexpr optionaldraw
        ;
    statementcass:
        VERIFY toplevelexpr filespecextra
        | LOAD toplevelexpr filespecextra
        | SAVE toplevelexpr filespecextra
        ;
    statementdeclv:
        DIM VARNAME vardims
        | LET topvar EQUAL toplevelexpr
        | READ listofreadvars
        | FOR VARNAME EQUAL toplevelexpr TO toplevelexpr optionalstep
        ;
    /* Notice that "statementcons" also can include declarations of vars, e.g.,
       in INPUT */
    statementflownext:
        NEXT VARNAME
        ;
    statementflowif:
        IF toplevelexpr
        ;
    statementdefs:
        DEFFN definitionfn
        | DATA listofexprs
        ;

    // ---------------- STATEMENT: DEF FN

    definitionfn:
        VARNAME BEGINPAR funcparms ENDPAR EQUAL toplevelexpr
        ;
    /* funcparms may repeat the same VARNAME and it will be considered different
       (no recursive calls will be made in execution) */
    funcparms:
        VARNAME restofparms
        | /* empty */
        ;
    restofparms:
        COMMA funcparms
        | /* empty */
        ;

    // ---------------- STATEMENT: FILE MANAGEMENT STATEMENTS

    filespecextra:
        DATA VARNAME BEGINPAR ENDPAR
        | CODE filespeccode
        | SCREEN
        | LINE toplevelexpr
        | /* empty */
        ;
    filespeccode:
        toplevelexpr restoffilespeccode
        | /* empty */
        ;
    restoffilespeccode:
        COMMA toplevelexpr
        | /* empty */
        ;

    // ---------------- STATEMENT: PRINT

    printseq:
        printitem restofprintseq
        | printsep printseq
        | /* empty */
        ;
    printitem:
        toplevelexpr
        | printcontrol
        ;
    restofprintseq:
        printsep printseq
        | /* empty */
        ;
    printsep:
        COMMA
        | APOSTROPHE
        | SEMICOLON
        ;
    printcontrol:
        attrcontrol
        | loccontrol
        | streamcontrol
        ;
    attrcontrol:
        PAPER toplevelexpr
        | INK toplevelexpr
        | BRIGHT toplevelexpr
        | FLASH toplevelexpr
        | INVERSE toplevelexpr
        | OVER toplevelexpr
        ;
    loccontrol:
        AT toplevelexpr COMMA toplevelexpr
        | TAB toplevelexpr
        ;
    streamcontrol:
        STREAM toplevelexpr
        ;

    // ---------------- STATEMENT: INPUT

    inputseq:
        inputitem restofinputseq
        | printsep moreinputseq
        ;
    moreinputseq:
        inputseq
        | /* empty */
        ;
    inputitem:
        toplevelexprnonvar /* expression that cannot begin with a var */
        | topvar /* indexed or not */
        | LINE topvar /* indexed or not */
        | printcontrol
        ;
    optionalinputitem:
        inputitem
        | /* empty */
        ;
    inputseqorempty:
        optionalinputitem restofinputseq
        ;
    restofinputseq:
        printsep inputseqorempty
        | /* empty */
        ;

    // ---------------- STATEMENT: PLOT & DRAW

    plotseq:
        plotitems
        | /* empty */
        ;
    plotitems:
        plotitem SEMICOLON restofplotitems
        ;
    restofplotitems:
        plotitems
        | /* empty */
        ;
    plotitem:
        attrcontrol
        ;
    optionaldraw:
        COMMA toplevelexpr
        | /* empty */
        ;

    // ---------------- STATEMENT: READ

    listofreadvars:
        topvar restoflistofreadvars
        ;
    restoflistofreadvars:
        COMMA listofreadvars
        | /* empty */
        ;

    // ---------------- STATEMENT: DIM

    vardims:
        BEGINPAR dims ENDPAR
        ;
    /* "dims" cannot be empty */
    dims:
        toplevelexpr restofdims
        ;
    restofdims:
        COMMA dims
        | /* empty */
        ;

    // ---------------- STATEMENT: FOR

    optionalstep:
        STEP toplevelexpr
        | /* empty */
        ;

    // ---------------- VARS IN EXPRESSIONS

    topvar:
        var
        ;
    /* To distinguish between vars embedded in an expression and isolated vars as in LET, etc. */
    var:
        VARNAME optionalseqofindexes
        ;
    /* General use of a variable in an expr. */
    /* A string var may be indexed several times, e.g., 'a$( TO 8)( TO 3)( TO 1)'
       Any var can have several indexes in the first parenthesis (if it is a matrix)
       separated by comma, but only one in each of the following.
       If the variable is an array, it cannot have "TO" slices except in the last index
       of the first parenthesis, as long as it is string; that must be checked out
       besides parsing.
       This grammar cannot check whether VARNAME is not a string but a TO is used
       (error); that must also be done besides parsing. */
    optionalseqofindexes:
        firstparindexes optionalmoreseqofindexes
        | /* empty */
        ;
    optionalmoreseqofindexes:
        moreparindexes optionalmoreseqofindexes
        | /* empty*/
        ;
    firstparindexes:
        BEGINPAR indexes ENDPAR
        ;
    moreparindexes:
        BEGINPAR index ENDPAR
        ;
    indexes:
        indexorslice tailofindexes
        | /* empty */
        ;
    index:
        indexorslice
        | /* empty */
        ;
    tailofindexes:
        COMMA indexorslice tailofindexes
        | /* empty */
        ;
    indexorslice:
        indexorslice1
        | indexorslice2
        ;
    indexorslice1:
        expression restofindexofslice1
        ;
    indexorslice2:
        TO restofrestofindexofslice
        ;
    /* Slicing beginning with TO */
    restofindexofslice1:
        TO restofrestofindexofslice
        | /* empty */
        ;
    restofrestofindexofslice:
        expression
        | /* empty */
        ;

    // ---------------- MID LEVEL: EXPRESSIONS

    topleveloptexpr:
        toplevelexpr
        | /* empty */
        ;
    toplevelexpr:
        expression
        ;
    /* Cannot be nested, unlike expression */
    toplevelexprnonvar:
        expressionnonvar
        ;
    /* Cannot be nested, unlike expressionnonvar */
    listofexprs:
        toplevelexpr restoflistofexprs
        ;
    restoflistofexprs:
        COMMA listofexprs
        | /* empty */
        ;
    expression:
        highprecexpr restofexpr
        ;
    /* We cannot distinguish between numeric and string expressions in this grammar
       because the DATA statement needs a list of expressions with mixed types
       (numeric / string), and that would make the grammar non-LL(1). Therefore, the
       semantic parsing is in charge of detecting forbidden expressions (operations
       not allowed on numeric or on strings, for instance). */
    expressionnonvar:
        highprecexprnonvar restofexpr
        ;
    /* cannot begin with a var */
    highprecexpr:
        highprecexprnonvar
        | var
        ;
    highprecexprnonvar:
        NUMLIT /* Compiled no. embedded in terminal */
        | STRINGLIT optionalmoreseqofindexes /* slice/index a str literal */
        | BEGINPAR expression ENDPAR optionalmoreseqofindexes /* idem */
        | MINUS highprecexpr /* unary minus */
        | PLUS highprecexpr /* unary plus */
        | FN usrfunctioncall
        | sysfunctioncall
        ;
    restofexpr:
        op expression
        | /* empty */
        ;
    op:
        EXPON
        | MULT
        | DIV
        | PLUS
        | MINUS /* binary minus and plus */
        | EQUAL
        | GREAT
        | LESS
        | GREATEQ
        | LESSEQ
        | NOTEQ
        | AND
        | OR
        ;
    /* In the ZX, the sys functions have higher priority than the rest of the
       expression where they are */
    sysfunctioncall:
        RND
        | PI
        | CODE highprecexpr
        | VAL highprecexpr
        | LEN highprecexpr
        | SIN highprecexpr
        | COS highprecexpr
        | TAN highprecexpr
        | ASN highprecexpr
        | ACS highprecexpr
        | ATN highprecexpr
        | LN highprecexpr
        | EXP highprecexpr
        | INT highprecexpr
        | SQR highprecexpr
        | SGN highprecexpr
        | ABS highprecexpr
        | PEEK highprecexpr
        | IN highprecexpr
        | NOT highprecexpr
        | BIN NUMLIT
        | USR highprecexpr
        | POINT BEGINPAR expression COMMA expression ENDPAR
        | ATTR BEGINPAR expression COMMA expression ENDPAR
        | INKEY
        | SCREEN BEGINPAR expression COMMA expression ENDPAR
        | VAL_STR highprecexpr
        | STR highprecexpr
        | CHR highprecexpr
        ;
    usrfunctioncall:
        VARNAME BEGINPAR fnoptionallistofexprs ENDPAR
        ;
    fnoptionallistofexprs:
        fnlistofexprs
        | /* empty */
        ;
    fnlistofexprs:
        expression fnrestoflistofexprs
        ;
    fnrestoflistofexprs:
        COMMA fnlistofexprs
        | /* empty */
        ;
} // end of rules
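To give an idea of why an LL(1) grammar makes parser execution simple, here is a minimal recursive-descent sketch in Python for a small subset of the expression rules: `expression`, `restofexpr`, and only the `NUMLIT` and parenthesized alternatives of `highprecexprnonvar`. The `Parser` class, the representation of tokens as plain terminal names, and the tuple-based tree are assumptions made for this illustration; they do not describe the author's implementation.

```python
OPS = {"EXPON", "MULT", "DIV", "PLUS", "MINUS", "EQUAL", "GREAT", "LESS",
       "GREATEQ", "LESSEQ", "NOTEQ", "AND", "OR"}

class Parser:
    """Hypothetical predictive parser: one method per non-terminal."""

    def __init__(self, tokens):
        self.tokens = tokens      # list of terminal names, e.g. ["NUMLIT", "PLUS"]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        self.pos += 1
        return kind

    def expression(self):
        # expression: highprecexpr restofexpr ;
        return ("expression", self.highprecexpr(), self.restofexpr())

    def restofexpr(self):
        # restofexpr: op expression | /* empty */ ;
        # A single token of lookahead is enough to choose the alternative.
        if self.peek() in OPS:
            return ("restofexpr", ("op", self.eat(self.peek())), self.expression())
        return ("restofexpr",)

    def highprecexpr(self):
        # Subset only: NUMLIT | BEGINPAR expression ENDPAR
        if self.peek() == "BEGINPAR":
            self.eat("BEGINPAR")
            inner = self.expression()
            self.eat("ENDPAR")
            return ("highprecexpr", "BEGINPAR", inner, "ENDPAR")
        return ("highprecexpr", self.eat("NUMLIT"))

# Terminals for something like "1 + 2 * 3"
parser = Parser(["NUMLIT", "PLUS", "NUMLIT", "MULT", "NUMLIT"])
tree = parser.expression()
print(tree)
```

In this subset every binary operator nests to the right through `restofexpr`, whatever the operator is, so the tree is deeper than the source suggests and operator precedence would presumably have to be resolved in a later pass over the tree.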
Below is an example of the AST resulting from a very simple program (graphical tree drawn by yEd). It shows how complex the tree becomes in spite of the simplicity of the code. A depth-first traversal of the leaves of the tree will produce the original code.
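The depth-first traversal of the leaves can be sketched in a few lines of Python. The tree encoding used here (tuples of the form `(nonterminal, child, ...)` with plain strings as leaves) and the hand-built tree for `10 LET a = 1` are assumptions for illustration only, not the format produced by the author's tools.

```python
def leaves(node):
    """Yield the leaf texts of the tree from left to right (depth-first)."""
    if isinstance(node, str):          # a leaf holds the source text of a terminal
        yield node
        return
    _label, *children = node           # inner node: (nonterminal, child, ...)
    for child in children:
        yield from leaves(child)

# Hand-built tree for the line "10 LET a = 1"; empty productions have no children.
tree = ("line", "10",
        ("statements",
         ("statementwithnormsep",
          ("statementdeclv", "LET",
           ("topvar", ("var", "a", ("optionalseqofindexes",))),
           "=",
           ("toplevelexpr",
            ("expression",
             ("highprecexpr", ("highprecexprnonvar", "1")),
             ("restofexpr",))))),
         ("restofstatementsnorm",)))

source = " ".join(leaves(tree))
print(source)  # 10 LET a = 1
```

Even for this one-statement line the tree has many more inner nodes than leaves, which matches the remark about the complexity of the trees an LL(1) grammar generates.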

You can download the latest version of the grammar at this link.
This grammar has been written and tested by Juan-Antonio Fernández-Madrigal in Autumn 2018 / Winter 2019 / Spring 2020.
If you are interested in contacting me, you can use "software" (remove quotes) at jafma.net.