libparsing

C & Python Parsing Elements Grammar Library

Version :  0.7.0
URL     :  http://github.com/sebastien/parsing
README  :  https://cdn.rawgit.com/sebastien/libparsing/master/README.html

libparsing is a parsing element grammar (PEG) library written in C with Python bindings. It offers decent performance while allowing for a lot of flexibility. It is mainly intended to be used to create programming languages and software engineering tools.

As opposed to more traditional parsing techniques, the grammar is not compiled but constructed using an API that allows dynamic update of the grammar.

The parser does not do any tokenization; instead, the input stream is consumed and parsing elements are dynamically asked to match the next element of it. Once parsing elements match, the resulting matched input is processed and an action is triggered.

libparsing supports the following features:

- backtracking, i.e. going back in the input stream if a match is not found
- cherry-picking, i.e. skipping unrecognized input
- contextual rules, i.e. a rule that will match or not depending on external variables
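To make the first of these features concrete, here is a minimal sketch of backtracking in plain C. It does not use libparsing: `Cursor`, `match_word` and `match_sequence` are hypothetical names chosen for this illustration, and the library's own mechanism may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical cursor over an in-memory input (not a libparsing type).
typedef struct Cursor {
    const char* text;   // The whole input
    size_t      offset; // The current position within `text`
} Cursor;

// Tries to match `word` at the cursor. On success the cursor advances,
// on failure it is left untouched.
static bool match_word(Cursor* c, const char* word) {
    assert(c != NULL && word != NULL);
    size_t n = strlen(word);
    if (strncmp(c->text + c->offset, word, n) != 0) { return false; }
    c->offset += n;
    return true;
}

// Matches `first` followed by `second`. If `second` fails after `first`
// succeeded, the cursor is restored to where it started: this is the
// backtracking step.
static bool match_sequence(Cursor* c, const char* first, const char* second) {
    size_t saved = c->offset;
    if (match_word(c, first) && match_word(c, second)) { return true; }
    c->offset = saved;
    return false;
}
```

With the input `"let x"`, trying `"let"` followed by `"="` fails and leaves the cursor at offset 0, so another alternative can be tried from the same position.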

Parsing elements are usually slower than compiled or FSM-based parsers as they trade performance for flexibility. It's probably not a great idea to use libparsing if the parsing has to happen as fast as possible (e.g. a protocol implementation), but it is a great fit for programming languages, as it opens the door to dynamic syntax plug-ins and multiple language embedding.

If you're interested in PEG, you can start by reading Brian Ford's original article. Projects such as PEG/LEG by Ian Piumarta (http://piumarta.com/software/peg/), OMeta by Alessandro Warth (http://www.tinlizzie.org/ometa/) or Haskell's Parsec library (https://www.haskell.org/haskellwiki/Parsec) are of particular interest in the field.

Here is a short example of what creating a simple grammar looks like in Python:

    g = Grammar()
    s = g.symbols
    g.token("WS",       "\s+")
    g.token("NUMBER",   "\d+(\.\d+)?")
    g.token("VARIABLE", "\w+")
    g.token("OPERATOR", "[\/\+\-\*]")
    g.group("Value",     s.NUMBER, s.VARIABLE)
    g.rule("Suffix",     s.OPERATOR._as("operator"), s.Value._as("value"))
    g.rule("Expression", s.Value, s.Suffix.zeroOrMore())
    g.axiom(s.Expression)
    g.skip(s.WS)
    match = g.parseString("10 + 20 / 5")

and the equivalent code in C:

    Grammar* g = Grammar_new();
    // SYMBOL(n, e) declares `s_n`; the macro already ends with a semicolon
    SYMBOL(WS,         TOKEN("\\s+"))
    SYMBOL(NUMBER,     TOKEN("\\d+(\\.\\d+)?"))
    SYMBOL(VARIABLE,   TOKEN("\\w+"))
    SYMBOL(OPERATOR,   TOKEN("[\\/\\+\\-\\*]"))
    SYMBOL(Value,      GROUP(_S(NUMBER), _S(VARIABLE)))
    SYMBOL(Suffix,     RULE(_AS(_S(OPERATOR), "operator"), _AS(_S(Value), "value")))
    SYMBOL(Expression, RULE(_S(Value), _MO(Suffix)))
    // The axiom and skip fields are both parsing elements
    g->axiom = s_Expression;
    g->skip  = s_WS;
    Grammar_prepare(g);
    ParsingResult* result = Grammar_parseString(g, "10 + 20 / 5");

Installing

To install the Python parsing module:

    easy_install libparsing   # From Setuptools
    pip install  libparsing   # From PIP

Note that for the above to work, you'll need a C compiler, libffi-dev and libpcre-dev. On Ubuntu, do sudo apt install build-essential libffi-dev libpcre-dev.

To compile the C parsing module:

    git clone http://github.com/sebastien/libparsing
    cd libparsing
    make
    make install   # You can set PREFIX

libparsing works with GCC4 and Clang and is written following the C11 standard.

C API

Input data

The parsing library is configured at compile-time to iterate on specific elements of input, typically char. You can redefine the macro ITERATION_UNIT to the type you'd like to iterate on.

By default, the ITERATION_UNIT is a char, which works both for ASCII and UTF8. On the topic of Unicode/UTF8, the parsing library only uses functions that are UTF8-savvy.

    #ifndef ITERATION_UNIT
    #define ITERATION_UNIT char
    #endif
    typedef ITERATION_UNIT iterated_t;

Input data is acquired through iterators. Iterators wrap an input source (the default input is a FileInput) and a move callback that updates the iterator's offset. The iterator will build a buffer of the acquired input and maintain a pointer for the current offset within the data acquired from the input stream.

You can get an iterator on a file by doing:

    Iterator* iterator = Iterator_Open("example.txt");

type `Iterator`

    typedef struct Iterator {
        char         status;     // The status of the iterator, one of STATUS_{INIT|PROCESSING|INPUT_ENDED|ENDED}
        char*        buffer;     // The buffer to the read data, note how it is a `char*` and not an `iterated_t*`
        iterated_t*  current;    // The pointer to the current offset within the buffer
        iterated_t   separator;  // The character for line separator, `\n` by default.
        size_t       offset;     // Offset in input (in bytes), might be different from `current - buffer` if some input was freed.
        size_t       lines;      // Counter for lines that have been encountered
        size_t       capacity;   // Content capacity (in bytes), might be bigger than the data acquired from the input
        size_t       available;  // Available data in buffer (in bytes), always `<= capacity`
        bool         freeBuffer;
        void*        input;      // Pointer to the input source
        bool         (*move) (struct Iterator*, int n); // Plug-in function to move to the previous/next positions
    } Iterator;
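The relationship between `current`, `offset`, `lines` and the `move` callback can be sketched with a reduced, in-memory iterator. This is an illustration only: `MiniIterator`, `MiniIterator_moveString` and `MiniIterator_fromString` are hypothetical names, and the way the library's own `String_move` keeps its counters may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical, simplified counterpart of `Iterator` for in-memory text.
typedef struct MiniIterator {
    const char* buffer;    // The whole input, already in memory
    const char* current;   // Pointer to the current position in `buffer`
    char        separator; // Line separator, '\n' here
    size_t      offset;    // Offset of `current` from the start of the input
    size_t      lines;     // Number of separators passed so far
    size_t      available; // Number of bytes in `buffer`
    bool (*move)(struct MiniIterator*, int n); // Moves by `n` positions
} MiniIterator;

// Moves forward (n > 0) or backward (n < 0), keeping `lines` in sync.
// Returns false, without moving, if the target lies outside the buffer.
static bool MiniIterator_moveString(MiniIterator* it, int n) {
    assert(it != NULL);
    if (n >= 0 && (size_t)n > it->available - it->offset) { return false; }
    if (n < 0 && (size_t)(-(long long)n) > it->offset) { return false; }
    for (; n > 0; n--) {
        if (*it->current == it->separator) { it->lines++; }
        it->current++;
        it->offset++;
    }
    for (; n < 0; n++) {
        it->current--;
        it->offset--;
        if (*it->current == it->separator) { it->lines--; }
    }
    return true;
}

// Wraps a NUL-terminated string, positioned at its first character.
static MiniIterator MiniIterator_fromString(const char* text) {
    MiniIterator it = { text, text, '\n', 0, 0, strlen(text), MiniIterator_moveString };
    return it;
}
```

Because movement goes through the `move` function pointer, the same struct could be driven by a different input source, which is the role the `move` field plays in `Iterator`.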

type `FileInput`

The file input wraps information about the input file, such as the FILE object and the path.

    typedef struct FileInput {
        FILE*        file;
        const char*  path;
    } FileInput;

shared `EOL`

The EOL character used to count lines in an iterator context.

    extern iterated_t EOL;

operation `Iterator_Open`

Returns a new iterator instance with the file at the given path as input.

    Iterator* Iterator_Open(const char* path);

operation `Iterator_FromString`

Returns a new iterator instance with the given text as input.

    Iterator* Iterator_FromString(const char* text);

constructor `Iterator`

    Iterator* Iterator_new(void);

destructor `Iterator_free`

    void Iterator_free(Iterator* this);

method `Iterator_open`

Makes the given iterator open the file at the given path. This will automatically assign a FileInput to the iterator as an input source.

    bool Iterator_open( Iterator* this, const char* path );

method `Iterator_hasMore`

Tells if the iterator has more available data. This means that there is available data after the current offset.

    bool Iterator_hasMore( Iterator* this );

method `Iterator_remaining`

Returns the number of bytes available from the current iterator's position up to the last available data. For dynamic streams, where the length is unknown, this should be less than or equal to ITERATOR_BUFFER_AHEAD.

    size_t Iterator_remaining( Iterator* this );

method `Iterator_moveTo`

Moves the iterator to the given offset.

    bool Iterator_moveTo ( Iterator* this, size_t offset );

method `String_move`

    bool String_move ( Iterator* this, int offset );

define `ITERATOR_BUFFER_AHEAD`

The number of iterated_t that should be loaded after the iterator's current position. This limits the number of iterated_t that a Token could match.

    #define ITERATOR_BUFFER_AHEAD 64000

constructor `FileInput`

    FileInput* FileInput_new(const char* path );

destructor `FileInput_free`

    void FileInput_free(FileInput* this);

method `FileInput_preload`

Preloads data from the input source so that the buffer has up to ITERATOR_BUFFER_AHEAD characters ahead.

    size_t FileInput_preload( Iterator* this );

method `FileInput_move`

Advances/rewinds the given iterator, loading new data from the file input whenever there are fewer than ITERATOR_BUFFER_AHEAD data elements ahead of the iterator's current position.

    bool FileInput_move   ( Iterator* this, int n );

Grammar

The Grammar is the concrete definition of the language you're going to parse. It is defined by an axiom and input data that can be skipped, such as white space.

The axiom and skip properties are both references to parsing elements.

    typedef struct ParsingContext ParsingContext;
    typedef struct ParsingElement ParsingElement;
    typedef struct ParsingResult  ParsingResult;
    typedef struct Reference      Reference;
    typedef struct Match          Match;
    typedef void                  Element;

    typedef struct Element {
        char type; // Type is used to differentiate ParsingElement from Reference
        int  id;   // The ID, assigned by the grammar, as the relative distance to the axiom
    } Element;

type `Grammar`

    typedef struct Grammar {
        ParsingElement*  axiom;       // The axiom
        ParsingElement*  skip;        // The skipped element
        int              axiomCount;  // The count of parsing elements in axiom
        int              skipCount;   // The count of parsing elements in skip
        Element**        elements;    // The set of all elements in the grammar
        bool             isVerbose;
    } Grammar;

constructor `Grammar`

    Grammar* Grammar_new(void);

destructor `Grammar_free`

    void Grammar_free(Grammar* this);

method `Grammar_prepare`

    void Grammar_prepare ( Grammar* this );

method `Grammar_symbolsCount`

    int Grammar_symbolsCount ( Grammar* this );

method `Grammar_parseIterator`

    ParsingResult* Grammar_parseIterator( Grammar* this, Iterator* iterator );

method `Grammar_parsePath`

    ParsingResult* Grammar_parsePath( Grammar* this, const char* path );

method `Grammar_parseString`

    ParsingResult* Grammar_parseString( Grammar* this, const char* text );

method `Grammar_freeElements`

    void Grammar_freeElements(Grammar* this);

Elements

callback `WalkingCallback`

    typedef int (*WalkingCallback)(Element* this, int step, void* context);

method `Element_walk`

    int Element_walk( Element* this, WalkingCallback callback, void* context);

method `Element__walk`

    int Element__walk( Element* this, WalkingCallback callback, int step, void* context);

Parsing Elements

Parsing elements are the core elements that recognize and process input data. There are 4 basic types: Word, Token, Group and Rule.

Parsing elements offer two main operations: recognize and process. The recognize method generates a Match object (that might be the FAILURE singleton if the data was not recognized). The process method corresponds to a user-defined action that transforms the Match object and returns the generated value.

Parsing elements are assigned an id that corresponds to their breadth-first distance to the axiom. Before parsing, the grammar will re-assign the parsing element's id accordingly.

type `Match`

    typedef struct Match {
        char             status;    // The status of the match (see STATUS_XXX)
        size_t           offset;    // The offset of `iterated_t` matched
        size_t           length;    // The number of `iterated_t` matched
        Element*         element;
        ParsingContext*  context;
        void*            data;      // The matched data (usually a subset of the input stream)
        struct Match*    next;      // A pointer to the next match (see `References`)
        struct Match*    children;  // A pointer to the child match (see `References`)
        void*            result;    // A pointer to the result of the match
    } Match;

define `STATUS_INIT`

The different values for a match (or iterator)'s status

    #define STATUS_INIT        '-'

define `STATUS_PROCESSING`

    #define STATUS_PROCESSING  '~'

define `STATUS_MATCHED`

    #define STATUS_MATCHED     'M'

define `STATUS_SUCCESS`

    #define STATUS_SUCCESS     'S'

define `STATUS_PARTIAL`

    #define STATUS_PARTIAL     's'

define `STATUS_FAILED`

    #define STATUS_FAILED      'X'

define `STATUS_INPUT_ENDED`

    #define STATUS_INPUT_ENDED '.'

define `STATUS_ENDED`

    #define STATUS_ENDED       'E'

define `TYPE_ELEMENT`

    #define TYPE_ELEMENT    'E'

define `TYPE_WORD`

    #define TYPE_WORD       'W'

define `TYPE_TOKEN`

    #define TYPE_TOKEN      'T'

define `TYPE_GROUP`

    #define TYPE_GROUP      'G'

define `TYPE_RULE`

    #define TYPE_RULE       'R'

define `TYPE_CONDITION`

    #define TYPE_CONDITION  'c'

define `TYPE_PROCEDURE`

    #define TYPE_PROCEDURE  'p'

define `TYPE_REFERENCE`

    #define TYPE_REFERENCE  '#'

define `ID_UNBOUND`

A parsing element that is not bound to a grammar will have ID_UNBOUND by default.

    #define ID_UNBOUND      -10

define `ID_BINDING`

A parsing element that is being bound to a grammar (see Grammar_prepare) will have an id of ID_BINDING temporarily.

    #define ID_BINDING       -1

singleton `FAILURE_S`

A specific match that indicates a failure

    extern Match FAILURE_S;

shared `FAILURE`

    extern Match* FAILURE;

operation `Match_Empty`

Creates a new empty (successful) match

    Match* Match_Empty(Element* element, ParsingContext* context);

operation `Match_Success`

Creates a new successful match of the given length

    Match* Match_Success(size_t length, Element* element, ParsingContext* context);

constructor `Match`

    Match* Match_new(void);

destructor `Match_free`

Frees the given match. If the match is FAILURE, then it won't be freed. This means that most of the time you won't need to free a failed match, as it's likely to be the FAILURE singleton.

    void Match_free(Match* this);

method `Match_isSuccess`

    bool Match_isSuccess(Match* this);

method `Match_getOffset`

    int Match_getOffset(Match* this);

method `Match_getLength`

    int Match_getLength(Match* this);

method `Match__walk`

    int Match__walk(Match* this, WalkingCallback callback, int step, void* context );

method `Match_countAll`

    int Match_countAll(Match* this);

type `ParsingElement`

    typedef struct ParsingElement {
        char               type;      // Type is used to differentiate ParsingElement from Reference
        int                id;        // The ID, assigned by the grammar, as the relative distance to the axiom
        const char*        name;      // The parsing element's name, for debugging
        void*              config;    // The configuration of the parsing element
        struct Reference*  children;  // The parsing element's children, if any
        struct Match*      (*recognize) (struct ParsingElement*, ParsingContext*);
        struct Match*      (*process)   (struct ParsingElement*, ParsingContext*, Match*);
        void               (*freeMatch) (Match*);
    } ParsingElement;

operation `ParsingElement_Is`

Tells if the given pointer is a pointer to a ParsingElement.

    bool ParsingElement_Is(void*);

constructor `ParsingElement`

Creates a new parsing element and adds the given referenced parsing elements as children. Note that this is an internal constructor, and you should use the specialized versions instead.

    ParsingElement* ParsingElement_new(Reference* children[]);

destructor `ParsingElement_free`

    void ParsingElement_free(ParsingElement* this);

method `ParsingElement_add`

Adds a new reference as child of this parsing element. This will only be effective for composite parsing elements such as Rule or Group.

    ParsingElement* ParsingElement_add(ParsingElement* this, Reference* child);

method `ParsingElement_clear`

    ParsingElement* ParsingElement_clear(ParsingElement* this);

method `ParsingElement_recognize`

Returns the match for this parsing element for the given iterator's state.

    inline Match* ParsingElement_recognize( ParsingElement* this, ParsingContext* context );

method `ParsingElement_process`

Processes the given match once the parsing element has fully succeeded. This is where user-bound actions will be applied, and where you're most likely to do things such as construct an AST.

    Match* ParsingElement_process( ParsingElement* this, Match* match );

method `ParsingElement_name`

Transparently sets the name of the element

    ParsingElement* ParsingElement_name( ParsingElement* this, const char* name );

Word

Words recognize a static string at the current iterator location.

type `WordConfig`

The parsing element configuration information that is used by the Word methods.

    typedef struct WordConfig {
        const char* word;
        size_t      length;
    } WordConfig;

constructor `ParsingElement`

    ParsingElement* Word_new(const char* word);

destructor `Word_free`

    void Word_free(ParsingElement* this);

method `Word_recognize`

The specialized match function for word parsing elements.

    Match* Word_recognize(ParsingElement* this, ParsingContext* context);

method `Word_word`

    const char* Word_word(ParsingElement* this);

method `WordMatch_group`

    const char* WordMatch_group(Match* match);

Tokens

Tokens are regular expression based parsing elements. They do not have any children and test if the regular expression matches exactly at the iterator's current location.

type `TokenConfig`

The parsing element configuration information that is used by the Token methods.

    typedef struct TokenConfig {
        const char* expr;
    #ifdef WITH_PCRE
        pcre*       regexp;
        pcre_extra* extra;
    #endif
    } TokenConfig;

type `TokenMatch`

    typedef struct TokenMatch {
        int          count;
        const char** groups;
    } TokenMatch;

method `Token_new`

Creates a new token with the given POSIX extended regular expression

    ParsingElement* Token_new(const char* expr);

destructor `Token_free`

    void Token_free(ParsingElement*);

method `Token_recognize`

The specialized match function for token parsing elements.

    Match* Token_recognize(ParsingElement* this, ParsingContext* context);

method `Token_expr`

    const char* Token_expr(ParsingElement* this);

method `TokenMatch_free`

Frees the TokenMatch created in Token_recognize

    void TokenMatch_free(Match* match);

method `TokenMatch_group`

    const char* TokenMatch_group(Match* match, int index);

method `TokenMatch_count`

    int TokenMatch_count(Match* match);

References

We've seen that parsing elements can have children. However, a parsing element's children are not directly parsing elements but rather parsing elements' References. This is why ParsingElement_add takes a Reference object as parameter.

References make it possible to share a single parsing element between many different composite parsing elements, while decorating them with additional information such as their cardinality (ONE, OPTIONAL, MANY and MANY_OPTIONAL) and a name that will allow process actions to easily access specific parts of the parsing element.

type `Reference`

    typedef struct Reference {
        char                    type;         // Set to Reference_T, to disambiguate with ParsingElement
        int                     id;           // The ID, assigned by the grammar, as the relative distance to the axiom
        char                    cardinality;  // Either ONE (default), OPTIONAL, MANY or MANY_OPTIONAL
        const char*             name;         // The name of the reference (optional)
        struct ParsingElement*  element;      // The reference to the parsing element
        struct Reference*       next;         // The next child reference in the parsing elements
    } Reference;

define `CARDINALITY_OPTIONAL`

The different values for the Reference cardinality.

    #define CARDINALITY_OPTIONAL      '?'

define `CARDINALITY_ONE`

    #define CARDINALITY_ONE           '1'

define `CARDINALITY_MANY_OPTIONAL`

    #define CARDINALITY_MANY_OPTIONAL '*'

define `CARDINALITY_MANY`

    #define CARDINALITY_MANY          '+'

operation `Reference_Is`

Tells if the given pointer is a pointer to a Reference

    bool Reference_Is(void* this);

operation `Reference_Ensure`

Ensures that the given element (or reference) is a reference.

    Reference* Reference_Ensure(void* elementOrReference);

operation `Reference_FromElement`

Returns a new reference wrapping the given parsing element

    Reference* Reference_FromElement(ParsingElement* element);

constructor `Reference`

References are typically owned by their single parent composite element.

    Reference* Reference_new(void);

destructor `Reference_free`

    void Reference_free(Reference* this);

method `Reference_cardinality`

Sets the cardinality of this reference, returning it transparently.

    Reference* Reference_cardinality(Reference* this, char cardinality);

method `Reference_name`

    Reference* Reference_name(Reference* this, const char* name);

method `Reference_hasNext`

    bool Reference_hasNext(Reference* this);

method `Reference_hasElement`

    bool Reference_hasElement(Reference* this);

method `Reference__walk`

    int Reference__walk( Reference* this, WalkingCallback callback, int step, void* nothing );

method `Reference_recognize`

Returns the matched value corresponding to the first match of this reference. OPTIONAL references might return EMPTY, ONE references will return a match with a next=NULL while MANY may return a match with a next pointing to the next match.

    Match* Reference_recognize(Reference* this, ParsingContext* context);
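As a rough illustration of how the four cardinalities change what counts as a match, here is a self-contained sketch that repeats a single-character matcher. It is not the library's implementation: `match_repeated` and the `CARD_*` names are hypothetical, and only the character codes are taken from the CARDINALITY_* defines above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Same character codes as the CARDINALITY_* defines above.
#define CARD_OPTIONAL      '?'
#define CARD_ONE           '1'
#define CARD_MANY_OPTIONAL '*'
#define CARD_MANY          '+'

// Hypothetical helper: matches the character `c` at the start of `text`
// according to `cardinality`. Returns the number of characters consumed,
// or -1 when the cardinality requires a match and there is none.
static int match_repeated(const char* text, char c, char cardinality) {
    assert(text != NULL);
    bool many  = (cardinality == CARD_MANY || cardinality == CARD_MANY_OPTIONAL);
    bool maybe = (cardinality == CARD_OPTIONAL || cardinality == CARD_MANY_OPTIONAL);
    int count = 0;
    // Non-repeating cardinalities stop after the first match
    while (c != '\0' && text[count] == c && (many || count == 0)) { count++; }
    if (count == 0 && !maybe) { return -1; }
    return count;
}
```

On `"aaab"`, ONE consumes a single `a` while MANY consumes all three; on `"baaa"`, ONE and MANY fail, whereas OPTIONAL and MANY_OPTIONAL succeed without consuming anything.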

Groups

Groups are composite parsing elements that will return the first matching reference's match. Think of it as a logical or.

constructor `ParsingElement`

    ParsingElement* Group_new(Reference* children[]);

method `Group_recognize`

    Match* Group_recognize(ParsingElement* this, ParsingContext* context);

Rules

Rules are composite parsing elements that only succeed if all their references match, one after the other.

constructor `ParsingElement`

    ParsingElement* Rule_new(Reference* children[]);

method `Rule_recognize`

    Match* Rule_recognize(ParsingElement* this, ParsingContext* context);
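The difference between the two composites can be sketched without the library, using literal strings as children. The names `match_literal`, `match_group` and `match_rule` are hypothetical and only illustrate the "first match wins" and "all in sequence" behaviours described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical leaf matcher: returns the length of `word` if `text`
// starts with it, or -1 otherwise.
static int match_literal(const char* text, const char* word) {
    assert(text != NULL && word != NULL);
    size_t n = strlen(word);
    return strncmp(text, word, n) == 0 ? (int)n : -1;
}

// Group-like: ordered choice, the first child that matches wins.
static int match_group(const char* text, const char* const* words, size_t count) {
    for (size_t i = 0; i < count; i++) {
        int n = match_literal(text, words[i]);
        if (n >= 0) { return n; }
    }
    return -1;
}

// Rule-like: every child must match, each one starting where the
// previous one ended. Returns the total length, or -1.
static int match_rule(const char* text, const char* const* words, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        int n = match_literal(text + total, words[i]);
        if (n < 0) { return -1; }
        total += n;
    }
    return total;
}
```

Note that with the children `"ab"`, `"abc"`, the group matches only `"ab"` on the input `"abc"`: the order of the children matters, since later alternatives are never tried once one succeeds.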

Procedures

Procedures are parsing elements that do not consume any input, always succeed and usually have a side effect, such as setting a variable in the parsing context.

callback `ProcedureCallback`

    typedef void (*ProcedureCallback)(ParsingElement* this, ParsingContext* context);

callback `MatchCallback`

    typedef void (*MatchCallback)(Match* m);

constructor `ParsingElement`

    ParsingElement* Procedure_new(ProcedureCallback c);

method `Procedure_recognize`

    Match* Procedure_recognize(ParsingElement* this, ParsingContext* context);

Conditions

Conditions, like procedures, execute arbitrary code when executed, but they might return a FAILURE.

callback `ConditionCallback`

    typedef Match* (*ConditionCallback)(ParsingElement*, ParsingContext*);

constructor `ParsingElement`

    ParsingElement* Condition_new(ConditionCallback c);

method `Condition_recognize`

    Match* Condition_recognize(ParsingElement* this, ParsingContext* context);
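Procedures and conditions are what make contextual rules possible. The sketch below shows the idea with an indentation counter, under stated assumptions: `MiniContext`, `proc_indent`, `proc_dedent` and `cond_check_indent` are hypothetical stand-ins, not the library's callbacks, and the state lives in a plain struct rather than in a `ParsingContext`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical parsing state: the input plus one external variable.
typedef struct MiniContext {
    const char* text;   // Remaining input
    int         indent; // Expected number of leading tabs
} MiniContext;

// Procedure-like: consumes nothing, always succeeds, has a side effect.
static bool proc_indent(MiniContext* ctx) {
    assert(ctx != NULL);
    ctx->indent++;
    return true;
}

static bool proc_dedent(MiniContext* ctx) {
    assert(ctx != NULL);
    ctx->indent--;
    return true;
}

// Condition-like: consumes nothing, and succeeds only if the input
// starts with exactly `ctx->indent` tabs.
static bool cond_check_indent(const MiniContext* ctx) {
    assert(ctx != NULL);
    int tabs = 0;
    while (ctx->text[tabs] == '\t') { tabs++; }
    return tabs == ctx->indent;
}
```

The same input line is accepted or rejected depending on the value of `indent` at that moment, which is the sense in which a rule can match or not depending on external variables.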

The parsing process

The parsing itself is the process of taking a grammar and applying it to an input stream of data, represented by the iterator.

The grammar's axiom will be matched against the iterator's current position, and if necessary, the grammar's skip parsing element will be applied to advance the iterator.

    typedef struct ParsingStep    ParsingStep;
    typedef struct ParsingOffset  ParsingOffset;

type `ParsingStats`

    typedef struct ParsingStats {
        size_t    bytesRead;
        double    parseTime;
        size_t    symbolsCount;
        size_t*   successBySymbol;
        size_t*   failureBySymbol;
        size_t    failureOffset;   // A reference to the deepest failure
        size_t    matchOffset;
        size_t    matchLength;
        Element*  failureElement;  // A reference to the failure element
    } ParsingStats;

constructor `ParsingStats`

    ParsingStats* ParsingStats_new(void);

destructor `ParsingStats_free`

    void ParsingStats_free(ParsingStats* this);

method `ParsingStats_setSymbolsCount`

    void ParsingStats_setSymbolsCount(ParsingStats* this, size_t t);

method `ParsingStats_registerMatch`

    Match* ParsingStats_registerMatch(ParsingStats* this, Element* e, Match* m);

type `ParsingContext`

    typedef struct ParsingContext {
        struct Grammar*        grammar;   // The grammar used to parse
        struct Iterator*       iterator;  // Iterator on the input data
        struct ParsingOffset*  offsets;   // The parsing offsets, starting at 0
        struct ParsingOffset*  current;   // The current parsing offset
        struct ParsingStats*   stats;
    } ParsingContext;

constructor `ParsingContext`

    ParsingContext* ParsingContext_new( Grammar* g, Iterator* iterator );

method `ParsingContext_text`

    iterated_t* ParsingContext_text( ParsingContext* this );

destructor `ParsingContext_free`

    void ParsingContext_free( ParsingContext* this );

type `ParsingResult`

    typedef struct ParsingResult {
        char             status;
        Match*           match;
        ParsingContext*  context;
    } ParsingResult;

constructor `ParsingResult`

    ParsingResult* ParsingResult_new(Match* match, ParsingContext* context);

method `ParsingResult_free`

Frees this parsing result instance as well as all the matches it refers to.

    void ParsingResult_free(ParsingResult* this);

method `ParsingResult_isFailure`

    bool ParsingResult_isFailure(ParsingResult* this);

method `ParsingResult_isPartial`

    bool ParsingResult_isPartial(ParsingResult* this);

method `ParsingResult_isComplete`

    bool ParsingResult_isComplete(ParsingResult* this);

method `ParsingResult_text`

    iterated_t* ParsingResult_text(ParsingResult* this);

method `ParsingResult_textOffset`

    int ParsingResult_textOffset(ParsingResult* this);

method `ParsingResult_remaining`

    size_t ParsingResult_remaining(ParsingResult* this);

The result of recognizing parsing elements at given offsets within the input stream is stored in ParsingOffset. Each parsing offset is a stack of ParsingStep, corresponding to successive attempts at matching parsing elements at the current position.

The parsing offset is a stack of parsing steps, where the tail is the most specific parsing step. By following the tail's previous parsing step, you can unwind the stack.

The parsing steps each have an offset within the iterated stream. Offsets where data has been fully extracted (i.e. a leaf parsing element has matched and processing returned a NOTHING) can be freed as they are not necessary any more.

type `ParsingOffset`

    typedef struct ParsingOffset {
        size_t                 offset;  // The offset matched in the input stream
        ParsingStep*           last;    // The last matched parsing step (i.e. corresponding to the most specialized parsing element)
        struct ParsingOffset*  next;    // The link to the next offset (if any)
    } ParsingOffset;

constructor `ParsingOffset`

    ParsingOffset* ParsingOffset_new( size_t offset );

destructor `ParsingOffset_free`

    void ParsingOffset_free( ParsingOffset* this );

The parsing step makes it possible to memoize the state of a parsing element at a given offset. This is the data structure that will be manipulated and created/destroyed the most during the parsing process.

    typedef struct ParsingStep {
        ParsingElement*      element;    // The parsing element we're matching
        char                 step;       // The step corresponds to current child's index (0 for token/word)
        unsigned int         iteration;  // The current iteration (on the step)
        char                 status;     // Match status `STATUS_{INIT|PROCESSING|FAILED}`
        Match*               match;      // The corresponding match, if any.
        struct ParsingStep*  previous;   // The previous parsing step on the parsing offset's stack
    } ParsingStep;
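The general idea of memoizing a result per offset can be sketched independently of `ParsingStep`. The example below caches the outcome of a "run of digits" matcher for each offset, so that a second attempt at the same position does not re-scan the input. All names (`MiniMemo`, `memo_digits`, `MEMO_*`) are hypothetical and the fixed-size table is an assumption made to keep the sketch short; the library's own data structures are the ones shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MEMO_OFFSETS 16 // Hypothetical maximum input length for the sketch

enum { MEMO_UNKNOWN = 0, MEMO_FAILED = 1, MEMO_MATCHED = 2 };

// One cached result per input offset.
typedef struct MiniMemo {
    char status[MEMO_OFFSETS]; // MEMO_UNKNOWN until the offset is tried
    int  length[MEMO_OFFSETS]; // Matched length when status is MEMO_MATCHED
    int  evaluations;          // How many times the matcher actually ran
} MiniMemo;

// The underlying matcher: counts leading ASCII digits.
static int count_digits(const char* text) {
    int n = 0;
    while (text[n] >= '0' && text[n] <= '9') { n++; }
    return n;
}

// Returns the number of digits at `offset`, or -1 if there are none.
// The matcher only runs the first time a given offset is requested.
static int memo_digits(MiniMemo* memo, const char* text, size_t offset) {
    assert(memo != NULL && text != NULL);
    assert(offset < MEMO_OFFSETS && offset <= strlen(text));
    if (memo->status[offset] == MEMO_UNKNOWN) {
        int n = count_digits(text + offset);
        memo->evaluations++;
        memo->status[offset] = (n > 0) ? MEMO_MATCHED : MEMO_FAILED;
        memo->length[offset] = n;
    }
    return memo->status[offset] == MEMO_MATCHED ? memo->length[offset] : -1;
}
```

Both successes and failures are cached, which matters with backtracking: a failed alternative at a given offset is often retried through a different path in the grammar.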

constructor `ParsingStep`

    ParsingStep* ParsingStep_new( ParsingElement* element );

destructor `ParsingStep_free`

    void ParsingStep_free( ParsingStep* this );

Processor

    typedef struct Processor Processor;

callback `ProcessorCallback`

    typedef void (*ProcessorCallback)(Processor* processor, Match* match);

    typedef struct Processor {
        ProcessorCallback   fallback;
        ProcessorCallback*  callbacks;
        int                 callbacksCount;
    } Processor;

constructor `Processor`

    Processor* Processor_new( );

method `Processor_free`

    void Processor_free(Processor* this);

method `Processor_register`

    void Processor_register (Processor* this, int symbolID, ProcessorCallback callback) ;

method `Processor_process`

    int Processor_process (Processor* this, Match* match, int step);
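The `fallback`, `callbacks` and `callbacksCount` fields suggest a dispatch table indexed by symbol id. The following sketch shows that pattern in isolation; it is an assumption about the intent, not a description of how `Processor_process` is implemented, and every name in it (`MiniProcessor`, `MiniCallback`, `MINI_SYMBOLS`, the demo callbacks) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MINI_SYMBOLS 4 // Hypothetical fixed number of symbol ids

typedef struct MiniProcessor MiniProcessor;
typedef void (*MiniCallback)(MiniProcessor* processor, int symbolID);

struct MiniProcessor {
    MiniCallback fallback;                // Used when no callback is registered
    MiniCallback callbacks[MINI_SYMBOLS]; // Indexed by symbol id
    int          handled;                 // Updated by `on_symbol`
    int          fellBack;                // Updated by `on_fallback`
};

// Demo callbacks that only count how often they were invoked.
static void on_symbol(MiniProcessor* p, int id)   { (void)id; p->handled++; }
static void on_fallback(MiniProcessor* p, int id) { (void)id; p->fellBack++; }

// Binds `cb` to the given symbol id.
static void MiniProcessor_register(MiniProcessor* p, int symbolID, MiniCallback cb) {
    assert(p != NULL && symbolID >= 0 && symbolID < MINI_SYMBOLS);
    p->callbacks[symbolID] = cb;
}

// Invokes the callback bound to `symbolID`, or the fallback if there is
// none (including when the id is out of range).
static void MiniProcessor_process(MiniProcessor* p, int symbolID) {
    assert(p != NULL);
    MiniCallback cb = NULL;
    if (symbolID >= 0 && symbolID < MINI_SYMBOLS) { cb = p->callbacks[symbolID]; }
    if (cb == NULL) { cb = p->fallback; }
    if (cb != NULL) { cb(p, symbolID); }
}
```

Keeping a fallback means that only the symbols of interest need a dedicated callback, while every other match still goes through a single default handler.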

Utilities

method `Utilities_indent`

    void Utilities_indent( ParsingElement* this, ParsingContext* context );

method `Utilities_dedent`

    void Utilities_dedent( ParsingElement* this, ParsingContext* context );

method `Utilites_checkIndent`

    Match* Utilites_checkIndent( ParsingElement* this, ParsingContext* context );

Syntax Sugar

The parsing library provides a set of macros that make defining grammars a much easier task. A grammar is usually defined in the following way:

1. leaf symbols (words & tokens) are defined;
2. compound symbols (rules & groups) are defined.

Let's take a simple grammar and define it with the straight API:

    // Leaf symbols
    ParsingElement* s_NUMBER   = Token_new("\\d+");
    ParsingElement* s_VARIABLE = Token_new("\\w+");
    ParsingElement* s_OPERATOR = Token_new("[\\+\\-\\*\\/]");

    // We also attach names to the symbols so that debugging will be easier
    ParsingElement_name(s_NUMBER,   "NUMBER");
    ParsingElement_name(s_VARIABLE, "VARIABLE");
    ParsingElement_name(s_OPERATOR, "OPERATOR");

    // Now we define the compound symbols; each array of children is NULL-terminated
    ParsingElement* s_Value    = Group_new((Reference*[3]){
        Reference_cardinality(Reference_Ensure(s_NUMBER),   CARDINALITY_ONE),
        Reference_cardinality(Reference_Ensure(s_VARIABLE), CARDINALITY_ONE),
        NULL
    });
    ParsingElement* s_Suffix   = Rule_new((Reference*[3]){
        Reference_cardinality(Reference_Ensure(s_OPERATOR), CARDINALITY_ONE),
        Reference_cardinality(Reference_Ensure(s_Value),    CARDINALITY_ONE),
        NULL
    });
    ParsingElement* s_Expr     = Rule_new((Reference*[3]){
        Reference_cardinality(Reference_Ensure(s_Value),  CARDINALITY_ONE),
        Reference_cardinality(Reference_Ensure(s_Suffix), CARDINALITY_MANY_OPTIONAL),
        NULL
    });

    // We define the names as well
    ParsingElement_name(s_Value,  "Value");
    ParsingElement_name(s_Suffix, "Suffix");
    ParsingElement_name(s_Expr,   "Expr");

As you can see, this is quite verbose and makes reading the grammar declaration a difficult task. Let's introduce a set of macros that will make expressing grammars much easier.

Symbol declaration & creation

macro `SYMBOL`

Declares a symbol of name n as being parsing element e.

    #define SYMBOL(n,e)       ParsingElement* s_ ## n = ParsingElement_name(e, #n);
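`SYMBOL` relies on two preprocessor operators: `##` pastes `s_` onto the symbol name to form the variable name, and `#` turns the symbol name into a string literal. The sketch below reproduces the same shape with stand-in types so that it compiles on its own; `MiniElement`, `MiniElement_name`, `MINI_SYMBOL` and `demo_symbol_name` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for `ParsingElement`, reduced to its name.
typedef struct MiniElement { const char* name; } MiniElement;

// Stand-in for `ParsingElement_name`: sets the name, returns the element.
static MiniElement* MiniElement_name(MiniElement* e, const char* name) {
    assert(e != NULL);
    e->name = name;
    return e;
}

// Same shape as `SYMBOL`: `s_ ## n` builds the variable name and `#n`
// builds the string. The macro ends with its own semicolon.
#define MINI_SYMBOL(n, e) MiniElement* s_ ## n = MiniElement_name(e, #n);

// Declares `s_NUMBER` through the macro and returns the name it was given.
static const char* demo_symbol_name(void) {
    MiniElement number = { NULL };
    // Expands to: MiniElement* s_NUMBER = MiniElement_name(&number, "NUMBER");
    MINI_SYMBOL(NUMBER, &number)
    return s_NUMBER->name;
}
```

Because the macro body already ends with a semicolon, invocations are written without one, as in the grammar examples of this document.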

macro `WORD`

Creates a Word parsing element with the given string

    #define WORD(v)           Word_new(v)

macro `TOKEN`

Creates a Token parsing element with the given regular expression

    #define TOKEN(v)          Token_new(v)

macro `RULE`

Creates a Rule parsing element with the references or parsing elements as children.

    #define RULE(...)         Rule_new((Reference*[(VA_ARGS_COUNT(__VA_ARGS__)+1)]){__VA_ARGS__,NULL})

macro `GROUP`

Creates a Group parsing element with the references or parsing elements as children.

    #define GROUP(...)        Group_new((Reference*[(VA_ARGS_COUNT(__VA_ARGS__)+1)]){__VA_ARGS__,NULL})
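`RULE` and `GROUP` build a NULL-terminated array of children as a C compound literal, sized from the number of macro arguments. The definition of `VA_ARGS_COUNT` is not shown in this document, so the sketch below uses a hypothetical `MINI_COUNT` based on `sizeof` as one possible way to count arguments, with strings standing in for `Reference*` values.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical argument counter: the size of an array built from the
// arguments, divided by the size of one element.
#define MINI_COUNT(...) \
    (sizeof((const char*[]){__VA_ARGS__}) / sizeof(const char*))

// Builds a NULL-terminated array as a compound literal, the same way
// `RULE` and `GROUP` do with `Reference*` children.
#define MINI_LIST(...) \
    ((const char*[MINI_COUNT(__VA_ARGS__) + 1]){__VA_ARGS__, NULL})

// Counts the entries of a NULL-terminated array, as a function receiving
// such a `children[]` argument would have to.
static size_t count_children(const char* children[]) {
    assert(children != NULL);
    size_t n = 0;
    while (children[n] != NULL) { n++; }
    return n;
}
```

The trailing `NULL` is what lets the callee find the end of the array, since a `children[]` parameter carries no length of its own.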

macro `PROCEDURE`

Creates a Procedure parsing element

    #define PROCEDURE(f)      Procedure_new(f)

macro `CONDITION`

Creates a Condition parsing element

    #define CONDITION(f)      Condition_new(f)

Symbol reference & cardinality

macro `_S`

Refers to symbol n, wrapping it in a CARDINALITY_ONE reference

    #define _S(n)             ONE(s_ ## n)

macro `_O`

Refers to symbol n, wrapping it in a CARDINALITY_OPTIONAL reference

    #define _O(n)             OPTIONAL(s_ ## n)

macro `_M`

Refers to symbol n, wrapping it in a CARDINALITY_MANY reference

    #define _M(n)             MANY(s_ ## n)

macro `_MO`

Refers to symbol n, wrapping it in a CARDINALITY_MANY_OPTIONAL reference

    #define _MO(n)            MANY_OPTIONAL(s_ ## n)

macro `_AS`

Sets the name of reference r to be v

    #define _AS(r,v)          Reference_name(Reference_Ensure(r), v)

Supporting macros

The following set of macros is mostly used by the set of macros above. You probably won't need to use them directly.

macro `NAME`

Sets the name of the given parsing element e to be the name n.

    #define NAME(n,e)         ParsingElement_name(e,n)

macro `ONE`

Sets the given reference or parsing element's reference to CARDINALITY_ONE. If a parsing element is given, it will be automatically wrapped in a reference.

    #define ONE(v)            Reference_cardinality(Reference_Ensure(v), CARDINALITY_ONE)

macro `OPTIONAL`

Sets the given reference or parsing element's reference to CARDINALITY_OPTIONAL. If a parsing element is given, it will be automatically wrapped in a reference.

    #define OPTIONAL(v)       Reference_cardinality(Reference_Ensure(v), CARDINALITY_OPTIONAL)

macro `MANY`

Sets the given reference or parsing element's reference to CARDINALITY_MANY. If a parsing element is given, it will be automatically wrapped in a reference.

    #define MANY(v)           Reference_cardinality(Reference_Ensure(v), CARDINALITY_MANY)

macro `MANY_OPTIONAL`

Sets the given reference or parsing element's reference to CARDINALITY_MANY_OPTIONAL. If a parsing element is given, it will be automatically wrapped in a reference.

    #define MANY_OPTIONAL(v)  Reference_cardinality(Reference_Ensure(v), CARDINALITY_MANY_OPTIONAL)

Grammar declaration with macros

The same grammar that we defined previously can now be expressed in the following way:

    SYMBOL(NUMBER,   TOKEN("\\d+"))
    SYMBOL(VAR,      TOKEN("\\w+"))
    SYMBOL(OPERATOR, TOKEN("[\\+\\-\\*\\/]"))

    SYMBOL(Value,  GROUP( _S(NUMBER),   _S(VAR)     ))
    SYMBOL(Suffix, RULE(  _S(OPERATOR), _S(Value)   ))
    SYMBOL(Expr,   RULE(  _S(Value),    _MO(Suffix) ))

All symbols will be defined as s_XXX, so that you can do:

    Grammar* g = Grammar_new();
    g->axiom = s_Expr;

License

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the FFunction inc (CANADA) nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
