This repository was archived by the owner on Oct 6, 2024. It is now read-only.
Context-free grammar parsing library
maandree/libparser
This repo has been moved to Codeberg and may be out of date on GitHub.
Canonical repo: https://codeberg.org/maandree/libparser

NAME
libparser - Right-context-sensitive grammar parsing library

DESCRIPTION
libparser is a small C library that parses input based on a
precompiled right-context-sensitive grammar.

To use libparser, a developer writes a syntax for the input that
the application shall parse, in a notation based on Extended
Backus–Naur form (EBNF) (somewhat simplified but also somewhat
extended). libparser-generate(1) is then used to create a C source
file describing the syntax, which shall be compiled into an object
file with a C compiler. This file provides a definition of a global
variable declared in <libparser.h>: libparser_rule_table. This
variable is used when calling libparser_parse_file(3) to parse the
application's input.

libparser is proudly non-self-hosted.

EXTENDED DESCRIPTION
   Syntax
The grammar for libparser-generate(1)'s input can be described in
its own grammar:

    (* CHARACTER CLASSES *)

    _space = " " | "\n" | "\t";
    _alpha = <"a", "z"> | <"A", "Z">;
    _octal = <"0", "7">;
    _digit = <"0", "9">;
    _xdigit = _digit | <"a", "f"> | <"A", "F">;
    _nonascii = <128, 255>;

    (* WHITESPACE/COMMENTS, THE GRAMMAR IS FREE-FORM *)

    _comment_char = _space | !"*", !"\"", <"!", 0xFF>;
    _comment_tail = [_comment_char], [_string], ("*)" | _comment_tail | -);
    _comment = "(*", _comment_tail;

    _ = {_space | _comment};

    (* IDENTIFIERS *)

    _identifier_head = _alpha | _digit | _nonascii | "_";
    _identifier_tail = _identifier_head | "-";

    identifier = _identifier_head, {_identifier_tail};

    (* STRINGS *)

    _escape_simple = "\\" | "\"" | "'" | "a" | "b" | "f" | "n" | "r" | "v";
    _escape_hex = ("x" | "X"), _xdigit, _xdigit;
    _escape_octal = _octal, {_octal}; (* May not exceed 255 in base 10 *)
    _escape = _escape_simple | _escape_hex | _escape_octal | -;
    _character = "\\", _escape | !"\"", <" ", 0xFF>;
    _string = "\"", _character, {_character}, ("\"" | -);

    string = _string;
    character = "\"", _character, ("\"" | -);

    (* INTEGERS *)

    _decimal = _digit, {_digit};
    _hexadecimal = "0", ("x" | "X"), _xdigit, {_xdigit};

    integer = _decimal | _hexadecimal; (* May not exceed 255. *)

    (* GROUPINGS *)

    _low = character | integer;
    _high = character | integer;

    rejection = "!", _, _operand;
    concatenation = _operand, {_, ",", _, _operand};
    alternation = concatenation, {_, "|", _, concatenation};
    optional = "[", _, _expression, _, "]";
    repeated = "{", _, _expression, _, "}";
    group = "(", _, _expression, _, ")";
    char-range = "<", _, _low, _, ",", _, _high, _, ">";
    exception = "-";
    embedded-rule = identifier;

    _literal = char-range | exception | string;
    _group = optional | repeated | group | embedded-rule;
    _operand = _group | _literal | rejection;

    _expression = alternation;

    (* RULES *)

    rule = identifier, _, "=", _, _expression, _, ";";

    (* This is the root rule of the grammar. *)
    grammar = _, {rule, _};

The file must be encoded in UTF-8, with LF as the line break (CR
and FF are illegal, just because).

In alternations, the first (leftmost) match is selected. The parser
is able to backtrack in case it later turns out that it could not
finish that branch. Whenever an exception ("-") is reached, the
parser will terminate there.

Repeated symbols may occur any number of times, including zero. The
parser is able to backtrack if a repetition takes too much.

Concatenation has higher precedence than alternation. Groups
("(", ..., ")") have no semantic meaning and are useful only to put
an alternation inside a concatenation without creating a new rule
for it.

In character ranges, the _high and _low values must be at least 0
and at most 255, and _high must be greater than _low.

Rules that begin with an underscore will not show up for the
application in the parse result; the rest of the rules will appear
in the tree-formatted result.

Left recursion is illegal (it will cause a stack overflow at
runtime, as the empty condition before the recursion is always met).

   Right-context-sensitive grammar
libparser originally used context-free grammar, but with the
introduction of the rejection rule, specifically the ability to
reject a rejection, it became a parser for right-context-sensitive
grammar, which is a grammar that can generate any context-sensitive
language; in other words, it is weakly equivalent to
context-sensitive grammar.