This repository was archived by the owner on May 20, 2019. It is now read-only.
railt/compiler (public archive), forked from hoaproject/Compiler

[DEPRECATED] Please use phplrt/compiler instead


Railt

PHP 7.1+ | railt.org | License: MIT

Compiler

This is an implementation of a so-called compiler-compiler, based on the basic capabilities of Hoa\Compiler.

The library is needed to create parsers from grammar files. It is not used during parsing itself and is only required during development.

Before you begin working with custom parser implementations, it is recommended that you review EBNF.

Grammar

Every language consists of words that are combined into sentences. Constructing a sentence correctly requires rules, and such rules are called a grammar.

Let's create a grammar for a calculator that can add two numbers. If you are familiar with other grammar notations (Antlr, BNF, EBNF, Hoa, etc.), this will not be difficult.

    (* "sum" is a rule that determines the sequence of a number, an addition symbol and one more number *)
    sum = digit plus digit ;

    (* "digit" is one of the available numeric characters *)
    digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;

    (* "plus" is a plus sign. Incredibly! *)
    plus = "+" ;

The Railt grammar differs somewhat from the original EBNF, so let's rewrite the same rule in the Railt grammar.

    // The rule "digit" can be replaced by a simple lexeme,
    // which can be expressed in a PCRE "\d".
    %token T_DIGIT \d

    // The same applies to the "+" token.
    %token T_PLUS  \+

    // All whitespace chars must be ignored.
    %skip T_WHITESPACE \s+

    // Now we need to determine the "sum" rule, which will correspond
    // to the previous version.
    #Sum
      : <T_DIGIT> ::T_PLUS:: <T_DIGIT>
      ;

To try the grammar out, simply load it and run it on the fly:

    use Railt\Component\Io\File;
    use Railt\Component\Compiler\Compiler;

    $parser = Compiler::load(File::fromSources('
    /**
     * Grammar sources
     */
    %token T_DIGIT      \d
    %token T_PLUS       \+
    %skip  T_WHITESPACE \s+

    #Sum
      : <T_DIGIT> ::T_PLUS:: <T_DIGIT>
      ;
    '));

    echo $parser->parse(File::fromSources('2 + 2'));

The output is an AST, which the echo statement serializes to XML and which looks like this:

    <Ast>
      <Sum offset="0">
        <T_DIGIT offset="0">2</T_DIGIT>
        <T_DIGIT offset="2">2</T_DIGIT>
      </Sum>
    </Ast>

Letter case in names does not matter, but it is recommended that you name tokens in upper case ("TOKEN_NAME") and rules with a capital letter ("RuleName"). These conventions will make an existing grammar easier to navigate later on.

Definitions

In the Railt grammar there are 5 types of definitions:

  • %token name regex - Definition of a name and value of a token.
  • %skip name regex - Definition of a name and value of a skipped token. Such tokens will be ignored and allowed anywhere in the grammar.
  • %pragma name value - Rules for the configuration of a lexer and a parser.
  • %include path/to/file - Link to another grammar file.
  • rule or #rule - The grammar rule.
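
The effect of %token and %skip can be pictured with a small regex-based lexer. This is a minimal sketch in plain PHP and not the lexer used by railt/compiler: the function name `tokenize` and the array shapes are assumptions made for illustration only.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative lexer (hypothetical helper, NOT the railt/compiler implementation).
 * $definitions maps token names to PCRE bodies; $skipped lists the names
 * that are matched but dropped, like %skip definitions.
 */
function tokenize(string $input, array $definitions, array $skipped): array
{
    $result = [];
    $offset = 0;
    $length = strlen($input);

    while ($offset < $length) {
        $matched = false;

        foreach ($definitions as $name => $pattern) {
            // \G anchors the match at the current offset.
            $found = preg_match('~\G(?:' . $pattern . ')~', $input, $m, 0, $offset);

            if ($found !== 1 || $m[0] === '') {
                continue;
            }

            if (!in_array($name, $skipped, true)) {
                $result[] = ['name' => $name, 'value' => $m[0], 'offset' => $offset];
            }

            $offset += strlen($m[0]);
            $matched = true;
            break;
        }

        if (!$matched) {
            throw new RuntimeException("Unrecognized input at offset $offset");
        }
    }

    return $result;
}

$lexemes = tokenize('2 + 2', [
    'T_DIGIT'      => '\d',
    'T_PLUS'       => '\+',
    'T_WHITESPACE' => '\s+',
], ['T_WHITESPACE']);
```

In this sketch the whitespace is consumed but never reaches the result, which is the behaviour the %skip definition describes.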

Comments

In the Railt grammar, there are two types of C-like comments:

  1. // Inline comment - This comment type begins with two slashes and ends at the end of the line.
  2. /* Multiline comment */ - This comment type begins with the /* symbols and ends with the */ symbols.

Output Control

You have probably noticed that tokens are referenced inside rules in two different ways: <TOKEN> and ::TOKEN::.

This notation tells the compiler whether or not to include the matched token in the result. That is why the "plus" token was left out of the output: we do not need any information about it, whereas the values of the "digit" tokens are important to us.

  1. <TOKEN> - Keep token in AST.
  2. ::TOKEN:: - Hide token from AST.
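
One way to picture the two markers is as a keep flag attached to each token reference in a rule. The following plain PHP sketch assumes a hypothetical helper `matchSequence` and a simple array-based token stream; it is not how railt/compiler represents rules internally.

```php
<?php
declare(strict_types=1);

/**
 * Matches a fixed sequence of tokens and returns only the kept ones.
 * Each step is [tokenName, keep]: keep = true mirrors <TOKEN>,
 * keep = false mirrors ::TOKEN::. Hypothetical helper for illustration.
 */
function matchSequence(array $tokens, array $steps): ?array
{
    if (count($tokens) !== count($steps)) {
        return null;
    }

    $kept = [];

    foreach ($steps as $i => [$name, $keep]) {
        if ($tokens[$i]['name'] !== $name) {
            return null; // the input does not match the rule
        }

        if ($keep) {
            $kept[] = $tokens[$i];
        }
    }

    return $kept;
}

// Token stream for "2 + 2" with whitespace already skipped.
$stream = [
    ['name' => 'T_DIGIT', 'value' => '2'],
    ['name' => 'T_PLUS',  'value' => '+'],
    ['name' => 'T_DIGIT', 'value' => '2'],
];

// Sum: <T_DIGIT> ::T_PLUS:: <T_DIGIT>
$sum = matchSequence($stream, [
    ['T_DIGIT', true],
    ['T_PLUS',  false],
    ['T_DIGIT', true],
]);
```

The plus sign still has to be present for the sequence to match; it is only left out of what the rule returns.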

Declaring rules

Each rule starts with its name. In addition, a rule can be marked with a # symbol, which indicates that the rule should be kept in the AST.

  1. #Rule - The defined rule must be present in the AST.
  2. Rule - The defined rule should be hidden from the AST.

After the name comes the production (body) of the rule, separated from the name by one of the valid characters: = or :. The choice of separator does not matter; both are accepted for compatibility with other grammars. In addition, a rule can end with an optional ; character.

The constructions of the PP2 language are the following:

  • rule() to call a rule.
  • <token> and ::token:: to declare a lexeme.
  • | for a disjunction (an "alternation").
  • (…) for a group.
  • e? to say that e is optional (0 or 1 times).
  • e+ to say that e can be present 1 or more times.
  • e* to say that e can be present 0 or more times.
  • e{x,y} (e{,y}, e{x,} or e{x}) to say that e can be present between x and y times.
  • #rule to create a rule node in the resulting tree.

Finally, the grammar of the PP2 language is written with the PP2 language.
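
The repetition constructs listed above all reduce to a minimum and a maximum count. The sketch below shows that idea in plain PHP over a list of token names; `matchRepeated` is a hypothetical helper, it matches greedily without backtracking, and it makes no claim about how the generated parsers actually handle repetition.

```php
<?php
declare(strict_types=1);

/**
 * Greedily matches the token $name between $min and $max times,
 * starting at position $pos. $max = null means "no upper bound".
 * Returns the position after the match, or null if fewer than $min
 * occurrences were found. Hypothetical helper for illustration.
 */
function matchRepeated(array $names, int $pos, string $name, int $min, ?int $max): ?int
{
    $count = 0;

    while (($max === null || $count < $max)
        && isset($names[$pos + $count])
        && $names[$pos + $count] === $name
    ) {
        $count++;
    }

    return $count >= $min ? $pos + $count : null;
}

$names = ['T_DIGIT', 'T_DIGIT', 'T_PLUS'];

$optional   = matchRepeated($names, 0, 'T_DIGIT', 0, 1);    // e?
$oneOrMore  = matchRepeated($names, 0, 'T_DIGIT', 1, null); // e+
$zeroOrMore = matchRepeated($names, 2, 'T_DIGIT', 0, null); // e*
$exactly3   = matchRepeated($names, 0, 'T_DIGIT', 3, 3);    // e{3}
```

Here e{,y} corresponds to a minimum of 0 with a maximum of y, and e{x,} to a minimum of x with no maximum.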

Let's add support for the remaining calculator operations (multiplication, division and subtraction) and, at the same time, slightly improve the lexer rules.

    %skip  T_WHITESPACE     \s+

    %token T_DIGIT          \-?\d+
    %token T_PLUS           \+
    %token T_MINUS          \-
    %token T_DIV            /
    %token T_MUL            \*

    #Expression
      : Operation()
      ;

    Operation
      : <T_DIGIT> (
          Addition() |
          Division() |
          Subtraction() |
          Multiplication()
        )?
      ;

    #Addition
      : ::T_PLUS:: Operation()
      ;

    #Division
      : ::T_DIV:: Operation()
      ;

    #Subtraction
      : ::T_MINUS:: Operation()
      ;

    #Multiplication
      : ::T_MUL:: Operation()
      ;

The simple expression 4 + 8 - 15 * 16 / 23 + -42 will be parsed into the following tree:

    <Ast>
      <Expression offset="0">
        <T_DIGIT offset="0">4</T_DIGIT>
        <Addition offset="2">
          <T_DIGIT offset="4">8</T_DIGIT>
          <Subtraction offset="6">
            <T_DIGIT offset="8">15</T_DIGIT>
            <Multiplication offset="11">
              <T_DIGIT offset="13">16</T_DIGIT>
              <Division offset="16">
                <T_DIGIT offset="18">23</T_DIGIT>
                <Addition offset="21">
                  <T_DIGIT offset="23">-42</T_DIGIT>
                </Addition>
              </Division>
            </Multiplication>
          </Subtraction>
        </Addition>
      </Expression>
    </Ast>

Note that the grammar is quite trivial and does not take operator precedence into account.
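
To see what the missing precedence means in practice, the tree above can be evaluated exactly as it is nested. The sketch below uses a hypothetical array shape ([number, operator, rest]) rather than the node objects the library produces, and compares the result with PHP's own arithmetic.

```php
<?php
declare(strict_types=1);

/**
 * Evaluates a right-nested operation chain shaped like the tree above.
 * Each node is [number, operator|null, rest|null].
 * Hypothetical array shape, not the library's node classes.
 */
function evaluateChain(array $node): float
{
    [$value, $operator, $rest] = $node;

    if ($operator === null) {
        return (float) $value;
    }

    // The whole remainder of the expression is evaluated first.
    $right = evaluateChain($rest);

    switch ($operator) {
        case '+': return $value + $right;
        case '-': return $value - $right;
        case '*': return $value * $right;
        case '/': return $value / $right;
    }

    throw new InvalidArgumentException("Unknown operator: $operator");
}

// 4 + 8 - 15 * 16 / 23 + -42, nested the same way as the tree above.
$tree = [4, '+', [8, '-', [15, '*', [16, '/', [23, '+', [-42, null, null]]]]]];

$nested = evaluateChain($tree);        // follows the tree structure
$usual  = 4 + 8 - 15 * 16 / 23 + -42;  // follows PHP's precedence rules
```

Evaluating along the tree gives roughly 24.63, while ordinary precedence gives roughly -40.43, so a real calculator grammar would need separate rules for each precedence level.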

Delegation

You can tell the compiler which PHP class should represent a grammar rule by using the -> keyword after the rule name in its definition. In this case, each processed rule will create an instance of the target class.

    #Digit -> Path\To\Class
      : <T_DIGIT>
      ;

For more information about delegates, see the Parser documentation.

Parser compilation

Reading a grammar is a fairly simple operation, but it still takes time to execute. Once the grammar rules have been finalized, you can "freeze" them in a separate parser class that contains all the logic and no longer needs to read the grammar source. After you compile the grammar into a class, this package (railt/compiler) can be removed from your composer dependencies.

    $compiler = Compiler::load(File::fromPathname('path/to/grammar.pp2'));

    $compiler->setNamespace('Example')
        ->setClassName('Parser')
        ->saveTo(__DIR__);

This code example will create a parser class in the current directory with the specified class and namespace names. An example of the generated result can be found in an existing project here; this grammar file was used as its source.
