This repository was archived by the owner on May 20, 2019. It is now read-only.

railt/compiler (public archive)

Forked from hoaproject/Compiler

[DEPRECATED] Please use phplrt/compiler instead

github.com/phplrt/compiler

License

MIT license

Compiler

This is an implementation of a so-called compiler-compiler based on the basic capabilities of Hoa\Compiler.

The library is needed to create parsers from grammar files and is not used during parsing itself; it is only required for development.

Before you begin to work with custom implementations of parsers, it is recommended that you review EBNF.

Grammar

Each language consists of words that are combined into sentences. To construct a sentence correctly, some rules are needed. Such rules are called a grammar.

Let's try to create a grammar for a calculator that can add two numbers. If you are familiar with other grammar notations (Antlr, BNF, EBNF, Hoa, etc.), this will not be difficult for you.

    (* "sum" is a rule that determines the sequence of a number, an addition symbol and one more number *)
    sum = digit plus digit ;

    (* "digit" is one of the available numeric characters *)
    digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;

    (* "plus" is a plus sign. Incredibly! *)
    plus = "+" ;

The grammar of Railt differs in part from the original EBNF. So let's rewrite the same rule in the Railt grammar.

    // The rule "digit" can be replaced by a simple lexeme,
    // which can be expressed in a PCRE "\d".
    %token T_DIGIT \d

    // The same applies to the "+" token.
    %token T_PLUS  \+

    // All whitespace chars must be ignored.
    %skip T_WHITESPACE \s+

    // Now we need to determine the "sum" rule, which will correspond
    // to the previous version.
    #Sum
      : <T_DIGIT> ::T_PLUS:: <T_DIGIT>
      ;

To try the grammar out, simply load it and run it on the fly:

    use Railt\Component\Io\File;
    use Railt\Component\Compiler\Compiler;

    $parser = Compiler::load(File::fromSources('
    /**
     * Grammar sources
     */
    %token T_DIGIT      \d
    %token T_PLUS       \+
    %skip  T_WHITESPACE \s+

    #Sum
      : <T_DIGIT> ::T_PLUS:: <T_DIGIT>
      ;
    '));

    echo $parser->parse(File::fromSources('2 + 2'));

The output is an AST, which is serialized to XML by the echo operator and looks like this:

    <Ast>
      <Sum offset="0">
        <T_DIGIT offset="0">2</T_DIGIT>
        <T_DIGIT offset="2">2</T_DIGIT>
      </Sum>
    </Ast>

Letter case in names does not matter, but it is recommended that you name tokens in upper case ("TOKEN_NAME") and rules with a capital letter ("RuleName"). These conventions will make it easier to navigate an existing grammar later.

Definitions

In the Railt grammar there are 5 types of definitions:

%token name regex - Definition of a name and value of a token.
%skip name regex - Definition of a name and value of a skipped token. Such tokens will be ignored and allowed anywhere in the grammar.
%pragma name value - Rules for the configuration of a lexer and a parser.
%include path/to/file - Link to another grammar file.
rule or #rule - The grammar rule.
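
To make the difference between %token and %skip concrete, here is a minimal sketch in plain PHP. It is not how railt/compiler is implemented: the `tokenize` function and its array-based token format are hypothetical, introduced only for illustration. Both kinds of definitions are matched as PCRE patterns, but skipped tokens never reach the resulting stream.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of railt/compiler): splits $input into
 * [name, value, offset] triples using named PCRE fragments.
 *
 * @param array<string, string> $tokens  name => pattern, like "%token name regex"
 * @param array<string, string> $skipped name => pattern, like "%skip name regex"
 * @return array<int, array{0: string, 1: string, 2: int}>
 */
function tokenize(string $input, array $tokens, array $skipped): array
{
    $result = [];
    $offset = 0;
    $length = strlen($input);

    while ($offset < $length) {
        foreach ($tokens + $skipped as $name => $pattern) {
            // The "A" modifier anchors the match at the current offset.
            $found = preg_match('~' . $pattern . '~A', $input, $match, 0, $offset);

            if ($found === 1 && $match[0] !== '') {
                // Skipped tokens are consumed but never emitted.
                if (!isset($skipped[$name])) {
                    $result[] = [$name, $match[0], $offset];
                }
                $offset += strlen($match[0]);
                continue 2;
            }
        }

        throw new RuntimeException("Unrecognized input at offset $offset");
    }

    return $result;
}

$stream = tokenize(
    '2 + 2',
    ['T_DIGIT' => '\d', 'T_PLUS' => '\+'],
    ['T_WHITESPACE' => '\s+']
);
```

Here `$stream` contains only the two T_DIGIT tokens and the T_PLUS token. The whitespace is consumed but does not appear in the result.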

Comments

In the Railt grammar, there are two types of C-like comments:

// Inline comment - This comment type begins with two slashes and ends at the end of the line.
/* Multiline comment */ - This comment type begins with the /* symbols and ends with the */ symbols.

Output Control

You have probably already noticed that tokens are referenced in the grammar in two slightly different ways: <TOKEN> and ::TOKEN::.

This notation tells the compiler whether or not to keep the matched token in the result. This is why the "plus" token was dropped: we do not need any information about it, whereas the values of the "digit" tokens are important to us.

<TOKEN> - Keep token in AST.
::TOKEN:: - Hide token from AST.
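
The effect of the two notations can be sketched in plain PHP. This is an illustration only, not the library's algorithm: `matchSequence` and the `[name, keep]` pair format are hypothetical. Every token in the sequence must match, but only the tokens marked as kept end up among the node's children.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of railt/compiler): matches a token stream
 * against a flat sequence of [tokenName, keep] pairs.
 *
 * @param array<int, array{0: string, 1: string}> $stream   [name, value] pairs
 * @param array<int, array{0: string, 1: bool}>   $sequence [name, keep] pairs
 * @return array<int, array{0: string, 1: string}>|null kept tokens, or null on mismatch
 */
function matchSequence(array $stream, array $sequence): ?array
{
    if (count($stream) !== count($sequence)) {
        return null;
    }

    $children = [];

    foreach ($sequence as $i => [$name, $keep]) {
        [$actualName, $value] = $stream[$i];

        if ($actualName !== $name) {
            return null;
        }

        // keep = true plays the role of <TOKEN>, keep = false of ::TOKEN::
        if ($keep) {
            $children[] = [$name, $value];
        }
    }

    return $children;
}

$stream = [['T_DIGIT', '2'], ['T_PLUS', '+'], ['T_DIGIT', '2']];

// Corresponds to: <T_DIGIT> ::T_PLUS:: <T_DIGIT>
$sum = matchSequence($stream, [['T_DIGIT', true], ['T_PLUS', false], ['T_DIGIT', true]]);
```

The T_PLUS token is still required for the match to succeed. It is simply left out of `$sum`.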

Declaring rules

Each rule starts with its name. In addition, each rule can be marked with a # symbol, which indicates that the rule should be kept in the AST.

#Rule - The defined rule must be present in the AST.
Rule - The defined rule is hidden from the AST.

After the name comes the production (body) of the rule, separated from the name by one of the valid characters: = or :. The choice of separator does not matter; both exist for compatibility with other grammars. In addition, a rule can end with an optional ; character.

The constructions of the PP2 language are the following:

rule() to call a rule.
<token> and ::token:: to declare a lexeme.
| for a disjunction (an "alternation").
(…) for a group.
e? to say that e is optional (0 or 1 times).
e+ to say that e can be present 1 or more times.
e* to say that e can be present 0 or more times.
e{x,y} (e{,y}, e{x,} or e{x}) to say that e can be present between x and y times.
#rule to create a rule node in the resulting tree.

Finally, the grammar of the PP2 language is written with the PP2 language.

Let's try to add support for the remaining operations of the calculator (multiplication, division and subtraction) and at the same time slightly improve the rules of the lexer.

    %skip  T_WHITESPACE     \s+
    %token T_DIGIT          \-?\d+
    %token T_PLUS           \+
    %token T_MINUS          \-
    %token T_DIV            /
    %token T_MUL            \*

    #Expression
      : Operation()
      ;

    Operation
      : <T_DIGIT> (
          Addition() |
          Division() |
          Subtraction() |
          Multiplication()
        )?
      ;

    #Addition
      : ::T_PLUS:: Operation()
      ;

    #Division
      : ::T_DIV:: Operation()
      ;

    #Subtraction
      : ::T_MINUS:: Operation()
      ;

    #Multiplication
      : ::T_MUL:: Operation()
      ;

The simple expression 4 + 8 - 15 * 16 / 23 + -42 will be parsed into the following tree:

    <Ast>
      <Expression offset="0">
        <T_DIGIT offset="0">4</T_DIGIT>
        <Addition offset="2">
          <T_DIGIT offset="4">8</T_DIGIT>
          <Subtraction offset="6">
            <T_DIGIT offset="8">15</T_DIGIT>
            <Multiplication offset="11">
              <T_DIGIT offset="13">16</T_DIGIT>
              <Division offset="16">
                <T_DIGIT offset="18">23</T_DIGIT>
                <Addition offset="21">
                  <T_DIGIT offset="23">-42</T_DIGIT>
                </Addition>
              </Division>
            </Multiplication>
          </Subtraction>
        </Addition>
      </Expression>
    </Ast>

Note that the grammar is quite trivial and does not contain the priorities of the operators.
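
To see why the missing priorities matter, the following plain PHP sketch evaluates the expression the way the tree above is nested: every operator takes the number on its left and the value of everything to its right. The `evaluateRightNested` function is hypothetical and is not part of railt/compiler. It exists only to compare the two results.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: evaluates a chain "number operator number ..." with
 * the same right-nested shape as the tree above, ignoring operator priority.
 *
 * @param array<int, int|float|string> $items e.g. [4, '+', 8, '-', 15]
 */
function evaluateRightNested(array $items): float
{
    $left = (float) array_shift($items);

    if ($items === []) {
        return $left;
    }

    $operator = array_shift($items);
    // Everything to the right of the operator is evaluated first.
    $right = evaluateRightNested($items);

    switch ($operator) {
        case '+':
            return $left + $right;
        case '-':
            return $left - $right;
        case '*':
            return $left * $right;
        case '/':
            return $left / $right;
    }

    throw new InvalidArgumentException("Unknown operator: $operator");
}

// 4 + (8 - (15 * (16 / (23 + -42))))
$nested = evaluateRightNested([4, '+', 8, '-', 15, '*', 16, '/', 23, '+', -42]);

// The same expression with PHP's usual operator priorities.
$conventional = 4 + 8 - 15 * 16 / 23 + -42;
```

The two values differ: the right-nested evaluation gives roughly 24.63, while the conventional priorities give roughly -40.43. A grammar intended for real arithmetic would need separate rules per priority level.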

Delegation

You can tell the compiler which PHP class should represent a given grammar rule by using the -> keyword after the rule name in its definition. In this case, each processed rule will create an instance of the target class.

    #Digit -> Path\To\Class
      : <T_DIGIT>
      ;

For more information about delegates, see the Parser documentation.

Parser compilation

Reading a grammar is a fairly simple operation, but it still takes time to execute. Once the grammar rules have been finalized, you can "fix" the version in a separate parser class that contains all the logic and no longer requires reading the grammar source. After you compile it into a class, this package (railt/compiler) can be excluded from the composer dependencies.

    $compiler = Compiler::load(File::fromPathname('path/to/grammar.pp2'));

    $compiler->setNamespace('Example')
        ->setClassName('Parser')
        ->saveTo(__DIR__);

This code example will create a parser class in the current directory with the required class and namespace names. An example of the generated result can be found in an existing project here. Its source is this grammar file.

About

Releases: 14 in total. Latest: 1.3.3 (Jan 18, 2019).
