Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up

Streaming based VHDL parser.

License

NotificationsYou must be signed in to change notification settings

Paebbels/pyVHDLParser

Repository files navigation

Sourcecode on GitHubLicenseGitHub tag (latest SemVer incl. pre-release)GitHub release (latest SemVer incl. including pre-releases)GitHub release date
GitHub Workflow StatusPyPIPyPI - StatusPyPI - Python VersionDependent repos (via libraries.io)
Libraries.io status for latest releaseCodacy - QualityCodacy - CoverageCodecov - Branch CoverageLibraries.io SourceRank
Read the Docs

pyVHDLParser

This is a token-stream based parser for VHDL-2008.

This project requires Python 3.8+.

Introduction

Main Goals

  • Parsing
    • slice an input document intotokens and textblocks which are categorized
    • preserve case, whitespace and comments
    • recover on parsing errors
    • good error reporting / throw exceptions
  • Fast Processing
    • multi-pass parsing and analysis
    • delay analysis if not needed at current pass
    • link tokens and blocks for fast-forward scanning
  • Generic VHDL Language Model
    • Assemble a document-object-model (Code-DOM)
    • Provide an API for code introspection

Use Cases

  • generate documentation by using the fast-forward scanner
  • generate a document/language model by using the grouped text-block scanner
  • extract compile orders and other dependency graphs
  • generate highlighted syntax
  • re-annotate documenting comments to their objects for doc extraction

Parsing approach

  1. slice an input document intotokens
  2. assemble tokens to textblocks which are categorized
  3. assemble text blocks for fast-forward scanning intogroups
  4. translate groups into a document-object-model (DOM)
  5. provide a generic VHDL language model

Long time goals

  • A Sphinx language plugin for VHDL

    TODO: Move the following documentation to ReadTheDocs and replace it with a more lightweight version.

Basic Concept

Example 1

This is an input file:

-- Copryright 2016library IEEE;use     IEEE.std_logic_1164.all;entitymyEntityisgeneric (    BITS :positive:=8  );port (    Clock   :instd_logic;Output  :outstd_logic_vector(BITS-1downto0)  );endentity;architecturertlofmyEntityisconstant const0 :integer:=5;beginprocess(Clock)beginendprocess;endarchitecture;library IEEE, PoC;use     PoC.Utils.all, PoC.Common.all;packagepkg0isfunctionfunc0(a :integer)returnstring;endpackage;packagebodyComponentsisfunctionfunc0(a :integer)returnstringisprocedure proc0isbeginendprocedure;beginendfunctionendpackagebody;

Step 1

The input file (stream of characters) is translated into stream of basic tokens:

  • StartOfDocumentToken
  • LinebreakToken
  • SpaceToken
    • IndentationToken
  • WordToken
  • CharacterToken
    • FusedCharacterToken
  • CommentToken
    • SingleLineCommentToken
    • MultiLineCommentToken
  • EndOfDocumentToken

The stream looks like this:

<StartOfDocumentToken><SLCommentToken '-- Copryright 2016\n'  ................ at 1:1><WordToken      'library'  ............................. at 2:1><SpaceToken     ' '  ................................... at 2:8><WordToken      'IEEE'  ................................ at 2:9><CharacterToken ';'  ................................... at 2:13><LinebreakToken ---------------------------------------- at 2:14><WordToken      'use'  ................................. at 3:1><SpaceToken     '     '  ............................... at 3:4><WordToken      'IEEE'  ................................ at 3:9><CharacterToken '.'  ................................... at 3:13><WordToken      'std_logic_1164'  ...................... at 3:14><CharacterToken '.'  ................................... at 3:28><WordToken      'all'  ................................. at 3:29><CharacterToken ';'  ................................... at 3:32><LinebreakToken ---------------------------------------- at 3:33><LinebreakToken ---------------------------------------- at 4:1><WordToken      'entity'  .............................. at 5:1><SpaceToken     ' '  ................................... at 5:7><WordToken      'myEntity'  ............................ at 5:8><SpaceToken     ' '  ................................... at 5:16><WordToken      'is'  .................................. at 5:17><LinebreakToken ---------------------------------------- at 5:19><IndentToken    '\t'  .................................. at 6:1><WordToken      'generic'  ............................. at 6:2><SpaceToken     ' '  ................................... at 6:9><CharacterToken '('  ................................... at 6:10><LinebreakToken ---------------------------------------- at 6:11><IndentToken    '\t\t'  ................................ at 7:1><WordToken      'BITS'  ................................ at 7:3><SpaceToken     ' '  ................................... at 7:7><CharacterToken ':'  ................................... at 7:8><SpaceToken     ' '  ................................... at 7:8><WordToken      'positive'  ............................ at 7:10><SpaceToken     ' '  ................................... at 7:18><FusedCharToken ':='  .................................. at 7:19><SpaceToken     ' '  ................................... at 7:21><WordToken      '8'  ................................... at 7:22><LinebreakToken ---------------------------------------- at 7:23><IndentToken    '\t'  .................................. at 8:1><CharacterToken ')'  ................................... at 8:2><CharacterToken ';'  ................................... at 8:3><LinebreakToken ---------------------------------------- at 8:4><IndentToken    '\t'  .................................. at 9:1><WordToken      'port'  ................................ at 9:2><SpaceToken     ' '  ................................... at 9:6><CharacterToken '('  ................................... at 9:7><LinebreakToken ---------------------------------------- at 9:8><IndentToken    '\t\t'  ................................ at 10:1><WordToken      'Clock'  ............................... at 10:3><SpaceToken     '   '  ................................. at 10:8><CharacterToken ':'  ................................... at 10:11><SpaceToken     ' '  ................................... at 10:11><WordToken      'in'  .................................. at 10:13><SpaceToken     '  '  .................................. at 10:15><WordToken      'std_logic'  ........................... at 10:17><CharacterToken ';'  ................................... at 10:26><LinebreakToken ---------------------------------------- at 10:27><IndentToken    '\t\t'  ................................ at 11:1><WordToken      'Output'  .............................. at 11:3><SpaceToken     '       '  ................................... at 11:9><CharacterToken ':'  ................................... at 11:10><SpaceToken     ' '  ................................... at 11:10><WordToken      'out'  ................................. at 11:12><SpaceToken     '       '  ................................... at 11:15><WordToken      'std_logic_vector'  .................... at 11:16><CharacterToken '('  ................................... at 11:32><WordToken      'BITS'  ................................ at 11:33><SpaceToken     ' '  ................................... at 11:37><CharacterToken '-'  ................................... at 11:38><SpaceToken     ' '  ................................... at 11:38><WordToken      '1'  ................................... at 11:40><SpaceToken     ' '  ................................... at 11:41><WordToken      'downto'  .............................. at 11:42><SpaceToken     ' '  ................................... at 11:48><WordToken      '0'  ................................... at 11:49><CharacterToken ')'  ................................... at 11:50><LinebreakToken ---------------------------------------- at 11:51><IndentToken    '\t'  .................................. at 12:1><CharacterToken ')'  ................................... at 12:2><CharacterToken ';'  ................................... at 12:3><LinebreakToken ---------------------------------------- at 12:4><WordToken      'end'  ................................. at 13:1><SpaceToken     ' '  ................................... at 13:4><WordToken      'entity'  .............................. at 13:5><CharacterToken ';'  ................................... at 13:11><LinebreakToken ---------------------------------------- at 13:12><LinebreakToken ---------------------------------------- at 14:1><WordToken      'architecture'  ........................ at 15:1><SpaceToken     ' '  ................................... at 15:13><WordToken      'rtl'  ................................. at 15:14><SpaceToken     ' '  ................................... at 15:17><WordToken      'of'  .................................. at 15:18><SpaceToken     ' '  ................................... at 15:20><WordToken      'myEntity'  ............................ at 15:21><SpaceToken     ' '  ................................... at 15:29><WordToken      'is'  .................................. at 15:30><LinebreakToken ---------------------------------------- at 15:32><IndentToken    '\t'  .................................. at 16:1><WordToken      'constant'  ............................ at 16:2><SpaceToken     ' '  ................................... at 16:10><WordToken      'const0'  .............................. at 16:11><SpaceToken     ' '  ................................... at 16:17><CharacterToken ':'  ................................... at 16:18><SpaceToken     ' '  ................................... at 16:18><WordToken      'integer'  ............................. at 16:20><SpaceToken     ' '  ................................... at 16:27><FusedCharToken ':='  .................................. at 16:28><SpaceToken     ' '  ................................... at 16:30><WordToken      '5'  ................................... at 16:31><CharacterToken ';'  ................................... at 16:32><LinebreakToken ---------------------------------------- at 16:33><WordToken      'begin'  ............................... at 17:1><LinebreakToken ---------------------------------------- at 17:6><IndentToken    '\t'  .................................. at 18:1><WordToken      'process'  ............................. at 18:2><CharacterToken '('  ................................... at 18:9><WordToken      'Clock'  ............................... at 18:10><CharacterToken ')'  ................................... at 18:15><LinebreakToken ---------------------------------------- at 18:16><IndentToken    '\t'  .................................. at 19:1><WordToken      'begin'  ............................... at 19:2><LinebreakToken ---------------------------------------- at 19:7><IndentToken    '\t'  .................................. at 20:1><WordToken      'end'  ................................. at 20:2><SpaceToken     ' '  ................................... at 20:5><WordToken      'process'  ............................. at 20:6><CharacterToken ';'  ................................... at 20:13><LinebreakToken ---------------------------------------- at 20:14><WordToken      'end'  ................................. at 21:1><SpaceToken     ' '  ................................... at 21:4><WordToken      'architecture'  ........................ at 21:5><CharacterToken ';'  ................................... at 21:17><LinebreakToken ---------------------------------------- at 21:18><LinebreakToken ---------------------------------------- at 22:1><WordToken      'library'  ............................. at 23:1><SpaceToken     ' '  ................................... at 23:8><WordToken      'IEEE'  ................................ at 23:9><CharacterToken ','  ................................... at 23:13><SpaceToken     ' '  ................................... at 23:14><WordToken      'PoC'  ................................. at 23:15><CharacterToken ';'  ................................... at 23:18><LinebreakToken ---------------------------------------- at 23:19><WordToken      'use'  ................................. at 24:1><SpaceToken     '     '  ............................... at 24:4><WordToken      'PoC'  ................................. at 24:9><CharacterToken '.'  ................................... at 24:12><WordToken      'Utils'  ............................... at 24:13><CharacterToken '.'  ................................... at 24:18><WordToken      'all'  ................................. at 24:19><CharacterToken ','  ................................... at 24:22><SpaceToken     ' '  ................................... at 24:23><WordToken      'PoC'  ................................. at 24:24><CharacterToken '.'  ................................... at 24:27><WordToken      'Common'  .............................. at 24:28><CharacterToken '.'  ................................... at 24:34><WordToken      'all'  ................................. at 24:35><CharacterToken ';'  ................................... at 24:38><LinebreakToken ---------------------------------------- at 24:39><LinebreakToken ---------------------------------------- at 25:1><WordToken      'package'  ............................. at 26:1><SpaceToken     ' '  ................................... at 26:8><WordToken      'pkg0'  ................................ at 26:9><SpaceToken     ' '  ................................... at 26:13><WordToken      'is'  .................................. at 26:14><LinebreakToken ---------------------------------------- at 26:16><IndentToken    '\t'  .................................. at 27:1><WordToken      'function'  ............................ at 27:2><SpaceToken     ' '  ................................... at 27:10><WordToken      'func0'  ............................... at 27:11><CharacterToken '('  ................................... at 27:16><WordToken      'a'  ................................... at 27:17><SpaceToken     ' '  ................................... at 27:18><CharacterToken ':'  ................................... at 27:19><SpaceToken     ' '  ................................... at 27:19><WordToken      'integer'  ............................. at 27:21><CharacterToken ')'  ................................... at 27:28><SpaceToken     ' '  ................................... at 27:29><WordToken      'return'  .............................. at 27:30><SpaceToken     ' '  ................................... at 27:36><WordToken      'string'  .............................. at 27:37><CharacterToken ';'  ................................... at 27:43><LinebreakToken ---------------------------------------- at 27:44><WordToken      'end'  ................................. at 28:1><SpaceToken     ' '  ................................... at 28:4><WordToken      'package'  ............................. at 28:5><CharacterToken ';'  ................................... at 28:12><LinebreakToken ---------------------------------------- at 28:13><LinebreakToken ---------------------------------------- at 29:1><WordToken      'package'  ............................. at 30:1><SpaceToken     ' '  ................................... at 30:8><WordToken      'body'  ................................ at 30:9><SpaceToken     ' '  ................................... at 30:13><WordToken      'Components'  .......................... at 30:14><SpaceToken     ' '  ................................... at 30:24><WordToken      'is'  .................................. at 30:25><LinebreakToken ---------------------------------------- at 30:27><IndentToken    '\t'  .................................. at 31:1><WordToken      'function'  ............................ at 31:2><SpaceToken     ' '  ................................... at 31:10><WordToken      'func0'  ............................... at 31:11><CharacterToken '('  ................................... at 31:16><WordToken      'a'  ................................... at 31:17><SpaceToken     ' '  ................................... at 31:18><CharacterToken ':'  ................................... at 31:19><SpaceToken     ' '  ................................... at 31:19><WordToken      'integer'  ............................. at 31:21><CharacterToken ')'  ................................... at 31:28><SpaceToken     ' '  ................................... at 31:29><WordToken      'return'  .............................. at 31:30><SpaceToken     ' '  ................................... at 31:36><WordToken      'string'  .............................. at 31:37><SpaceToken     ' '  ................................... at 31:43><WordToken      'is'  .................................. at 31:44><LinebreakToken ---------------------------------------- at 31:46><IndentToken    '\t\t'  ................................ at 32:1><WordToken      'procedure'  ........................... at 32:3><SpaceToken     ' '  ................................... at 32:12><WordToken      'proc0'  ............................... at 32:13><SpaceToken     ' '  ................................... at 32:18><WordToken      'is'  .................................. at 32:19><LinebreakToken ---------------------------------------- at 32:21><IndentToken    '\t\t'  ................................ at 33:1><WordToken      'begin'  ............................... at 33:3><LinebreakToken ---------------------------------------- at 33:8><IndentToken    '\t\t'  ................................ at 34:1><WordToken      'end'  ................................. at 34:3><SpaceToken     ' '  ................................... at 34:6><WordToken      'procedure'  ........................... at 34:7><CharacterToken ';'  ................................... at 34:16><LinebreakToken ---------------------------------------- at 34:17><IndentToken    '\t'  .................................. at 35:1><WordToken      'begin'  ............................... at 35:2><LinebreakToken ---------------------------------------- at 35:7><IndentToken    '\t'  .................................. at 36:1><WordToken      'end'  ................................. at 36:2><SpaceToken     ' '  ................................... at 36:5><WordToken      'function'  ............................ at 36:6><LinebreakToken ---------------------------------------- at 36:14><WordToken      'end'  ................................. at 37:1><SpaceToken     ' '  ................................... at 37:4><WordToken      'package'  ............................. at 37:5><SpaceToken     ' '  ................................... at 37:12><WordToken      'body'  ................................ at 37:13><CharacterToken ';'  ................................... at 37:17><LinebreakToken ---------------------------------------- at 37:18>

Screenshot

Step 2

The token stream from step 1 is translated into typed tokens likeDelimiterToken (:),EndToken (;) or subtypes ofKeywordToken.These tokens are then grouped into blocks.

The example generates:

[StartOfDocumentBlock][Blocks.CommentBlock            '-- Copryright 2016\n'                                         at (line:   1, col:  1) .. (line:   1, col: 19)][LibraryStatement.LibraryBlock           'library '                                                     at (line:   2, col:  1) .. (line:   2, col:  8)][LibraryStatement.LibraryNameBlock       'IEEE'                                                         at (line:   2, col:  9) .. (line:   2, col: 13)][LibraryStatement.LibraryEndBlock        ';'                                                            at (line:   2, col: 13) .. (line:   2, col: 13)][LinebreakBlock                                                                                at (line:   2, col: 14) .. (line:   2, col: 14)][Use.UseBlock                   'use     '                                                     at (line:   3, col:  1) .. (line:   3, col:  8)][Use.UseNameBlock               'IEEE.std_logic_1164.all'                                      at (line:   3, col:  9) .. (line:   3, col: 32)][Use.UseEndBlock                ';'                                                            at (line:   3, col: 32) .. (line:   3, col: 32)][LinebreakBlock                                                                                at (line:   3, col: 33) .. (line:   3, col: 33)][EmptyLineBlock                                                                                at (line:   4, col:  1) .. (line:   4, col:  1)][Entity.NameBlock               'entity myEntity is'                                           at (line:   5, col:  1) .. (line:   5, col: 19)][LinebreakBlock                                                                                at (line:   5, col: 19) .. (line:   5, col: 19)][IndentationBlock                length=1 (2)                                                  at (line:   6, col:  1) .. (line:   6, col:  1)][GenericList.OpenBlock          'generic ('                                                    at (line:   6, col:  2) .. (line:   6, col: 10)][LinebreakBlock                                                                                at (line:   6, col: 11) .. (line:   6, col: 11)][IndentationBlock                length=2 (4)                                                  at (line:   7, col:  1) .. (line:   7, col:  2)][GenericList.ItemBlock          'BITS : positive := 8\n\t'                                     at (line:   7, col:  3) .. (line:   8, col:  1)][GenericList.CloseBlock         ');'                                                           at (line:   8, col:  2) .. (line:   8, col:  3)][LinebreakBlock                                                                                at (line:   8, col:  4) .. (line:   8, col:  4)][IndentationBlock                length=1 (2)                                                  at (line:   9, col:  1) .. (line:   9, col:  1)][PortList.OpenBlock             'port ('                                                       at (line:   9, col:  2) .. (line:   9, col:  7)][LinebreakBlock                                                                                at (line:   9, col:  8) .. (line:   9, col:  8)][IndentationBlock                length=2 (4)                                                  at (line:  10, col:  1) .. (line:  10, col:  2)][PortList.ItemBlock             'Clock   : in  std_logic'                                      at (line:  10, col:  3) .. (line:  10, col: 26)][PortList.DelimiterBlock        ';'                                                            at (line:  10, col: 26) .. (line:  10, col: 26)][LinebreakBlock                                                                                at (line:  10, col: 27) .. (line:  10, col: 27)][IndentationBlock                length=2 (4)                                                  at (line:  11, col:  1) .. (line:  11, col:  2)][PortList.ItemBlock             'Output\t: out\tstd_logic_vector(BITS - 1 downto 0)\n\t'       at (line:  11, col:  3) .. (line:  12, col:  1)][PortList.CloseBlock            ');'                                                           at (line:  12, col:  2) .. (line:  12, col:  3)][LinebreakBlock                                                                                at (line:  12, col:  4) .. (line:  12, col:  4)][Entity.EndBlock                'end entity;'                                                  at (line:  13, col:  1) .. (line:  13, col: 11)][LinebreakBlock                                                                                at (line:  13, col: 12) .. (line:  13, col: 12)][EmptyLineBlock                                                                                at (line:  14, col:  1) .. (line:  14, col:  1)][Architecture.NameBlock         'architecture rtl of myEntity is'                              at (line:  15, col:  1) .. (line:  15, col: 32)][LinebreakBlock                                                                                at (line:  15, col: 32) .. (line:  15, col: 32)][IndentationBlock                length=1 (2)                                                  at (line:  16, col:  1) .. (line:  16, col:  1)][Constant.ConstantBlock         'constant const0 : integer := 5;'                              at (line:  16, col:  2) .. (line:  16, col: 32)][LinebreakBlock                                                                                at (line:  16, col: 33) .. (line:  16, col: 33)][EmptyLineBlock                                                                                at (line:  17, col:  6) .. (line:  17, col:  6)][IndentationBlock                length=1 (2)                                                  at (line:  18, col:  1) .. (line:  18, col:  1)][Process.OpenBlock              'process('                                                     at (line:  18, col:  2) .. (line:  18, col:  9)][SensitivityList.ItemBlock      'Clock'                                                        at (line:  18, col: 10) .. (line:  18, col: 15)][Process.OpenBlock2*            ')'                                                            at (line:  18, col: 15) .. (line:  18, col: 15)][LinebreakBlock                                                                                at (line:  18, col: 16) .. (line:  18, col: 16)]...

The following screenshot shows the resulting stream of blocks:

[outdated]The block stream can also be "opened" to show the stream of tokens within each block. This is shown in the next screenshot:

Step 3

The stream of blocks from step 2 is transformed into a stream of groups.

Step 4

One of many post processing steps could be to remove whitespaces, indentation and comment blocks. So a filter can be applied to remove these block types. Additionally, multiparted blocks (e.g. if a comment or linebreak was inserted between consecutive code sequences, which belong to one block) can be fused to one single block.

This screenshot shows the filtered results:

Example 2 - Simple_1

This is an input file:

-- Copryright 2016library IEEE;use     IEEE.std_logic_1164.all;use      IEEE.numeric_std.all;entitymyEntityisgeneric (    BITS :positive:=8  );port (    Clock   :instd_logic;    Reset   :instd_logic;Output  :outstd_logic_vector(BITS-1downto0)  );endentity;architecturertlofmyEntityisbeginendarchitecture;

This is the result stream:

And this is the filtered and fused result stream:

Contributors

License

This Python package (source code) licensed underApache License 2.0.
The accompanying documentation is licensed underCreative Commons - Attribution 4.0 (CC-BY 4.0).


SPDX-License-Identifier: Apache-2.0


[8]ページ先頭

©2009-2025 Movatter.jp