Streaming based VHDL parser.
Paebbels/pyVHDLParser
This is a token-stream based parser for VHDL-2008.
This project requires Python 3.8+.
- Parsing
  - slice an input document into tokens and text blocks, which are categorized
  - preserve case, whitespace and comments
  - recover on parsing errors
  - good error reporting / throw exceptions
- Fast Processing
  - multi-pass parsing and analysis
  - delay analysis if not needed at current pass
  - link tokens and blocks for fast-forward scanning
- Generic VHDL Language Model
  - assemble a document-object-model (Code-DOM)
  - provide an API for code introspection
- generate documentation by using the fast-forward scanner
- generate a document/language model by using the grouped text-block scanner
- extract compile orders and other dependency graphs
- generate highlighted syntax
- re-annotate documenting comments to their objects for doc extraction
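The goal of linking tokens and blocks for fast-forward scanning can be pictured with a small data structure. The following is only a sketch under assumptions: the class names `Token` and `Block`, the helpers `link_tokens`, `link_blocks` and `fast_forward`, and the rule "a block ends at `;`" are hypothetical and are not taken from pyVHDLParser's API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Token:
    """Hypothetical token that knows its neighbours."""
    value: str
    previous: Optional["Token"] = field(default=None, repr=False)
    next: Optional["Token"] = field(default=None, repr=False)


@dataclass(eq=False)
class Block:
    """Hypothetical block: a token range plus a link to the next block."""
    start: Token
    end: Token
    next: Optional["Block"] = field(default=None, repr=False)

    def text(self) -> str:
        # Walk the token chain from start to end (inclusive).
        parts, token = [], self.start
        while token is not None:
            parts.append(token.value)
            if token is self.end:
                break
            token = token.next
        return "".join(parts)


def link_tokens(values: List[str]) -> List[Token]:
    """Create tokens and connect them as a doubly linked list."""
    tokens = [Token(value) for value in values]
    for left, right in zip(tokens, tokens[1:]):
        left.next, right.previous = right, left
    return tokens


def link_blocks(tokens: List[Token]) -> Optional[Block]:
    """Assumption for this sketch: a block ends at every ';' token."""
    first = last = start = None
    for token in tokens:
        if start is None:
            start = token
        if token.value == ";":
            block = Block(start, token)
            if last is None:
                first = block
            else:
                last.next = block
            last, start = block, None
    return first


def fast_forward(block: Optional[Block]) -> Iterator[Block]:
    """Hop from block to block without visiting the tokens in between."""
    while block is not None:
        yield block
        block = block.next


tokens = link_tokens(["library", " ", "IEEE", ";", "use", " ", "IEEE", ".", "all", ";"])
texts = [block.text() for block in fast_forward(link_blocks(tokens))]
print(texts)
```

A scanner that only needs statement boundaries follows `Block.next`, while a tool that needs exact text can still descend into the token chain of a single block.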
- slice an input document into tokens
- assemble tokens into text blocks, which are categorized
- assemble text blocks for fast-forward scanning into groups
- translate groups into a document-object-model (DOM)
- provide a generic VHDL language model
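The passes listed above can be chained as lazy generators, so each stage consumes only what the next stage requests. The following is a minimal sketch of that idea, not pyVHDLParser's actual implementation: the function names `tokenize`, `to_blocks` and `to_groups`, and the simplistic rules (a block ends at `;`, a group collects consecutive blocks with the same leading word), are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Block = List[str]


def tokenize(text: str) -> Iterator[str]:
    """Pass 1: split characters into words and single-character tokens."""
    word = ""
    for char in text:
        if char.isalnum() or char == "_":
            word += char
            continue
        if word:
            yield word
            word = ""
        yield char
    if word:
        yield word


def to_blocks(tokens: Iterable[str]) -> Iterator[Block]:
    """Pass 2: collect tokens up to and including ';' into one block."""
    block: Block = []
    for token in tokens:
        block.append(token)
        if token == ";":
            yield block
            block = []
    # Keep a trailing, unterminated block only if it is not pure whitespace.
    if any(not token.isspace() for token in block):
        yield block


def to_groups(blocks: Iterable[Block]) -> Iterator[Tuple[str, List[Block]]]:
    """Pass 3: group consecutive blocks that start with the same word."""
    key, group = "", []
    for block in blocks:
        first = next((t for t in block if not t.isspace()), "").lower()
        if group and first != key:
            yield key, group
            group = []
        key = first
        group.append(block)
    if group:
        yield key, group


source = "library IEEE;\nuse IEEE.std_logic_1164.all;\nuse IEEE.numeric_std.all;\n"
groups = list(to_groups(to_blocks(tokenize(source))))
summary = [(key, len(group)) for key, group in groups]
print(summary)
```

Because every character ends up in exactly one token, joining the tokens reproduces the input, which mirrors the goal of preserving case, whitespace and comments.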
A Sphinx language plugin for VHDL
TODO: Move the following documentation to ReadTheDocs and replace it with a more lightweight version.
This is an input file:
-- Copryright 2016
library IEEE;
use     IEEE.std_logic_1164.all;

entity myEntity is
	generic (
		BITS : positive := 8
	);
	port (
		Clock   : in  std_logic;
		Output : out std_logic_vector(BITS - 1 downto 0)
	);
end entity;

architecture rtl of myEntity is
	constant const0 : integer := 5;
begin
	process(Clock)
	begin
	end process;
end architecture;

library IEEE, PoC;
use     PoC.Utils.all, PoC.Common.all;

package pkg0 is
	function func0(a : integer) return string;
end package;

package body Components is
	function func0(a : integer) return string is
		procedure proc0 is
		begin
		end procedure;
	begin
	end function
end package body;
The input file (a stream of characters) is translated into a stream of basic tokens:
- StartOfDocumentToken
- LinebreakToken
- SpaceToken
- IndentationToken
- WordToken
- CharacterToken
- FusedCharacterToken
- CommentToken
  - SingleLineCommentToken
  - MultiLineCommentToken
- EndOfDocumentToken
The stream looks like this:
<StartOfDocumentToken>
<SLCommentToken '-- Copryright 2016\n' ................ at 1:1>
<WordToken 'library' ............................. at 2:1>
<SpaceToken ' ' ................................... at 2:8>
<WordToken 'IEEE' ................................ at 2:9>
<CharacterToken ';' ................................... at 2:13>
<LinebreakToken ---------------------------------------- at 2:14>
<WordToken 'use' ................................. at 3:1>
<SpaceToken ' ' ............................... at 3:4>
<WordToken 'IEEE' ................................ at 3:9>
<CharacterToken '.' ................................... at 3:13>
<WordToken 'std_logic_1164' ...................... at 3:14>
<CharacterToken '.' ................................... at 3:28>
<WordToken 'all' ................................. at 3:29>
<CharacterToken ';' ................................... at 3:32>
<LinebreakToken ---------------------------------------- at 3:33>
<LinebreakToken ---------------------------------------- at 4:1>
<WordToken 'entity' .............................. at 5:1>
<SpaceToken ' ' ................................... at 5:7>
<WordToken 'myEntity' ............................ at 5:8>
<SpaceToken ' ' ................................... at 5:16>
<WordToken 'is' .................................. at 5:17>
<LinebreakToken ---------------------------------------- at 5:19>
<IndentToken '\t' .................................. at 6:1>
<WordToken 'generic' ............................. at 6:2>
<SpaceToken ' ' ................................... at 6:9>
<CharacterToken '(' ................................... at 6:10>
<LinebreakToken ---------------------------------------- at 6:11>
<IndentToken '\t\t' ................................ at 7:1>
<WordToken 'BITS' ................................ at 7:3>
<SpaceToken ' ' ................................... at 7:7>
<CharacterToken ':' ................................... at 7:8>
<SpaceToken ' ' ................................... at 7:8>
<WordToken 'positive' ............................ at 7:10>
<SpaceToken ' ' ................................... at 7:18>
<FusedCharToken ':=' .................................. at 7:19>
<SpaceToken ' ' ................................... at 7:21>
<WordToken '8' ................................... at 7:22>
<LinebreakToken ---------------------------------------- at 7:23>
<IndentToken '\t' .................................. at 8:1>
<CharacterToken ')' ................................... at 8:2>
<CharacterToken ';' ................................... at 8:3>
<LinebreakToken ---------------------------------------- at 8:4>
<IndentToken '\t' .................................. at 9:1>
<WordToken 'port' ................................ at 9:2>
<SpaceToken ' ' ................................... at 9:6>
<CharacterToken '(' ................................... at 9:7>
<LinebreakToken ---------------------------------------- at 9:8>
<IndentToken '\t\t' ................................ at 10:1>
<WordToken 'Clock' ............................... at 10:3>
<SpaceToken ' ' ................................. at 10:8>
<CharacterToken ':' ................................... at 10:11>
<SpaceToken ' ' ................................... at 10:11>
<WordToken 'in' .................................. at 10:13>
<SpaceToken ' ' .................................. at 10:15>
<WordToken 'std_logic' ........................... at 10:17>
<CharacterToken ';' ................................... at 10:26>
<LinebreakToken ---------------------------------------- at 10:27>
<IndentToken '\t\t' ................................ at 11:1>
<WordToken 'Output' .............................. at 11:3>
<SpaceToken ' ' ................................... at 11:9>
<CharacterToken ':' ................................... at 11:10>
<SpaceToken ' ' ................................... at 11:10>
<WordToken 'out' ................................. at 11:12>
<SpaceToken ' ' ................................... at 11:15>
<WordToken 'std_logic_vector' .................... at 11:16>
<CharacterToken '(' ................................... at 11:32>
<WordToken 'BITS' ................................ at 11:33>
<SpaceToken ' ' ................................... at 11:37>
<CharacterToken '-' ................................... at 11:38>
<SpaceToken ' ' ................................... at 11:38>
<WordToken '1' ................................... at 11:40>
<SpaceToken ' ' ................................... at 11:41>
<WordToken 'downto' .............................. at 11:42>
<SpaceToken ' ' ................................... at 11:48>
<WordToken '0' ................................... at 11:49>
<CharacterToken ')' ................................... at 11:50>
<LinebreakToken ---------------------------------------- at 11:51>
<IndentToken '\t' .................................. at 12:1>
<CharacterToken ')' ................................... at 12:2>
<CharacterToken ';' ................................... at 12:3>
<LinebreakToken ---------------------------------------- at 12:4>
<WordToken 'end' ................................. at 13:1>
<SpaceToken ' ' ................................... at 13:4>
<WordToken 'entity' .............................. at 13:5>
<CharacterToken ';' ................................... at 13:11>
<LinebreakToken ---------------------------------------- at 13:12>
<LinebreakToken ---------------------------------------- at 14:1>
<WordToken 'architecture' ........................ at 15:1>
<SpaceToken ' ' ................................... at 15:13>
<WordToken 'rtl' ................................. at 15:14>
<SpaceToken ' ' ................................... at 15:17>
<WordToken 'of' .................................. at 15:18>
<SpaceToken ' ' ................................... at 15:20>
<WordToken 'myEntity' ............................ at 15:21>
<SpaceToken ' ' ................................... at 15:29>
<WordToken 'is' .................................. at 15:30>
<LinebreakToken ---------------------------------------- at 15:32>
<IndentToken '\t' .................................. at 16:1>
<WordToken 'constant' ............................ at 16:2>
<SpaceToken ' ' ................................... at 16:10>
<WordToken 'const0' .............................. at 16:11>
<SpaceToken ' ' ................................... at 16:17>
<CharacterToken ':' ................................... at 16:18>
<SpaceToken ' ' ................................... at 16:18>
<WordToken 'integer' ............................. at 16:20>
<SpaceToken ' ' ................................... at 16:27>
<FusedCharToken ':=' .................................. at 16:28>
<SpaceToken ' ' ................................... at 16:30>
<WordToken '5' ................................... at 16:31>
<CharacterToken ';' ................................... at 16:32>
<LinebreakToken ---------------------------------------- at 16:33>
<WordToken 'begin' ............................... at 17:1>
<LinebreakToken ---------------------------------------- at 17:6>
<IndentToken '\t' .................................. at 18:1>
<WordToken 'process' ............................. at 18:2>
<CharacterToken '(' ................................... at 18:9>
<WordToken 'Clock' ............................... at 18:10>
<CharacterToken ')' ................................... at 18:15>
<LinebreakToken ---------------------------------------- at 18:16>
<IndentToken '\t' .................................. at 19:1>
<WordToken 'begin' ............................... at 19:2>
<LinebreakToken ---------------------------------------- at 19:7>
<IndentToken '\t' .................................. at 20:1>
<WordToken 'end' ................................. at 20:2>
<SpaceToken ' ' ................................... at 20:5>
<WordToken 'process' ............................. at 20:6>
<CharacterToken ';' ................................... at 20:13>
<LinebreakToken ---------------------------------------- at 20:14>
<WordToken 'end' ................................. at 21:1>
<SpaceToken ' ' ................................... at 21:4>
<WordToken 'architecture' ........................ at 21:5>
<CharacterToken ';' ................................... at 21:17>
<LinebreakToken ---------------------------------------- at 21:18>
<LinebreakToken ---------------------------------------- at 22:1>
<WordToken 'library' ............................. at 23:1>
<SpaceToken ' ' ................................... at 23:8>
<WordToken 'IEEE' ................................ at 23:9>
<CharacterToken ',' ................................... at 23:13>
<SpaceToken ' ' ................................... at 23:14>
<WordToken 'PoC' ................................. at 23:15>
<CharacterToken ';' ................................... at 23:18>
<LinebreakToken ---------------------------------------- at 23:19>
<WordToken 'use' ................................. at 24:1>
<SpaceToken ' ' ............................... at 24:4>
<WordToken 'PoC' ................................. at 24:9>
<CharacterToken '.' ................................... at 24:12>
<WordToken 'Utils' ............................... at 24:13>
<CharacterToken '.' ................................... at 24:18>
<WordToken 'all' ................................. at 24:19>
<CharacterToken ',' ................................... at 24:22>
<SpaceToken ' ' ................................... at 24:23>
<WordToken 'PoC' ................................. at 24:24>
<CharacterToken '.' ................................... at 24:27>
<WordToken 'Common' .............................. at 24:28>
<CharacterToken '.' ................................... at 24:34>
<WordToken 'all' ................................. at 24:35>
<CharacterToken ';' ................................... at 24:38>
<LinebreakToken ---------------------------------------- at 24:39>
<LinebreakToken ---------------------------------------- at 25:1>
<WordToken 'package' ............................. at 26:1>
<SpaceToken ' ' ................................... at 26:8>
<WordToken 'pkg0' ................................ at 26:9>
<SpaceToken ' ' ................................... at 26:13>
<WordToken 'is' .................................. at 26:14>
<LinebreakToken ---------------------------------------- at 26:16>
<IndentToken '\t' .................................. at 27:1>
<WordToken 'function' ............................ at 27:2>
<SpaceToken ' ' ................................... at 27:10>
<WordToken 'func0' ............................... at 27:11>
<CharacterToken '(' ................................... at 27:16>
<WordToken 'a' ................................... at 27:17>
<SpaceToken ' ' ................................... at 27:18>
<CharacterToken ':' ................................... at 27:19>
<SpaceToken ' ' ................................... at 27:19>
<WordToken 'integer' ............................. at 27:21>
<CharacterToken ')' ................................... at 27:28>
<SpaceToken ' ' ................................... at 27:29>
<WordToken 'return' .............................. at 27:30>
<SpaceToken ' ' ................................... at 27:36>
<WordToken 'string' .............................. at 27:37>
<CharacterToken ';' ................................... at 27:43>
<LinebreakToken ---------------------------------------- at 27:44>
<WordToken 'end' ................................. at 28:1>
<SpaceToken ' ' ................................... at 28:4>
<WordToken 'package' ............................. at 28:5>
<CharacterToken ';' ................................... at 28:12>
<LinebreakToken ---------------------------------------- at 28:13>
<LinebreakToken ---------------------------------------- at 29:1>
<WordToken 'package' ............................. at 30:1>
<SpaceToken ' ' ................................... at 30:8>
<WordToken 'body' ................................ at 30:9>
<SpaceToken ' ' ................................... at 30:13>
<WordToken 'Components' .......................... at 30:14>
<SpaceToken ' ' ................................... at 30:24>
<WordToken 'is' .................................. at 30:25>
<LinebreakToken ---------------------------------------- at 30:27>
<IndentToken '\t' .................................. at 31:1>
<WordToken 'function' ............................ at 31:2>
<SpaceToken ' ' ................................... at 31:10>
<WordToken 'func0' ............................... at 31:11>
<CharacterToken '(' ................................... at 31:16>
<WordToken 'a' ................................... at 31:17>
<SpaceToken ' ' ................................... at 31:18>
<CharacterToken ':' ................................... at 31:19>
<SpaceToken ' ' ................................... at 31:19>
<WordToken 'integer' ............................. at 31:21>
<CharacterToken ')' ................................... at 31:28>
<SpaceToken ' ' ................................... at 31:29>
<WordToken 'return' .............................. at 31:30>
<SpaceToken ' ' ................................... at 31:36>
<WordToken 'string' .............................. at 31:37>
<SpaceToken ' ' ................................... at 31:43>
<WordToken 'is' .................................. at 31:44>
<LinebreakToken ---------------------------------------- at 31:46>
<IndentToken '\t\t' ................................ at 32:1>
<WordToken 'procedure' ........................... at 32:3>
<SpaceToken ' ' ................................... at 32:12>
<WordToken 'proc0' ............................... at 32:13>
<SpaceToken ' ' ................................... at 32:18>
<WordToken 'is' .................................. at 32:19>
<LinebreakToken ---------------------------------------- at 32:21>
<IndentToken '\t\t' ................................ at 33:1>
<WordToken 'begin' ............................... at 33:3>
<LinebreakToken ---------------------------------------- at 33:8>
<IndentToken '\t\t' ................................ at 34:1>
<WordToken 'end' ................................. at 34:3>
<SpaceToken ' ' ................................... at 34:6>
<WordToken 'procedure' ........................... at 34:7>
<CharacterToken ';' ................................... at 34:16>
<LinebreakToken ---------------------------------------- at 34:17>
<IndentToken '\t' .................................. at 35:1>
<WordToken 'begin' ............................... at 35:2>
<LinebreakToken ---------------------------------------- at 35:7>
<IndentToken '\t' .................................. at 36:1>
<WordToken 'end' ................................. at 36:2>
<SpaceToken ' ' ................................... at 36:5>
<WordToken 'function' ............................ at 36:6>
<LinebreakToken ---------------------------------------- at 36:14>
<WordToken 'end' ................................. at 37:1>
<SpaceToken ' ' ................................... at 37:4>
<WordToken 'package' ............................. at 37:5>
<SpaceToken ' ' ................................... at 37:12>
<WordToken 'body' ................................ at 37:13>
<CharacterToken ';' ................................... at 37:17>
<LinebreakToken ---------------------------------------- at 37:18>
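To illustrate how a character stream can be cut into positioned basic tokens like the ones listed above, here is a minimal regex-based sketch. It is not pyVHDLParser's tokenizer: the `Token` dataclass, the `tokenize` function, and the small set of fused operators are assumptions for this example, and the token kinds simply reuse the names from the list of basic tokens.

```python
import re
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Token:
    kind: str
    value: str
    line: int
    column: int


# Order matters: comments and fused characters must win over single characters.
# The fused operator list is an illustrative subset, not the full VHDL set.
TOKEN_PATTERN = re.compile(
    r"""
    (?P<SingleLineCommentToken>--[^\n]*\n?)
  | (?P<LinebreakToken>\n)
  | (?P<SpaceToken>[ \t]+)
  | (?P<WordToken>\w+)
  | (?P<FusedCharacterToken>:=|<=|>=|=>|/=|\*\*|<>)
  | (?P<CharacterToken>.)
    """,
    re.VERBOSE,
)


def tokenize(text: str) -> Iterator[Token]:
    """Yield tokens with 1-based line and column positions."""
    line, column = 1, 1
    for match in TOKEN_PATTERN.finditer(text):
        kind, value = match.lastgroup, match.group()
        # Whitespace at the start of a line is reported as indentation.
        if kind == "SpaceToken" and column == 1:
            kind = "IndentationToken"
        yield Token(kind, value, line, column)
        if value.endswith("\n"):
            line, column = line + 1, 1
        else:
            column += len(value)


sample = "-- Copryright 2016\nlibrary IEEE;\n\tBITS : positive := 8\n"
tokens = list(tokenize(sample))
for token in tokens:
    print(token)
```

For the first two lines of the sample, the positions agree with the stream shown above: `library` at 2:1, `IEEE` at 2:9, `;` at 2:13 and the linebreak at 2:14.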
The token stream from step 1 is translated into typed tokens like DelimiterToken (:), EndToken (;) or subtypes of KeywordToken. These tokens are then grouped into blocks.
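The typing step can be pictured as a second pass that looks at each basic token's text and assigns a more specific type. The sketch below is an assumption-laden illustration, not the library's classifier: the `classify` function, the abbreviated `KEYWORDS` set, and the names `IdentifierToken` and `UntypedToken` are hypothetical; only `DelimiterToken`, `EndToken` and `KeywordToken` come from the text above.

```python
# Abbreviated keyword set covering the example file; VHDL has many more.
KEYWORDS = frozenset({
    "library", "use", "entity", "is", "generic", "port", "end",
    "architecture", "of", "begin", "process", "package", "body",
    "function", "return", "procedure", "constant", "in", "out", "downto",
})


def classify(value: str) -> str:
    """Map the text of a basic token to a (hypothetical) typed token name."""
    if value == ";":
        return "EndToken"
    if value == ":":
        return "DelimiterToken"
    # VHDL keywords are case-insensitive.
    if value.lower() in KEYWORDS:
        return "KeywordToken"
    if value[:1].isalpha():
        return "IdentifierToken"
    return "UntypedToken"


words = ["entity", "myEntity", "IS", "BITS", ":", "positive", ";", "("]
typed = [(word, classify(word)) for word in words]
for word, kind in typed:
    print(f"{word!r:12} -> {kind}")
```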
For the input file above, this step generates the following block stream:
[StartOfDocumentBlock]
[Blocks.CommentBlock '-- Copryright 2016\n' at (line: 1, col: 1) .. (line: 1, col: 19)]
[LibraryStatement.LibraryBlock 'library ' at (line: 2, col: 1) .. (line: 2, col: 8)]
[LibraryStatement.LibraryNameBlock 'IEEE' at (line: 2, col: 9) .. (line: 2, col: 13)]
[LibraryStatement.LibraryEndBlock ';' at (line: 2, col: 13) .. (line: 2, col: 13)]
[LinebreakBlock at (line: 2, col: 14) .. (line: 2, col: 14)]
[Use.UseBlock 'use ' at (line: 3, col: 1) .. (line: 3, col: 8)]
[Use.UseNameBlock 'IEEE.std_logic_1164.all' at (line: 3, col: 9) .. (line: 3, col: 32)]
[Use.UseEndBlock ';' at (line: 3, col: 32) .. (line: 3, col: 32)]
[LinebreakBlock at (line: 3, col: 33) .. (line: 3, col: 33)]
[EmptyLineBlock at (line: 4, col: 1) .. (line: 4, col: 1)]
[Entity.NameBlock 'entity myEntity is' at (line: 5, col: 1) .. (line: 5, col: 19)]
[LinebreakBlock at (line: 5, col: 19) .. (line: 5, col: 19)]
[IndentationBlock length=1 (2) at (line: 6, col: 1) .. (line: 6, col: 1)]
[GenericList.OpenBlock 'generic (' at (line: 6, col: 2) .. (line: 6, col: 10)]
[LinebreakBlock at (line: 6, col: 11) .. (line: 6, col: 11)]
[IndentationBlock length=2 (4) at (line: 7, col: 1) .. (line: 7, col: 2)]
[GenericList.ItemBlock 'BITS : positive := 8\n\t' at (line: 7, col: 3) .. (line: 8, col: 1)]
[GenericList.CloseBlock ');' at (line: 8, col: 2) .. (line: 8, col: 3)]
[LinebreakBlock at (line: 8, col: 4) .. (line: 8, col: 4)]
[IndentationBlock length=1 (2) at (line: 9, col: 1) .. (line: 9, col: 1)]
[PortList.OpenBlock 'port (' at (line: 9, col: 2) .. (line: 9, col: 7)]
[LinebreakBlock at (line: 9, col: 8) .. (line: 9, col: 8)]
[IndentationBlock length=2 (4) at (line: 10, col: 1) .. (line: 10, col: 2)]
[PortList.ItemBlock 'Clock : in std_logic' at (line: 10, col: 3) .. (line: 10, col: 26)]
[PortList.DelimiterBlock ';' at (line: 10, col: 26) .. (line: 10, col: 26)]
[LinebreakBlock at (line: 10, col: 27) .. (line: 10, col: 27)]
[IndentationBlock length=2 (4) at (line: 11, col: 1) .. (line: 11, col: 2)]
[PortList.ItemBlock 'Output\t: out\tstd_logic_vector(BITS - 1 downto 0)\n\t' at (line: 11, col: 3) .. (line: 12, col: 1)]
[PortList.CloseBlock ');' at (line: 12, col: 2) .. (line: 12, col: 3)]
[LinebreakBlock at (line: 12, col: 4) .. (line: 12, col: 4)]
[Entity.EndBlock 'end entity;' at (line: 13, col: 1) .. (line: 13, col: 11)]
[LinebreakBlock at (line: 13, col: 12) .. (line: 13, col: 12)]
[EmptyLineBlock at (line: 14, col: 1) .. (line: 14, col: 1)]
[Architecture.NameBlock 'architecture rtl of myEntity is' at (line: 15, col: 1) .. (line: 15, col: 32)]
[LinebreakBlock at (line: 15, col: 32) .. (line: 15, col: 32)]
[IndentationBlock length=1 (2) at (line: 16, col: 1) .. (line: 16, col: 1)]
[Constant.ConstantBlock 'constant const0 : integer := 5;' at (line: 16, col: 2) .. (line: 16, col: 32)]
[LinebreakBlock at (line: 16, col: 33) .. (line: 16, col: 33)]
[EmptyLineBlock at (line: 17, col: 6) .. (line: 17, col: 6)]
[IndentationBlock length=1 (2) at (line: 18, col: 1) .. (line: 18, col: 1)]
[Process.OpenBlock 'process(' at (line: 18, col: 2) .. (line: 18, col: 9)]
[SensitivityList.ItemBlock 'Clock' at (line: 18, col: 10) .. (line: 18, col: 15)]
[Process.OpenBlock2* ')' at (line: 18, col: 15) .. (line: 18, col: 15)]
[LinebreakBlock at (line: 18, col: 16) .. (line: 18, col: 16)]
...
The following screenshot shows the resulting stream of blocks:
[outdated] The block stream can also be "opened" to show the stream of tokens within each block. This is shown in the next screenshot:
The stream of blocks from step 2 is transformed into a stream of groups.
One of many possible post-processing steps is to remove whitespace, indentation and comment blocks by applying a filter for these block types. Additionally, multi-part blocks (e.g. when a comment or linebreak was inserted between consecutive code sequences that belong to one block) can be fused into a single block.
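The filter-and-fuse step described here can be sketched with two small generators. This is an illustration under assumptions rather than the project's implementation: the `Block` dataclass, the `NOISE` set, and the fusing heuristic (adjacent blocks of the same kind are parts of one block once the noise between them is gone) are hypothetical; the block kind names are borrowed from the example block stream.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Block:
    kind: str
    text: str


# Block kinds treated as noise by this sketch.
NOISE = frozenset({"LinebreakBlock", "EmptyLineBlock", "IndentationBlock", "CommentBlock"})


def filter_blocks(blocks: Iterable[Block]) -> Iterator[Block]:
    """Drop whitespace, indentation and comment blocks."""
    return (block for block in blocks if block.kind not in NOISE)


def fuse_blocks(blocks: Iterable[Block]) -> Iterator[Block]:
    """Merge directly adjacent blocks of the same kind into one block.

    Assumption: the parts are joined with a single space, so the original
    whitespace between them is not preserved.
    """
    pending = None
    for block in blocks:
        if pending is not None and pending.kind == block.kind:
            pending = Block(pending.kind, pending.text + " " + block.text)
            continue
        if pending is not None:
            yield pending
        pending = block
    if pending is not None:
        yield pending


stream = [
    Block("PortList.ItemBlock", "Clock :"),
    Block("CommentBlock", "-- system clock\n"),
    Block("IndentationBlock", "\t\t"),
    Block("PortList.ItemBlock", "in std_logic"),
    Block("PortList.DelimiterBlock", ";"),
    Block("LinebreakBlock", "\n"),
    Block("PortList.ItemBlock", "Reset : in std_logic"),
]
result = list(fuse_blocks(filter_blocks(stream)))
for block in result:
    print(block)
```

The two port items stay separate because a delimiter block sits between them, while the item that was split by a comment becomes one block again.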
This screenshot shows the filtered results:
This is an input file:
-- Copryright 2016
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity myEntity is
	generic (
		BITS : positive := 8
	);
	port (
		Clock : in std_logic;
		Reset : in std_logic;
		Output : out std_logic_vector(BITS - 1 downto 0)
	);
end entity;

architecture rtl of myEntity is
begin
end architecture;
And this is the filtered and fused result stream:
- Patrick Lehmann (Maintainer)
- and more...
This Python package (source code) is licensed under Apache License 2.0.
The accompanying documentation is licensed under Creative Commons - Attribution 4.0 (CC-BY 4.0).
SPDX-License-Identifier: Apache-2.0