eerimoq/textparser
A text parser written in the Python language.
The project has one goal: speed! See the benchmark below for more details.
Project homepage: https://github.com/eerimoq/textparser
Documentation: http://textparser.readthedocs.org/en/latest
- Thanks to PyParsing for a user-friendly interface. Many of textparser's class names are taken from this project.
pip install textparser
The Hello World example parses the string Hello, World! and outputs its parse tree ['Hello', ',', 'World', '!'].
The script:
    import textparser
    from textparser import Sequence


    class Parser(textparser.Parser):

        def token_specs(self):
            return [
                ('SKIP',          r'[ \r\n\t]+'),
                ('WORD',          r'\w+'),
                ('EMARK',    '!', r'!'),
                ('COMMA',    ',', r','),
                ('MISMATCH',      r'.')
            ]

        def grammar(self):
            return Sequence('WORD', ',', 'WORD', '!')


    tree = Parser().parse('Hello, World!')

    print('Tree:', tree)
Script execution:
    $ env PYTHONPATH=. python3 examples/hello_world.py
    Tree: ['Hello', ',', 'World', '!']
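To give a feel for what the token specs in the Hello World script describe, here is a minimal sketch of regex-based tokenization that uses only the standard library `re` module. It is an illustration, not textparser's actual implementation: the names `TOKEN_SPECS`, `TOKEN_RE` and `tokenize` are hypothetical, and the handling of SKIP and MISMATCH shown here is an assumption about what those token kinds are meant for.

```python
import re

# (kind, regex) pairs mirroring the token specs in the Hello World script.
# The list name and this two-element layout are assumptions for this sketch.
TOKEN_SPECS = [
    ('SKIP', r'[ \r\n\t]+'),
    ('WORD', r'\w+'),
    ('EMARK', r'!'),
    ('COMMA', r','),
    ('MISMATCH', r'.'),
]

# One alternation of named groups. Alternatives are tried left to right,
# so MISMATCH only matches when nothing before it does.
TOKEN_RE = re.compile(
    '|'.join(f'(?P<{kind}>{pattern})' for kind, pattern in TOKEN_SPECS))


def tokenize(text):
    """Return a list of (kind, value) pairs, dropping SKIP tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup  # name of the group that matched
        value = match.group(kind)
        if kind == 'SKIP':
            continue  # whitespace carries no meaning here
        if kind == 'MISMATCH':
            raise ValueError(
                f'Invalid character {value!r} at offset {match.start()}')
        tokens.append((kind, value))
    return tokens


tokens = tokenize('Hello, World!')
print(tokens)
print([value for _, value in tokens])
```

The second print shows only the token values, which for this input match the parse tree of the Hello World example because its grammar is a flat sequence of four tokens.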
A benchmark comparing the speed of 10 JSON parsers, parsing a 276 kb file.
    $ env PYTHONPATH=. python3 examples/benchmarks/json/speed.py
    Parsed 'examples/benchmarks/json/data.json' 1 time(s) in:
    PACKAGE         SECONDS   RATIO  VERSION
    textparser         0.10    100%  0.21.1
    parsimonious       0.17    169%  unknown
    lark (LALR)        0.27    267%  0.7.0
    funcparserlib      0.34    340%  unknown
    textx              0.54    546%  1.8.0
    pyparsing          0.68    684%  2.4.0
    pyleri             0.88    886%  1.2.2
    parsy              0.92    925%  1.2.0
    parsita            2.28   2286%  unknown
    lark (Earley)      2.34   2348%  0.7.0
NOTE 1: The parsers are not necessarily optimized for speed. Optimizing them will likely affect the measurements.
NOTE 2: The structure of the resulting parse trees varies and additional processing may be required to make them fit the user application.
NOTE 3: Only JSON parsers are compared. Parsing other languages may give vastly different results.
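For readers who want to run a similar comparison on their own input, the following is a rough sketch of how SECONDS and RATIO columns like those above could be produced with the standard library `timeit` module. It is not the project's `speed.py`: the `measure` function, the two stand-in parsers, and the generated sample document are all hypothetical, and the ratio is assumed to be each parser's time relative to the fastest one.

```python
import json
import timeit


def measure(parsers, text, iterations=1):
    """Time each callable in `parsers` on `text`.

    Returns (name, seconds, ratio_percent) rows sorted fastest first,
    where the ratio is relative to the fastest parser (assumed definition).
    """
    seconds = {
        name: timeit.timeit(lambda parse=parse: parse(text), number=iterations)
        for name, parse in parsers.items()
    }
    fastest = min(seconds.values())
    return [
        (name, elapsed, 100 * (elapsed / fastest))
        for name, elapsed in sorted(seconds.items(), key=lambda item: item[1])
    ]


def parse_twice(text):
    """Hypothetical slower parser: does the same work two times."""
    json.loads(text)
    return json.loads(text)


# Hypothetical stand-in for a benchmark data file.
sample = json.dumps([{'id': i, 'name': f'item {i}'} for i in range(20000)])

rows = measure({'json': json.loads, 'json twice': parse_twice}, sample)

print(f"{'PACKAGE':<12} {'SECONDS':>8} {'RATIO':>7}")
for name, elapsed, ratio in rows:
    print(f'{name:<12} {elapsed:>8.3f} {ratio:>6.0f}%')
```

A single iteration, as in the run above, is sensitive to noise from the machine; raising `iterations` gives steadier numbers at the cost of a longer run.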
Fork the repository.
Implement the new feature or bug fix.
Implement test case(s) to ensure that future changes do not break existing behavior.
Run the tests.
python3 -m unittest
Create a pull request.