Pure-python reader for DAWGs created by dawgdic C++ library or DAWG Python extension. Fork of https://github.com/pytries/DAWG-Python
pymorphy2-fork/DAWG-Python
This pure-python package provides read-only access to files created by the dawgdic C++ library and the DAWG Python package.
This package is not capable of creating DAWGs. It works with DAWGs built by the dawgdic C++ library or the DAWG Python extension module. The main purpose of DAWG-Python is to provide access to DAWGs without requiring compiled extensions. It is also quite fast under PyPy (see benchmarks).

    pip install DAWG2-Python

The aim of DAWG2-Python is to be API- and binary-compatible with DAWG when it is possible.
First, you have to create a dawg using the DAWG module:

    import dawg
    d = dawg.DAWG(data)
    d.save('words.dawg')

And then this dawg can be loaded without requiring C extensions:

    import dawg_python
    d = dawg_python.DAWG().load('words.dawg')

Please consult the DAWG docs for detailed usage. Some features (like constructor parameters or the save method) are intentionally unsupported.
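Because the package aims to be API-compatible with DAWG for reading, one possible pattern is to prefer the compiled extension when it is installed and fall back to the pure-python reader otherwise. The sketch below is only an illustration: the helper name `pick_dawg_module` is hypothetical and not part of either package, and it assumes the two modules are importable as `dawg` and `dawg_python`, as in the examples above.

```python
import importlib


def pick_dawg_module(candidates=("dawg", "dawg_python")):
    """Return the first importable module from `candidates`, or None.

    Hypothetical helper: the default order tries the compiled extension
    first and the pure-python reader second.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This candidate is not installed; try the next one.
            continue
    return None


dawg_mod = pick_dawg_module()
if dawg_mod is None:
    print("neither dawg nor dawg_python is installed")
else:
    print("using", dawg_mod.__name__)
```

With a module selected this way, loading would then look like `dawg_mod.DAWG().load('words.dawg')`. Code written against the fallback should stick to read-only operations, since creating and saving DAWGs is not supported by the pure-python reader.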
Benchmark results (100k unicode words, integer values (lengths of the words), PyPy 1.9, MacBook Air i5 1.8 GHz):

    dict __getitem__ (hits):            11.090M ops/sec
    DAWG __getitem__ (hits):            not supported
    BytesDAWG __getitem__ (hits):       0.493M ops/sec
    RecordDAWG __getitem__ (hits):      0.376M ops/sec

    dict get() (hits):                  10.127M ops/sec
    DAWG get() (hits):                  not supported
    BytesDAWG get() (hits):             0.481M ops/sec
    RecordDAWG get() (hits):            0.402M ops/sec
    dict get() (misses):                14.885M ops/sec
    DAWG get() (misses):                not supported
    BytesDAWG get() (misses):           1.259M ops/sec
    RecordDAWG get() (misses):          1.337M ops/sec

    dict __contains__ (hits):           11.100M ops/sec
    DAWG __contains__ (hits):           1.317M ops/sec
    BytesDAWG __contains__ (hits):      1.107M ops/sec
    RecordDAWG __contains__ (hits):     1.095M ops/sec

    dict __contains__ (misses):         10.567M ops/sec
    DAWG __contains__ (misses):         1.902M ops/sec
    BytesDAWG __contains__ (misses):    1.873M ops/sec
    RecordDAWG __contains__ (misses):   1.862M ops/sec

    dict items():                       44.401 ops/sec
    DAWG items():                       not supported
    BytesDAWG items():                  3.226 ops/sec
    RecordDAWG items():                 2.987 ops/sec

    dict keys():                        426.250 ops/sec
    DAWG keys():                        not supported
    BytesDAWG keys():                   6.050 ops/sec
    RecordDAWG keys():                  6.363 ops/sec

    DAWG.prefixes (hits):               0.756M ops/sec
    DAWG.prefixes (mixed):              1.965M ops/sec
    DAWG.prefixes (misses):             1.773M ops/sec

    RecordDAWG.keys(prefix="xxx"), avg_len(res)==415:         1.429K ops/sec
    RecordDAWG.keys(prefix="xxxxx"), avg_len(res)==17:        36.994K ops/sec
    RecordDAWG.keys(prefix="xxxxxxxx"), avg_len(res)==3:      121.897K ops/sec
    RecordDAWG.keys(prefix="xxxxx..xx"), avg_len(res)==1.4:   265.015K ops/sec
    RecordDAWG.keys(prefix="xxx"), NON_EXISTING:              2450.898K ops/sec

Under CPython expect it to be about 50x slower. Memory consumption of DAWG-Python should be the same as of DAWG.
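For readers who want to reproduce an "ops/sec" figure on their own machine, here is a minimal sketch of how such a number can be measured with the standard library. It is not the repository's benchmark code. The helper `ops_per_sec` and the sample data are made up for illustration, and only the `dict` baseline is timed so that the snippet runs without any DAWG file.

```python
import timeit


def ops_per_sec(func, keys, repeats=3):
    """Best-of-`repeats` throughput of calling func(key) once per key."""
    def run():
        for key in keys:
            func(key)

    # Each timing covers one full pass over `keys`; keep the fastest pass.
    best = min(timeit.repeat(run, number=1, repeat=repeats))
    return len(keys) / best


# Made-up data: words mapped to their lengths.
words = {"word%d" % i: len("word%d" % i) for i in range(10000)}
hits = list(words)

rate = ops_per_sec(words.__contains__, hits)
print("dict __contains__ (hits): %.3fM ops/sec" % (rate / 1e6))
```

Passing the `__contains__` method of a loaded DAWG object instead of the dict's would give a comparable figure for the same key list, though absolute numbers depend heavily on the interpreter and hardware.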
- This package is not capable of creating DAWGs;
- all the limitations of DAWG apply.
Contributions are welcome!
- Development happens at GitHub: https://github.com/pymorphy2-fork/DAWG-Python
- Issue tracker: https://github.com/pymorphy2-fork/DAWG-Python/issues
Feel free to submit ideas, bugs or pull requests.
Make sure pytest is installed and run

    $ pytest .

from the source checkout. Tests should pass under Python 3.8, 3.9, 3.10, 3.11 and PyPy3 >= 7.3.
In order to run benchmarks, type

    $ pypy3 -m bench.speed

This runs benchmarks under PyPy (they are about 50x slower under CPython).
- Mikhail Korobov <kmike84@gmail.com>
- @bt2901
- @insolor
The algorithms are from the dawgdic C++ library by Susumu Yata & contributors.
This package is licensed under the MIT License.