Pure-Python reader for DAWGs created by the dawgdic C++ library or the DAWG Python extension. Fork of https://github.com/pytries/DAWG-Python


pymorphy2-fork/DAWG-Python


This pure-Python package provides read-only access to files created by the dawgdic C++ library and the DAWG Python package.

This package is not capable of creating DAWGs. It works with DAWGs built by the dawgdic C++ library or the DAWG Python extension module. The main purpose of DAWG-Python is to provide access to DAWGs without requiring compiled extensions. It is also quite fast under PyPy (see benchmarks).

Installation

pip install DAWG2-Python

Usage

The aim of DAWG2-Python is to be API- and binary-compatible with DAWG wherever possible.

First, you have to create a dawg using the DAWG module:

import dawg
d = dawg.DAWG(data)
d.save('words.dawg')

And then this dawg can be loaded without requiring C extensions:

import dawg_python
d = dawg_python.DAWG().load('words.dawg')

Please consult the DAWG docs for detailed usage. Some features (like constructor parameters or the save method) are intentionally unsupported.
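Because the two packages aim to share an API for reading, an application can prefer the compiled extension and fall back to the pure-Python reader when it is missing. The following is a minimal sketch of that pattern, not something this package ships: the helper names `import_dawg_backend` and `load_words` are hypothetical, and the code assumes only that both modules expose a `DAWG` class with a `load` method and support the `in` operator, as shown above and in the benchmarks.

```python
import importlib
import os


def import_dawg_backend():
    """Return the first importable backend module, or None if neither is installed.

    Hypothetical helper: tries the C extension (dawg) first, then dawg_python.
    """
    for name in ("dawg", "dawg_python"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def load_words(path):
    """Load a DAWG file with whichever backend is available (hypothetical helper)."""
    backend = import_dawg_backend()
    if backend is None:
        raise RuntimeError("neither dawg nor dawg_python is installed")
    d = backend.DAWG()
    d.load(path)
    return d


if __name__ == "__main__":
    backend = import_dawg_backend()
    print("backend:", backend.__name__ if backend else "none installed")
    # Only query the file if a backend exists and the file was created beforehand.
    if backend is not None and os.path.exists("words.dawg"):
        words = load_words("words.dawg")
        print("foo" in words)  # membership test via __contains__
```

Code written this way should stick to the read-only subset of the API, since constructor parameters and `save` are unavailable in the pure-Python reader.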

Benchmarks

Benchmark results (100k unicode words, integer values (lengths of the words), PyPy 1.9, MacBook Air i5 1.8 GHz):

dict __getitem__ (hits):        11.090M ops/sec
DAWG __getitem__ (hits):        not supported
BytesDAWG __getitem__ (hits):   0.493M ops/sec
RecordDAWG __getitem__ (hits):  0.376M ops/sec

dict get() (hits):              10.127M ops/sec
DAWG get() (hits):              not supported
BytesDAWG get() (hits):         0.481M ops/sec
RecordDAWG get() (hits):        0.402M ops/sec
dict get() (misses):            14.885M ops/sec
DAWG get() (misses):            not supported
BytesDAWG get() (misses):       1.259M ops/sec
RecordDAWG get() (misses):      1.337M ops/sec

dict __contains__ (hits):           11.100M ops/sec
DAWG __contains__ (hits):           1.317M ops/sec
BytesDAWG __contains__ (hits):      1.107M ops/sec
RecordDAWG __contains__ (hits):     1.095M ops/sec

dict __contains__ (misses):         10.567M ops/sec
DAWG __contains__ (misses):         1.902M ops/sec
BytesDAWG __contains__ (misses):    1.873M ops/sec
RecordDAWG __contains__ (misses):   1.862M ops/sec

dict items():           44.401 ops/sec
DAWG items():           not supported
BytesDAWG items():      3.226 ops/sec
RecordDAWG items():     2.987 ops/sec

dict keys():            426.250 ops/sec
DAWG keys():            not supported
BytesDAWG keys():       6.050 ops/sec
RecordDAWG keys():      6.363 ops/sec

DAWG.prefixes (hits):    0.756M ops/sec
DAWG.prefixes (mixed):   1.965M ops/sec
DAWG.prefixes (misses):  1.773M ops/sec

RecordDAWG.keys(prefix="xxx"), avg_len(res)==415:       1.429K ops/sec
RecordDAWG.keys(prefix="xxxxx"), avg_len(res)==17:      36.994K ops/sec
RecordDAWG.keys(prefix="xxxxxxxx"), avg_len(res)==3:    121.897K ops/sec
RecordDAWG.keys(prefix="xxxxx..xx"), avg_len(res)==1.4: 265.015K ops/sec
RecordDAWG.keys(prefix="xxx"), NON_EXISTING:            2450.898K ops/sec

Under CPython, expect it to be about 50x slower. The memory consumption of DAWG-Python should be the same as that of DAWG.
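To compare these figures on your own interpreter, an "ops/sec" number can be approximated with the standard library's `timeit`. This is only an illustrative sketch, not the code behind the project's `bench.speed` module: the helper name `ops_per_sec`, the sample dictionary and the iteration counts are assumptions chosen for the example, and it times a plain `dict` so that it runs without a DAWG file.

```python
import timeit


def ops_per_sec(stmt, namespace, number=100_000, repeat=3):
    """Run stmt `number` times, `repeat` times over, and report the best rate.

    Hypothetical helper; taking the fastest run reduces noise from other processes.
    """
    timings = timeit.repeat(stmt, number=number, repeat=repeat, globals=namespace)
    return number / min(timings)


# Sample data in the spirit of the benchmark setup: words mapped to their lengths.
words = {"word%d" % i: len("word%d" % i) for i in range(1000)}

rate = ops_per_sec("words['word500']", {"words": words})
print("dict __getitem__ (hits): %.3fM ops/sec" % (rate / 1e6))
```

The same helper could time a loaded DAWG by passing it in `namespace` and changing `stmt`, for example to a `__contains__` check.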

Current limitations

  • This package is not capable of creating DAWGs;
  • all the limitations of DAWG apply.

Contributions are welcome!

Contributing

Feel free to submit ideas, bugs or pull requests.

Running tests and benchmarks

Make sure pytest is installed and run

$ pytest .

from the source checkout. Tests should pass under Python 3.8, 3.9, 3.10, 3.11 and PyPy3 >= 7.3.

In order to run benchmarks, type

$ pypy3 -m bench.speed

This runs benchmarks under PyPy (they are about 50x slower under CPython).

Authors & Contributors

The algorithms are from the dawgdic C++ library by Susumu Yata & contributors.

License

This package is licensed under the MIT License.
