Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT
NotificationsYou must be signed in to change notification settings

daac-tools/python-daachorse

Repository files navigation

daachorse is a fast implementation of the Aho-Corasick algorithm using the compact double-array data structure.This is a Python wrapper.

PyPIBuild StatusDocumentation Status

Installation

Install pre-built package from PyPI

Run the following command:

$ pip install daachorse

Build from source

You need to install the Rust compiler followingthe documentation beforehand.daachorse usespyproject.toml, so you also need to upgrade pip to version 19 or later.

$ pip install --upgrade pip

After setting up the environment, you can install daachorse as follows:

$ pip install git+https://github.com/daac-tools/python-daachorse

Example usage

Daachorse contains some search options,ranging from basic matching with the Aho-Corasick algorithm to trickier matching.All of them will run very fast based on the double-array data structure andcan be easily plugged into your application as shown below.

Finding overlapped occurrences

To search for all occurrences of registered patternsthat allow for positional overlap in the input text,usefind_overlapping(). When you instantiate a new automaton,unique identifiers are assigned to each pattern in the input order.The match result has the character positions of the occurrence and its identifier.

>>importdaachorse>>patterns= ['bcd','ab','a']>>pma=daachorse.Automaton(patterns)>>pma.find_overlapping('abcd')[(0,1,2), (0,2,1), (1,4,0)]

Finding non-overlapped occurrences with standard matching

If you do not want to allow positional overlap, usefind() instead.It performs the search on the Aho-Corasick automatonand reports patterns first found in each iteration.

>>importdaachorse>>patterns= ['bcd','ab','a']>>pma=daachorse.Automaton(patterns)>>pma.find('abcd')[(0,1,2), (1,4,0)]

Finding non-overlapped occurrences with longest matching

If you want to search for the longest pattern without positional overlap in each iteration,useMATCH_KIND_LEFTMOST_LONGEST in the construction.

>>importdaachorse>>patterns= ['ab','a','abcd']>>pma=daachorse.Automaton(patterns,daachorse.MATCH_KIND_LEFTMOST_LONGEST)>>pma.find('abcd')[(0,4,2)]

Finding non-overlapped occurrences with leftmost-first matching

If you want to find the the earliest registered patternamong ones starting from the search position,useMATCH_KIND_LEFTMOST_FIRST.

This is so-calledthe leftmost first match, a bit tricky search option.For example, in the following code,ab is reported because it is the earliest registered one.

>>importdaachorse>>patterns= ['ab','a','abcd']>>pma=daachorse.Automaton(patterns,daachorse.MATCH_KIND_LEFTMOST_FIRST)>>pma.find('abcd')[(0,2,0)]

License

Licensed under either of

at your option.

About

🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)

Topics

Resources

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT

Stars

Watchers

Forks

Packages

No packages published

[8]ページ先頭

©2009-2025 Movatter.jp