Wavelet Matrix/Tree succinct data structure for full text search (based on shellinford C++ library)


ikegami-yukino/shellinford-python


shellinford


Shellinford is an implementation of a Wavelet Matrix/Tree succinct data structure for document retrieval.

It is based on the shellinford C++ library.

NOTE: This module requires a C++11 compiler.

Installation

$ pip install shellinford

Usage

Create a new FM-index instance

>>> import shellinford
>>> fm = shellinford.FMIndex()
  • shellinford.FMIndex([use_wavelet_tree=True, filename=None])
    • When given a filename, Shellinford loads FM-index data from the file

Build FM-index

>>> fm.build(['Milky Holmes', 'Sherlock "Sheryl" Shellingford', 'Milky'], 'milky.fm')
  • build([docs, filename])
    • When given a filename, Shellinford stores FM-index data to the file

Search for a word in the FM-index

>>> for doc in fm.search('Milky'):
...     print('doc_id:', doc.doc_id)
...     print('count:', doc.count)
...     print('text:', doc.text)
doc_id: 0
count: [1]
text: Milky Holmes
doc_id: 2
count: [1]
text: Milky

>>> for doc in fm.search(['Milky', 'Holmes']):
...     print('doc_id:', doc.doc_id)
...     print('count:', doc.count)
...     print('text:', doc.text)
doc_id: 0
count: [1]
text: Milky Holmes
  • search(query, [_or=False, ignores=[]])
    • If _or=True, an "OR" search is executed; otherwise an "AND" search is executed
    • If ignores is given, a "NOT" search is also executed for those terms
    • NOTE: The search function is available after FM-index is built or loaded
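
The "AND", "OR" and "NOT" options above can be illustrated with a plain linear scan. This is only a sketch of the intended semantics, not how shellinford evaluates queries: `naive_search` is a hypothetical helper, it uses no index at all, and returning one count per query term is an assumption (the example output above shows a single-element list).

```python
def naive_search(docs, query, _or=False, ignores=()):
    """Reference semantics only: linear scan with str.count, no FM-index."""
    # Accept either a single string or a list of strings, like the README examples.
    terms = [query] if isinstance(query, str) else list(query)
    results = []
    for doc_id, text in enumerate(docs):
        # Assumption: one occurrence count per query term.
        counts = [text.count(term) for term in terms]
        hits = [c > 0 for c in counts]
        # "OR" needs any term to match, "AND" needs all of them.
        matched = any(hits) if _or else all(hits)
        # "NOT": drop documents that contain any ignored term.
        excluded = any(term in text for term in ignores)
        if matched and not excluded:
            results.append((doc_id, counts, text))
    return results


docs = ['Milky Holmes', 'Sherlock "Sheryl" Shellingford', 'Milky']
print(naive_search(docs, 'Milky'))                          # documents 0 and 2
print(naive_search(docs, ['Milky', 'Holmes']))              # document 0 only
print(naive_search(docs, ['Milky', 'Sherlock'], _or=True))  # documents 0, 1 and 2
print(naive_search(docs, 'Milky', ignores=['Holmes']))      # document 2 only
```

A scan like this costs time proportional to the total text size per query, which is the cost an FM-index is meant to avoid.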

Count matches of a word in the FM-index

>>> fm.count('Milky')
2
>>> fm.count(['Milky', 'Holmes'])
1
  • count(query, [_or=False])
    • If _or=True, an "OR" search is executed; otherwise an "AND" search is executed
    • NOTE: The count function is available after FM-index is built or loaded
    • This function is slightly faster than the search function
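
One way to see why counting can be cheaper than searching: in a textbook FM-index, backward search narrows a range over the sorted suffixes, and the width of the final range is the number of occurrences, so no document text has to be located or returned. The following is a minimal pure-Python sketch of that idea, not shellinford's implementation. The helper names `bwt` and `count_occurrences`, the `\x00` sentinel and the `\x01` document separator are all assumptions made for illustration, and the linear-scan `rank` stands in for the rank query that a wavelet tree or matrix would answer efficiently. The sketch counts raw occurrences, which may differ from what `fm.count` reports.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (only sensible for tiny inputs)."""
    # Sentinel: assumed absent from the input and smaller than every other character.
    text += '\x00'
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return ''.join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text, pattern):
    """Backward search: narrow a suffix range one pattern character at a time."""
    # first_row[c] = number of characters in the text that sort strictly before c.
    first_row = {}
    for i, c in enumerate(sorted(bwt_text)):
        first_row.setdefault(c, i)

    def rank(c, i):
        # Occurrences of c in bwt_text[:i]; done here with a plain scan.
        return bwt_text.count(c, 0, i)

    lo, hi = 0, len(bwt_text)
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + rank(c, lo)
        hi = first_row[c] + rank(c, hi)
        if lo >= hi:
            return 0
    # Width of the final range = number of occurrences of the pattern.
    return hi - lo


# Assumption: documents are joined with a separator character before indexing.
index = bwt('\x01'.join(['Milky Holmes', 'Sherlock "Sheryl" Shellingford', 'Milky']))
print(count_occurrences(index, 'Milky'))    # 2
print(count_occurrences(index, 'She'))      # 3
print(count_occurrences(index, 'Baritsu'))  # 0
```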

Add a document

>>> fm.push_back('Baritsu')
  • push_back(doc)
    • NOTE: A document added by this method is not searchable until the index is rebuilt with build
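
Because pushed documents only become searchable after a build, the two steps are naturally paired. The helper below is a hypothetical convenience wrapper, not part of shellinford. It assumes only what the signatures in this README show: that `push_back(doc)` queues one document and that `build` can be called without arguments, since both of its parameters are listed as optional.

```python
def add_and_rebuild(fm, new_docs):
    """Queue documents on an FMIndex-like object, then rebuild the index."""
    for doc in new_docs:
        # Queued only: per the note above, not searchable yet.
        fm.push_back(doc)
    # Assumption: build() with no arguments rebuilds from the queued documents.
    fm.build()
    return fm
```

With the library installed, this would be called as `add_and_rebuild(shellinford.FMIndex(), ['Baritsu'])`.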

Read FM-index from a binary file

>>> fm.read('milky_holmes.fm')
  • read(path)

Write FM-index binary to a file

>>> fm.write('milky_holmes.fm')
  • write(path)

Check whether the FM-index contains a string

>>> 'baritsu' in fm

License

  • Wrapper code is licensed under the New BSD License.
  • Bundled shellinford C++ library (c) 2012 echizen_tm is licensed under the New BSD License.
