Wavelet Matrix/Tree succinct data structure for full text search (based on shellinford C++ library)
ikegami-yukino/shellinford-python
Shellinford is an implementation of a Wavelet Matrix/Tree succinct data structure for document retrieval.
It is based on the shellinford C++ library.
NOTE: This module requires a C++11 compiler.
$ pip install shellinford
>>> import shellinford
>>> fm = shellinford.FMIndex()
- shellinford.Shellinford([use_wavelet_tree=True, filename=None])
- When given a filename, Shellinford loads FM-index data from the file
>>> fm.build(['Milky Holmes', 'Sherlock "Sheryl" Shellingford', 'Milky'], 'milky.fm')
- build([docs, filename])
- When given a filename, Shellinford stores FM-index data to the file
>>> for doc in fm.search('Milky'):
...     print('doc_id:', doc.doc_id)
...     print('count:', doc.count)
...     print('text:', doc.text)
doc_id: 0
count: [1]
text: Milky Holmes
doc_id: 2
count: [1]
text: Milky
>>> for doc in fm.search(['Milky', 'Holmes']):
...     print('doc_id:', doc.doc_id)
...     print('count:', doc.count)
...     print('text:', doc.text)
doc_id: 0
count: [1]
text: Milky Holmes
- search(query, [_or=False, ignores=[]])
- If _or is True, an "OR" search is executed; otherwise an "AND" search is executed
- If ignores is given, a "NOT" search is also executed
- NOTE: The search function is available after FM-index is built or loaded
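The AND/OR/NOT semantics described above can be modelled with plain substring tests. The following is a minimal reference sketch only, not the library's implementation: `naive_search` is a hypothetical helper, it scans every document instead of using an FM-index, and it assumes that a document is excluded when it contains any term in `ignores`.

```python
def naive_search(docs, query, _or=False, ignores=()):
    """Return ids of documents matching the query, using plain substring tests."""
    # Accept either a single string or a list of strings, as search() does.
    terms = [query] if isinstance(query, str) else list(query)
    combine = any if _or else all  # OR search vs. AND search
    hits = []
    for doc_id, text in enumerate(docs):
        if not combine(term in text for term in terms):
            continue
        # Assumption: a document containing any ignored term is dropped (NOT search).
        if any(term in text for term in ignores):
            continue
        hits.append(doc_id)
    return hits


docs = ['Milky Holmes', 'Sherlock "Sheryl" Shellingford', 'Milky']
print(naive_search(docs, ['Milky', 'Holmes']))              # AND search
print(naive_search(docs, ['Milky', 'Sherlock'], _or=True))  # OR search
print(naive_search(docs, 'Milky', ignores=['Holmes']))      # NOT search
```

A model like this can serve as a slow cross-check when testing queries against a built index.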
>>> fm.count('Milky')
2
>>> fm.count(['Milky', 'Holmes'])
1
- count(query, [_or=False])
- If _or is True, an "OR" search is executed; otherwise an "AND" search is executed
- NOTE: The count function is available after FM-index is built or loaded
- This function is slightly faster than the search function
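In a generic FM-index, counting only needs the size of a suffix-array range, which backward search finds without locating any match; whether that explains the speed difference in this library is an assumption. The sketch below illustrates backward search over a single string. The names `build_bwt_index` and `count_occurrences` are hypothetical, the index is built naively, and rank queries use `str.count` where a wavelet matrix or tree would normally be used. It counts pattern occurrences in one text, not matching documents.

```python
def build_bwt_index(text):
    """Build the Burrows-Wheeler transform of text and its C table."""
    # Sentinel: assumed absent from text and smaller than every other character.
    text += '\0'
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT is the character preceding each sorted suffix (wrapping at i == 0).
    bwt = ''.join(text[i - 1] for i in suffix_array)
    # first_row[c]: number of characters in text strictly smaller than c.
    first_row = {}
    total = 0
    for c in sorted(set(text)):
        first_row[c] = total
        total += text.count(c)
    return bwt, first_row


def count_occurrences(bwt, first_row, pattern):
    """Count occurrences of pattern via backward search."""
    lo, hi = 0, len(bwt)  # half-open range of suffix-array rows
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        # rank(c, i) = occurrences of c in bwt[:i]; a wavelet tree answers
        # this quickly, here it is a linear scan for clarity.
        lo = first_row[c] + bwt.count(c, 0, lo)
        hi = first_row[c] + bwt.count(c, 0, hi)
        if lo >= hi:
            return 0
    return hi - lo


bwt, first_row = build_bwt_index('Milky Holmes Milky')
print(count_occurrences(bwt, first_row, 'Milky'))
print(count_occurrences(bwt, first_row, 'Sheryl'))
```

Each pattern character costs two rank queries, so the work depends on the pattern length rather than on the number of matches.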
>>> fm.push_back('Baritsu')
- push_back(doc)
- NOTE: A document added by this method is not searchable until build is called
>>> fm.read('milky_holmes.fm')
- read(path)
>>> fm.write('milky_holmes.fm')
- write(path)
>>> 'baritsu' in fm
- Wrapper code is licensed under the New BSD License.
- Bundled shellinford C++ library (c) 2012 echizen_tm is licensed under the New BSD License.