A Python implementation of Lunr.js 🌖
yeraydiazdiaz/lunr.py
A Python implementation of Lunr.js by Oliver Nightingale.
A bit like Solr, but much smaller and not as bright.
This Python version of Lunr.js aims to bring its simple and powerful full-text search capabilities to Python, with results as close as possible to those of the original implementation.
Lunr is a simple full-text search solution for situations where deploying a full-scale solution like Elasticsearch isn't possible or viable, or where you're simply prototyping. Lunr parses a set of documents and creates an inverted index for quick full-text searches, in the same way as other, more complicated solutions.
The trade-off is that Lunr keeps the inverted index in memory and requires you to recreate or read the index at the start of your application.
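To illustrate what an in-memory inverted index is, here is a minimal sketch in plain Python. It is not how Lunr is implemented: it skips stemming, stop words and scoring, and the `build_inverted_index` helper is a hypothetical name used only for this illustration.

```python
from collections import defaultdict


def build_inverted_index(documents, ref, fields):
    """Map each lowercased token to the set of document refs containing it.

    Illustrative only: no stemming, stop words or scoring.
    """
    index = defaultdict(set)
    for doc in documents:
        for field in fields:
            for raw_token in doc[field].lower().split():
                token = raw_token.strip(".,")  # drop trailing punctuation
                if token:
                    index[token].add(doc[ref])
    return index


documents = [
    {
        "id": "a",
        "title": "Mr. Green kills Colonel Mustard",
        "body": "Mr. Green killed Colonel Mustard in the study with the candlestick.",
    },
    {
        "id": "b",
        "title": "Plumb waters plant",
        "body": "Professor Plumb has a green plant in his study",
    },
]

index = build_inverted_index(documents, ref="id", fields=("title", "body"))
print(sorted(index["study"]))        # ['a', 'b']
print(sorted(index["candlestick"]))  # ['a']
```

A lookup is a single dictionary access, which is why searches are quick once the index has been built. The whole structure lives in memory, so it has to be rebuilt or loaded every time the application starts.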
A core objective of Lunr.py is to provide interoperability with the JavaScript version.
An example can be found in the MkDocs documentation library. MkDocs produces a set of documents from the pages of the documentation and uses Lunr.js in the frontend to power its built-in search engine. This set of documents takes the form of a JSON file that Lunr.js must fetch and parse to create the inverted index when your application starts.
While this is not a problem for most sites, depending on the size of your document set, this can take some time.
Lunr.py provides a backend solution, allowing you to parse the documents in Python ahead of time and create a serialized Lunr.js index that the browser version can read, minimizing the start-up time of your application.
Each version of lunr.py targets a specific version of lunr.js and produces the same results for a non-trivial corpus of documents.
pip install lunr
Optional, experimental support for other languages, based on the Natural Language Toolkit stemmers, is also available via `pip install lunr[languages]`. Use of the language feature is subject to the NLTK corpus licensing clauses.
Please refer to the documentation page on languages for more information.
First, you'll need a list of dicts representing the documents you want to search. Each document must have a unique field that serves as its reference, plus the fields you'd like to search on.
Lunr provides a convenience `lunr` function to quickly index this set of documents:
    >>> from lunr import lunr
    >>>
    >>> documents = [{
    ...     'id': 'a',
    ...     'title': 'Mr. Green kills Colonel Mustard',
    ...     'body': 'Mr. Green killed Colonel Mustard in the study with the candlestick.',
    ... }, {
    ...     'id': 'b',
    ...     'title': 'Plumb waters plant',
    ...     'body': 'Professor Plumb has a green plant in his study',
    ... }]
    >>> idx = lunr(
    ...     ref='id', fields=('title', 'body'), documents=documents
    ... )
    >>> idx.search('kill')
    [{'ref': 'a', 'score': 0.6931722372559913, 'match_data': <MatchData "kill">}]
    >>> idx.search('study')
    [{'ref': 'b', 'score': 0.23576799568081389, 'match_data': <MatchData "studi">}, {'ref': 'a', 'score': 0.2236629211724517, 'match_data': <MatchData "studi">}]
Please refer to the documentation for more usage examples.