Pysolr — Python Solr client
django-haystack/pysolr
pysolr is a lightweight Python client for Apache Solr. It provides an interface that queries the server and returns results based on the query.
Features:
- Basic operations such as selecting, updating & deleting.
- Index optimization.
- "More Like This" support (if set up in Solr).
- Spelling correction (if set up in Solr).
- Timeout support.
- SolrCloud awareness.
Requirements:
- Python 2.7 - 3.7
- Requests 2.9.1+
- Optional - simplejson
- Optional - kazoo for SolrCloud mode
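"Optional" dependencies like these are usually loaded with a guarded import, so the client still works when they are missing. The sketch below shows that common pattern for `simplejson` and `kazoo`; it illustrates the idea rather than reproducing pysolr's own source, and `HAS_KAZOO` and `encode_docs` are hypothetical names.

```python
# Prefer simplejson when it is installed, otherwise fall back to the stdlib json module.
try:
    import simplejson as json  # optional dependency
except ImportError:
    import json

# kazoo is only needed for SolrCloud mode; record whether it can be imported.
try:
    import kazoo  # noqa: F401  (optional dependency)
    HAS_KAZOO = True
except ImportError:
    HAS_KAZOO = False


def encode_docs(docs):
    """Serialize a list of documents with whichever JSON module was found."""
    return json.dumps(docs)


print("SolrCloud support available:", HAS_KAZOO)
print(encode_docs([{"id": "doc_1"}]))
```

Either JSON module produces the same output here, which is why `simplejson` can stay optional.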
pysolr is on PyPI:
$ pip install pysolr
Or, if you want to install directly from the repository:
$ python setup.py install
Basic usage looks like:
# If on Python 2.X
from __future__ import print_function

import pysolr

# Create a client instance. The timeout and authentication options are not required;
# to authenticate, also pass auth=<a requests-compatible auth object>.
solr = pysolr.Solr('http://localhost:8983/solr/', always_commit=True, timeout=10)

# Note that always_commit defaults to False for performance. You can set
# `always_commit=True` to have commands always update the index immediately, make
# an update call with `commit=True`, or use Solr's `autoCommit` / `commitWithin`
# to have your data be committed following a particular policy.

# Do a health check.
solr.ping()

# How you'd index data.
solr.add([
    {
        "id": "doc_1",
        "title": "A test document",
    },
    {
        "id": "doc_2",
        "title": "The Banana: Tasty or Dangerous?",
        "_doc": [
            {"id": "child_doc_1", "title": "peel"},
            {"id": "child_doc_2", "title": "seed"},
        ]
    },
])

# You can index a parent/child document relationship by
# associating a list of child documents with the special key '_doc'. This
# is helpful for queries that join together conditions on children and parent
# documents.

# Later, searching is easy. In the simple case, just a plain Lucene-style
# query is fine.
results = solr.search('bananas')

# The ``Results`` object stores total results found, by default the top
# ten most relevant results and any additional data like
# facets/highlighting/spelling/etc.
# len(results) counts the documents returned; results.hits is the total found.
print("Saw {0} result(s).".format(len(results)))

# Just loop over it to access the results.
for result in results:
    print("The title is '{0}'.".format(result['title']))

# For a more advanced query, say involving highlighting, you can pass
# additional options to Solr.
results = solr.search('bananas', **{
    'hl': 'true',
    'hl.fragsize': 10,
})

# Traverse a cursor using its iterator:
for doc in solr.search('*:*', fl='id', sort='id ASC', cursorMark='*'):
    print(doc['id'])

# You can also perform More Like This searches, if your Solr is configured
# correctly.
similar = solr.more_like_this(q='id:doc_2', mltfl='text')

# Finally, you can delete either individual documents,
solr.delete(id='doc_1')

# also in batches...
solr.delete(id=['doc_1', 'doc_2'])

# ...or all documents.
solr.delete(q='*:*')
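The cursor loop in the usage example relies on Solr's `cursorMark` protocol: each response carries a `nextCursorMark`, and traversal is finished when Solr returns the same mark that was sent. Solr also requires the sort to include the uniqueKey field, hence `sort='id ASC'`. The sketch below shows that loop against a hypothetical in-memory `fake_search` standing in for Solr, so it runs without a server; it is not pysolr's implementation.

```python
# Hypothetical stand-in for Solr: documents already sorted by the unique key "id".
DOCS = [{"id": "doc_%d" % i} for i in range(1, 8)]


def fake_search(cursor_mark, rows):
    """Return (docs, next_cursor_mark) the way a cursorMark query would.

    Here the cursor is simply the last id seen and "*" means "start from the top".
    Real Solr cursor marks are opaque strings.
    """
    if cursor_mark == "*":
        start = 0
    else:
        ids = [d["id"] for d in DOCS]
        start = ids.index(cursor_mark) + 1
    page = DOCS[start:start + rows]
    # When no documents are left, Solr hands back the mark it was given.
    next_mark = page[-1]["id"] if page else cursor_mark
    return page, next_mark


def iterate_with_cursor(search, rows=3):
    """Yield every document, following nextCursorMark until it stops changing."""
    cursor_mark = "*"
    while True:
        docs, next_mark = search(cursor_mark, rows)
        for doc in docs:
            yield doc
        if next_mark == cursor_mark:
            break  # same mark returned: traversal is complete
        cursor_mark = next_mark


for doc in iterate_with_cursor(fake_search):
    print(doc["id"])
```

Unlike `start`/`rows` paging, the cost of each request does not grow with how deep into the result set you are, which is why cursors suit full-index traversals.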
# For SolrCloud mode, initialize your Solr like this:
zookeeper = pysolr.ZooKeeper("zkhost1:2181,zkhost2:2181,zkhost3:2181")
# Optionally also pass auth=<a requests-compatible auth object>.
solr = pysolr.SolrCloud(zookeeper, "collection1")
For a multicore setup, simply point the URL at the index core:
# Set up a Solr instance. The timeout is optional.
solr = pysolr.Solr('http://localhost:8983/solr/core_0/', timeout=10)
To use a custom search handler, pass `search_handler`:
# Set up a Solr instance. The trailing slash is optional.
solr = pysolr.Solr('http://localhost:8983/solr/core_0/', search_handler='/autocomplete', use_qt_param=False)
If `use_qt_param` is `True`, it is essential that the name of the handler is exactly what is configured in `solrconfig.xml`, including the leading slash if any. If `use_qt_param` is `False` (the default), the leading and trailing slashes can be omitted.
If `search_handler` is not specified, pysolr will default to `/select`.
The handlers for MoreLikeThis, Update, Terms, etc. all default to the values set in the `solrconfig.xml` that Solr ships with: `mlt`, `update`, `terms`, etc. The specific methods of pysolr's `Solr` class (like `more_like_this`, `suggest_terms`, etc.) accept a `handler` kwarg to override that value. This includes the `search` method: setting a handler in `search` explicitly overrides the `search_handler` setting (if any).
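The handler rules above can be modelled with a small hypothetical function, `build_select_request`, that returns the URL and query parameters a search would use. It is a simplified sketch of the documented behavior, not pysolr's internal code: with `use_qt_param=False` the handler becomes a path segment, so surrounding slashes do not matter; with `use_qt_param=True` the request goes to `/select` and the handler name is sent verbatim as `qt`, which is why it must match `solrconfig.xml` exactly.

```python
def build_select_request(base_url, search_handler=None, use_qt_param=False,
                         handler=None, **params):
    """Return (url, params) for a search, following the rules described above.

    Hypothetical helper for illustration only.
    """
    # A handler passed to the individual call wins over the instance default,
    # which in turn falls back to "/select".
    chosen = handler or search_handler or "/select"
    params = dict(params)
    root = base_url.rstrip("/")
    if use_qt_param:
        # Sent as-is, so it must match solrconfig.xml exactly (leading slash included).
        params["qt"] = chosen
        return root + "/select", params
    # Used as a path segment: leading and trailing slashes are irrelevant.
    return root + "/" + chosen.strip("/"), params


base = "http://localhost:8983/solr/core_0/"
print(build_select_request(base, q="bananas"))
print(build_select_request(base, search_handler="/autocomplete", q="ban"))
print(build_select_request(base, search_handler="/autocomplete", use_qt_param=True, q="ban"))
```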
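To connect to a Kerberos-secured Solr, pass an auth object such as `HTTPKerberosAuth` from `requests_kerberos`:
# Set up a Solr instance in a kerberized environment
import pysolr
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL, sanitize_mutual_error_response=False)
solr = pysolr.Solr('http://localhost:8983/solr/', auth=kerberos_auth)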
# Set up a SolrCloud instance in a kerberized environment
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL, sanitize_mutual_error_response=False)
# The chroot (/solr) goes once, after the last host of the ZooKeeper connection string.
zookeeper = pysolr.ZooKeeper("zkhost1:2181,zkhost2:2181,...,zkhostN:2181/solr")
solr = pysolr.SolrCloud(zookeeper, "collection", auth=kerberos_auth)
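A ZooKeeper connection string lists comma-separated `host:port` pairs, with an optional chroot path such as `/solr` appended once after the last host; the chroot applies to the whole ensemble. The hypothetical `parse_zk_hosts` below illustrates that format so the strings above are easier to read and check. It is only a sketch for understanding the string, not how pysolr or kazoo parse it internally; 2181 is ZooKeeper's default client port.

```python
def parse_zk_hosts(conn_str):
    """Split a ZooKeeper connection string into [(host, port), ...] and a chroot.

    Hypothetical helper: the chroot, if any, starts at the first "/" and is
    shared by every host, which is why it belongs after the last host.
    """
    hosts_part, sep, chroot = conn_str.partition("/")
    hosts = []
    for entry in hosts_part.split(","):
        host, _, port = entry.strip().partition(":")
        hosts.append((host, int(port) if port else 2181))
    return hosts, ("/" + chroot if sep else None)


print(parse_zk_hosts("zkhost1:2181,zkhost2:2181,zkhost3:2181/solr"))
# With this rule, a chroot placed in the middle swallows the remaining hosts:
print(parse_zk_hosts("zkhost1:2181/solr,zkhost2:2181"))
```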
To connect over HTTPS, pass the path of the certificate to trust as `verify`:
# Set up a Solr instance in an HTTPS environment
solr = pysolr.Solr('https://localhost:8983/solr/', verify='path/to/cert.pem')
# Set up a SolrCloud instance in an HTTPS environment
zookeeper = pysolr.ZooKeeper("zkhost1:2181,zkhost2:2181,...,zkhostN:2181/solr")
solr = pysolr.SolrCloud(zookeeper, "collection", verify='path/to/cert.pem')
# Set up a Solr instance. The trailing slash is optional.
# All requests to Solr will be immediately committed because `always_commit=True`:
solr = pysolr.Solr('http://localhost:8983/solr/core_0/', search_handler='/autocomplete', always_commit=True)
`always_commit` tells the Solr object whether or not to commit by default for any Solr request. Be sure to change this to `True` if you are upgrading from a version where the default policy was to always commit.
Functions like `add` and `delete` still provide a way to override the default by passing the `commit` kwarg.
It is generally good practice to limit the number of commits to Solr, as excessive commits risk opening too many searchers or consuming excessive system resources. See the Solr documentation for more information and details about the `autoCommit` and `commitWithin` options.
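The interplay between `always_commit`, a per-call `commit` kwarg and Solr's `commitWithin` can be summarised as a small decision function. `update_params` below is hypothetical: it only models the policy described above (an explicit `commit` wins, otherwise `always_commit` applies) by building the query parameters an update request would carry, and it does not talk to Solr or reproduce pysolr's code.

```python
def update_params(always_commit=False, commit=None, commit_within_ms=None):
    """Return the query parameters for an update request (illustrative model).

    commit=None means "use the instance default" (always_commit).
    commit_within_ms asks Solr to commit within that many milliseconds instead.
    """
    if commit is None:
        commit = always_commit  # fall back to the instance-wide policy
    params = {}
    if commit:
        params["commit"] = "true"  # hard commit right after this update
    if commit_within_ms is not None:
        params["commitWithin"] = str(commit_within_ms)
    return params


print(update_params())                                  # default: no commit
print(update_params(always_commit=True))                # commit every request
print(update_params(always_commit=True, commit=False))  # per-call override
print(update_params(commit_within_ms=10000))            # let Solr batch commits
```

Leaving `commit` off and relying on `commitWithin` or `autoCommit` lets Solr group many updates into one commit, which is the practice recommended above.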
pysolr is licensed under the New BSD license.
For consistency, this project uses pre-commit to manage Git commit hooks:
- Install the pre-commit package: e.g. `brew install pre-commit`, `pip install pre-commit`, etc.
- Run `pre-commit install` each time you check out a new copy of this Git repository to ensure that every subsequent commit will be processed by running `pre-commit run`, which you may also do as desired. To test the entire repository or in a CI scenario, you can check every file rather than just the staged ones using `pre-commit run --all-files`.
The `run-tests.py` script will automatically perform the steps below and is recommended for testing by default unless you need more control.
Downloading, configuring and running Solr 4 looks like this:
./start-solr-test-server.sh
Then run the tests:
$ python -m unittest tests