Pysolr — Python Solr client
pysolr is a lightweight Python client for Apache Solr. It provides an interface that queries the server and returns results based on the query.
- Basic operations such as selecting, updating & deleting.
- Index optimization.
- "More Like This" support (if set up in Solr).
- Spelling correction (if set up in Solr).
- Timeout support.
- SolrCloud awareness
- A supported version of Python 3
- Requests 2.9.1+
- Optional - simplejson
- Optional - kazoo for SolrCloud mode
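Because `simplejson` and `kazoo` are optional, it can be useful to check which of them are importable before relying on the features that need them. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper and not part of pysolr.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Lists whichever optional dependencies are absent from this environment.
print(missing_modules(["simplejson", "kazoo"]))
```

An empty list means both optional packages are available.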
pysolr is on PyPI:

    $ pip install pysolr

Or if you want to install directly from the repository:

    $ python setup.py install

Basic usage looks like:

    # If on Python 2.X
    from __future__ import print_function

    import pysolr

    # Create a client instance. The timeout and authentication options are not
    # required; pass `auth=<type of authentication>` if your server needs it.
    solr = pysolr.Solr('http://localhost:8983/solr/', always_commit=True, timeout=10)

    # Note that always_commit defaults to False for performance. You can set
    # `always_commit=True` to have commands always update the index immediately, make
    # an update call with `commit=True`, or use Solr's `autoCommit` / `commitWithin`
    # to have your data be committed following a particular policy.

    # Do a health check.
    solr.ping()

    # How you'd index data.
    solr.add([
        {
            "id": "doc_1",
            "title": "A test document",
        },
        {
            "id": "doc_2",
            "title": "The Banana: Tasty or Dangerous?",
            "_doc": [
                {"id": "child_doc_1", "title": "peel"},
                {"id": "child_doc_2", "title": "seed"},
            ]
        },
    ])

    # You can index a parent/child document relationship by
    # associating a list of child documents with the special key '_doc'. This
    # is helpful for queries that join together conditions on children and parent
    # documents.

    # Later, searching is easy. In the simple case, just a plain Lucene-style
    # query is fine.
    results = solr.search('bananas')

    # The ``Results`` object stores total results found, by default the top
    # ten most relevant results and any additional data like
    # facets/highlighting/spelling/etc.
    print("Saw {0} result(s).".format(len(results)))

    # Just loop over it to access the results.
    for result in results:
        print("The title is '{0}'.".format(result['title']))

    # For a more advanced query, say involving highlighting, you can pass
    # additional options to Solr.
    results = solr.search('bananas', **{
        'hl': 'true',
        'hl.fragsize': 10,
    })

    # Traverse a cursor using its iterator:
    for doc in solr.search('*:*', fl='id', sort='id ASC', cursorMark='*'):
        print(doc['id'])

    # You can also perform More Like This searches, if your Solr is configured
    # correctly.
    similar = solr.more_like_this(q='id:doc_2', mltfl='text')

    # Finally, you can delete either individual documents,
    solr.delete(id='doc_1')

    # also in batches...
    solr.delete(id=['doc_1', 'doc_2'])

    # ...or all documents.
    solr.delete(q='*:*')

    # For SolrCloud mode, initialize your Solr like this
    # (optionally also passing `auth=<type of authentication>`):
    zookeeper = pysolr.ZooKeeper("zkhost1:2181,zkhost2:2181,zkhost3:2181")
    solr = pysolr.SolrCloud(zookeeper, "collection1")
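The cursor traversal in the usage example relies on Solr's `cursorMark` deep-paging protocol: each response carries a `nextCursorMark`, and the traversal is finished once that value equals the mark that was just sent. The following is a minimal sketch of that loop, not pysolr's actual implementation. `iterate_cursor` and the `fetch_page` callable are hypothetical names, and `fetch_page` is assumed to return a `(docs, next_cursor_mark)` pair.

```python
def iterate_cursor(fetch_page, start_mark="*"):
    """Yield every document from a cursor-paged result set.

    `fetch_page` is a hypothetical callable: it takes a cursor mark and
    returns (docs, next_cursor_mark) for that page.
    """
    mark = start_mark
    while True:
        docs, next_mark = fetch_page(mark)
        for doc in docs:
            yield doc
        # Solr signals the end of the result set by returning the same mark.
        if next_mark == mark:
            break
        mark = next_mark


# Stand-in for a Solr server: three pages keyed by the cursor mark sent.
FAKE_PAGES = {
    "*": ([{"id": "doc_1"}, {"id": "doc_2"}], "mark_a"),
    "mark_a": ([{"id": "doc_3"}], "mark_b"),
    "mark_b": ([], "mark_b"),
}

ids = [doc["id"] for doc in iterate_cursor(FAKE_PAGES.__getitem__)]
print(ids)
```

With pysolr itself none of this is needed, since iterating over the result of a `search` call with `cursorMark='*'` handles the paging for you.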
To use a multicore index, simply point the URL to the index core:

    # Setup a Solr instance. The timeout is optional.
    solr = pysolr.Solr('http://localhost:8983/solr/core_0/', timeout=10)

A custom request handler can be set on the instance with `search_handler`:

    # Setup a Solr instance. The trailing slash is optional.
    solr = pysolr.Solr('http://localhost:8983/solr/core_0/', search_handler='/autocomplete', use_qt_param=False)

If `use_qt_param` is `True`, it is essential that the name of the handler is exactly what is configured in `solrconfig.xml`, including the leading slash if any. If `use_qt_param` is `False` (default), the leading and trailing slashes can be omitted.

If `search_handler` is not specified, pysolr will default to `/select`.

The handlers for MoreLikeThis, Update, Terms etc. all default to the values set in the `solrconfig.xml` Solr ships with: `mlt`, `update`, `terms` etc. The specific methods of pysolr's `Solr` class (like `more_like_this`, `suggest_terms` etc.) allow for a kwarg `handler` to override that value. This includes the `search` method. Setting a handler in `search` explicitly overrides the `search_handler` setting (if any).
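The handler rules above can be summarized in a small sketch. It is illustrative only and is not pysolr's internal code: `resolve_handler` and `build_request` are hypothetical helpers, and routing a `qt`-style request through the `select` path is an assumption made for the example.

```python
def resolve_handler(search_handler=None, handler=None):
    """Pick the handler for a search.

    A per-call `handler` wins over the instance-wide `search_handler`,
    and 'select' is used when neither is given.
    """
    return handler or search_handler or "select"


def build_request(handler, use_qt_param=False):
    """Return a (path, params) pair for a handler name.

    With use_qt_param=True the name is sent verbatim as the `qt` parameter,
    so it has to match solrconfig.xml exactly. Otherwise it becomes part of
    the URL path, where leading and trailing slashes do not matter.
    """
    if use_qt_param:
        # Assumption: qt-style requests are sent to the default select path.
        return "select", {"qt": handler}
    return handler.strip("/"), {}


print(build_request(resolve_handler("/autocomplete")))
print(build_request(resolve_handler("/autocomplete"), use_qt_param=True))
print(build_request(resolve_handler("/autocomplete", handler="suggest")))
```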

Authentication objects are passed through the `auth` argument, for example with Kerberos:

    # Setup a Solr instance in a kerberized environment
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL
    kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL, sanitize_mutual_error_response=False)

    solr = pysolr.Solr('http://localhost:8983/solr/', auth=kerberos_auth)

    # Setup a SolrCloud instance in a kerberized environment
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL
    kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL, sanitize_mutual_error_response=False)

    zookeeper = pysolr.ZooKeeper("zkhost1:2181/solr, zkhost2:2181,...,zkhostN:2181")
    solr = pysolr.SolrCloud(zookeeper, "collection", auth=kerberos_auth)

A custom CA certificate can be supplied through the `verify` argument:

    # Setup a Solr instance in an https environment
    solr = pysolr.Solr('https://localhost:8983/solr/', verify='path/to/cert.pem')

    # Setup a SolrCloud instance in an https environment
    zookeeper = pysolr.ZooKeeper("zkhost1:2181/solr, zkhost2:2181,...,zkhostN:2181")
    solr = pysolr.SolrCloud(zookeeper, "collection", verify='path/to/cert.pem')

The default commit policy is controlled by `always_commit`:

    # Setup a Solr instance. The trailing slash is optional.
    # All requests to Solr will be immediately committed because `always_commit=True`:
    solr = pysolr.Solr('http://localhost:8983/solr/core_0/', search_handler='/autocomplete', always_commit=True)

`always_commit` tells the `Solr` object whether to commit by default after every update request. Be sure to set it to `True` if you are upgrading from a version where committing on every request was the default policy.

Functions like `add` and `delete` still provide a way to override the default by passing the `commit` kwarg.
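The precedence between the instance default and a per-call `commit` can be sketched as follows. This is a minimal illustration, not pysolr's own code: `effective_commit` is a hypothetical helper, and treating `None` as "not specified" is an assumption of the sketch.

```python
def effective_commit(always_commit=False, commit=None):
    """Return whether an update request should commit.

    An explicit `commit` argument wins; `None` means the caller did not
    specify one, so the instance-wide default applies.
    """
    if commit is None:
        return always_commit
    return commit


# The instance default applies when the call does not say otherwise.
print(effective_commit(always_commit=True))
# A per-call value overrides the default in either direction.
print(effective_commit(always_commit=True, commit=False))
print(effective_commit(always_commit=False, commit=True))
```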
It is generally good practice to limit the number of commits to Solr, as excessive commits risk opening too many searchers or consuming excessive system resources. See the Solr documentation for more information and details about the `autoCommit` and `commitWithin` options.
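One way to keep the number of commits low when indexing many documents is to send them in batches with `commit=False` and commit once at the end. The sketch below assumes a client object exposing `add(docs, commit=...)` and `commit()`, as pysolr's `Solr` class does. `index_in_batches` and the default batch size of 500 are hypothetical choices for illustration.

```python
from itertools import islice


def index_in_batches(solr, docs, batch_size=500):
    """Add `docs` in batches and commit once at the end.

    Returns the number of batches sent. `solr` is any object with
    `add(docs, commit=...)` and `commit()` methods.
    """
    iterator = iter(docs)
    batches = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        # Suppress the per-request commit, whatever the instance default is.
        solr.add(batch, commit=False)
        batches += 1
    if batches:
        # A single commit makes all batches visible to searchers.
        solr.commit()
    return batches
```

A call would look like `index_in_batches(solr, documents)`, where `solr` is an existing client instance. Relying on Solr's `autoCommit` or `commitWithin` instead moves the same decision to the server side.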
pysolr is licensed under the New BSD license.
For consistency, this project uses pre-commit to manage Git commit hooks:

- Install the pre-commit package: e.g. `brew install pre-commit`, `pip install pre-commit`, etc.
- Run `pre-commit install` each time you check out a new copy of this Git repository to ensure that every subsequent commit will be processed by running `pre-commit run`, which you may also do as desired. To test the entire repository or in a CI scenario, you can check every file rather than just the staged ones using `pre-commit run --all`.

The `run-tests.py` script will automatically perform the steps below and is recommended for testing by default unless you need more control.
Downloading, configuring and running Solr 4 looks like this:

    ./start-solr-test-server.sh

Running the tests:

    $ python -m unittest tests