Pysolr — Python Solr client

django-haystack/pysolr

pysolr is a lightweight Python client for Apache Solr. It provides an interface that queries the server and returns results based on the query.

Features

  • Basic operations such as selecting, updating & deleting.
  • Index optimization.
  • "More Like This" support (if set up in Solr).
  • Spelling correction (if set up in Solr).
  • Timeout support.
  • SolrCloud awareness.

Requirements

  • Python 2.7 - 3.7
  • Requests 2.9.1+
  • Optional - simplejson
  • Optional - kazoo for SolrCloud mode

Installation

pysolr is on PyPI:

$ pip install pysolr

Or if you want to install directly from the repository:

$ python setup.py install

Usage

Basic usage looks like:

    # If on Python 2.X
    from __future__ import print_function

    import pysolr

    # Create a client instance. The timeout and authentication options are not required.
    solr = pysolr.Solr('http://localhost:8983/solr/', always_commit=True, timeout=10)
    # To authenticate, also pass auth=<type of authentication>.

    # Note that always_commit defaults to False for performance. You can set
    # `always_commit=True` to have commands always update the index immediately, make
    # an update call with `commit=True`, or use Solr's `autoCommit` / `commitWithin`
    # to have your data be committed following a particular policy.

    # Do a health check.
    solr.ping()

    # How you'd index data.
    solr.add([
        {
            "id": "doc_1",
            "title": "A test document",
        },
        {
            "id": "doc_2",
            "title": "The Banana: Tasty or Dangerous?",
            "_doc": [
                {"id": "child_doc_1", "title": "peel"},
                {"id": "child_doc_2", "title": "seed"},
            ]
        },
    ])

    # You can index a parent/child document relationship by
    # associating a list of child documents with the special key '_doc'. This
    # is helpful for queries that join together conditions on children and parent
    # documents.

    # Later, searching is easy. In the simple case, just a plain Lucene-style
    # query is fine.
    results = solr.search('bananas')

    # The ``Results`` object stores total results found, by default the top
    # ten most relevant results and any additional data like
    # facets/highlighting/spelling/etc.
    print("Saw {0} result(s).".format(len(results)))

    # Just loop over it to access the results.
    for result in results:
        print("The title is '{0}'.".format(result['title']))

    # For a more advanced query, say involving highlighting, you can pass
    # additional options to Solr.
    results = solr.search('bananas', **{
        'hl': 'true',
        'hl.fragsize': 10,
    })

    # Traverse a cursor using its iterator:
    for doc in solr.search('*:*', fl='id', sort='id ASC', cursorMark='*'):
        print(doc['id'])

    # You can also perform More Like This searches, if your Solr is configured
    # correctly.
    similar = solr.more_like_this(q='id:doc_2', mltfl='text')

    # Finally, you can delete either individual documents,
    solr.delete(id='doc_1')

    # also in batches...
    solr.delete(id=['doc_1', 'doc_2'])

    # ...or all documents.
    solr.delete(q='*:*')

    # For SolrCloud mode, initialize your Solr like this:
    zookeeper = pysolr.ZooKeeper("zkhost1:2181,zkhost2:2181,zkhost3:2181")
    solr = pysolr.SolrCloud(zookeeper, "collection1")
    # To authenticate, also pass auth=<type of authentication>.
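
Solr parameter names such as hl.fragsize contain dots, so they cannot be written as ordinary Python keyword arguments; that is why the highlighting query in the usage example unpacks a dict. The following is a minimal sketch of a helper that builds such a dict. The name prefixed_params is hypothetical and not part of pysolr, and the sketch assumes the component is switched on with <prefix>=true, as hl is.

```python
def prefixed_params(prefix, **options):
    """Build a dict of Solr parameters that share a dotted prefix.

    Hypothetical helper for illustration; not part of pysolr.
    """
    # Assumption: the component is enabled with "<prefix>=true" (as for hl).
    params = {prefix: "true"}
    for name, value in options.items():
        # "fragsize" becomes "hl.fragsize", "fl" becomes "hl.fl", and so on.
        params["{0}.{1}".format(prefix, name)] = value
    return params


highlight = prefixed_params("hl", fragsize=10, fl="title")
print(highlight)
```

The resulting dict can then be unpacked into a search call, for example solr.search('bananas', **highlight).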

Multicore Index

Simply point the URL to the index core:

    # Setup a Solr instance. The timeout is optional.
    solr = pysolr.Solr('http://localhost:8983/solr/core_0/', timeout=10)

Custom Request Handlers

    # Setup a Solr instance. The trailing slash is optional.
    solr = pysolr.Solr('http://localhost:8983/solr/core_0/', search_handler='/autocomplete', use_qt_param=False)

If use_qt_param is True, it is essential that the name of the handler is exactly what is configured in solrconfig.xml, including the leading slash if any. If use_qt_param is False (the default), the leading and trailing slashes can be omitted.

If search_handler is not specified, pysolr will default to /select.

The handlers for MoreLikeThis, Update, Terms etc. all default to the values set in the solrconfig.xml that Solr ships with: mlt, update, terms etc. The specific methods of pysolr's Solr class (like more_like_this, suggest_terms etc.) accept a handler kwarg to override that value. This includes the search method. Setting a handler in search explicitly overrides the search_handler setting (if any).
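
The precedence described here (a handler passed to the method call, then the client-level search_handler, then the default) can be sketched in plain Python. This is an illustration only, not pysolr's implementation: resolve_handler and handler_url are hypothetical names, and the slash handling merely shows why leading and trailing slashes can be omitted when the handler is joined onto the base URL.

```python
DEFAULT_SEARCH_HANDLER = "select"


def resolve_handler(call_handler=None, client_handler=None):
    """Pick the handler: per-call value, then client setting, then default.

    Hypothetical helper for illustration; not part of pysolr.
    """
    for candidate in (call_handler, client_handler):
        if candidate:
            return candidate
    return DEFAULT_SEARCH_HANDLER


def handler_url(base_url, handler):
    """Join a handler onto a base URL, tolerating extra or missing slashes."""
    return "{0}/{1}".format(base_url.rstrip("/"), handler.strip("/"))


base = "http://localhost:8983/solr/core_0/"
print(handler_url(base, resolve_handler(client_handler="/autocomplete")))
print(handler_url(base, resolve_handler()))
```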

Custom Authentication

    # Setup a Solr instance in a Kerberized environment
    import pysolr
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL, sanitize_mutual_error_response=False)
    solr = pysolr.Solr('http://localhost:8983/solr/', auth=kerberos_auth)

    # Setup a SolrCloud instance in a Kerberized environment
    import pysolr
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL, sanitize_mutual_error_response=False)
    zookeeper = pysolr.ZooKeeper("zkhost1:2181/solr, zkhost2:2181,...,zkhostN:2181")
    solr = pysolr.SolrCloud(zookeeper, "collection", auth=kerberos_auth)

If your Solr servers run over HTTPS:

    # Setup a Solr instance in an https environment
    solr = pysolr.Solr('https://localhost:8983/solr/', verify='path/to/cert.pem')

    # Setup a SolrCloud instance in an https environment
    zookeeper = pysolr.ZooKeeper("zkhost1:2181/solr, zkhost2:2181,...,zkhostN:2181")
    solr = pysolr.SolrCloud(zookeeper, "collection", verify='path/to/cert.pem')

Custom Commit Policy

    # Setup a Solr instance. The trailing slash is optional.
    # All requests to Solr will be immediately committed because `always_commit=True`:
    solr = pysolr.Solr('http://localhost:8983/solr/core_0/', search_handler='/autocomplete', always_commit=True)

always_commit tells the Solr object whether to commit by default after every Solr request. Be sure to set it to True if you are upgrading from a version where the default policy was to always commit.

Functions like add and delete still let you override the default by passing the commit kwarg.

It is generally good practice to limit the number of commits to Solr, as excessive commits risk opening too many searchers or consuming excessive system resources. See the Solr documentation for more information and details about the autoCommit and commitWithin options:

https://lucene.apache.org/solr/guide/7_7/updatehandlers-in-solrconfig.html#UpdateHandlersinSolrConfig-autoCommit
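
One way to keep the number of commits low is to add documents in batches with commit=False and issue a single commit at the end. The sketch below is only an illustration under stated assumptions: index_in_batches and RecordingClient are hypothetical names, and RecordingClient is a stand-in that records calls so the example runs without a Solr server. With a real client you would pass a pysolr.Solr instance, whose add method accepts a commit kwarg and which also provides a commit method.

```python
def index_in_batches(client, docs, batch_size=500):
    """Add docs in batches without committing, then commit once at the end.

    Hypothetical helper for illustration; not part of pysolr.
    Returns the number of documents sent.
    """
    batch = []
    total = 0
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            client.add(batch, commit=False)
            total += len(batch)
            batch = []
    if batch:
        # Send whatever is left over after the last full batch.
        client.add(batch, commit=False)
        total += len(batch)
    if total:
        client.commit()
    return total


class RecordingClient(object):
    """Stand-in for a Solr client that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def add(self, docs, commit=None):
        self.calls.append(("add", len(docs), commit))

    def commit(self):
        self.calls.append(("commit",))


client = RecordingClient()
docs = ({"id": "doc_{0}".format(i)} for i in range(5))
indexed = index_in_batches(client, docs, batch_size=2)
print(indexed, client.calls)
```

Because the helper consumes any iterable, documents can be streamed from a generator rather than held in memory all at once.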

LICENSE

pysolr is licensed under the New BSD license.

Contributing to pysolr

For consistency, this project uses pre-commit to manage Git commit hooks:

  1. Install the pre-commit package: e.g. brew install pre-commit, pip install pre-commit, etc.
  2. Run pre-commit install each time you check out a new copy of this Git repository to ensure that every subsequent commit will be processed by running pre-commit run, which you may also do as desired. To test the entire repository or in a CI scenario, you can check every file rather than just the staged ones using pre-commit run --all-files.

Running Tests

The run-tests.py script will automatically perform the steps below and is recommended for testing by default unless you need more control.

Running a test Solr instance

Downloading, configuring and running Solr 4 looks like this:

./start-solr-test-server.sh

Running the tests

$ python -m unittest tests
