SearchIndex Configuration
Introduction
The SearchIndex element is part of the Workspace configuration. See also the Jackrabbit Wiki page about Search and Indexing Configuration.
This page elaborates on "best practice" SearchIndex params.
A minimal SearchIndex configuration looks like the following:
<SearchIndex>
  <param name="indexingConfiguration" value="indexing_configuration.xml"/>
  <param name="indexingConfigurationClass" value="org.hippoecm.repository.query.lucene.ServicingIndexingConfigurationImpl"/>
</SearchIndex>
Best practice
The Jackrabbit documentation describes all available <SearchIndex> params. Below are the Hippo Repository default settings for a combination of params that have been found to work well together.
<param name="useCompoundFile" value="true"/>
<param name="minMergeDocs" value="1000"/>
<param name="volatileIdleTime" value="10"/>
<param name="maxMergeDocs" value="1000000000"/>
<param name="mergeFactor" value="5"/>
Setting maxMergeDocs too low or mergeFactor too high results in many Lucene indexes, which in turn slows down Lucene queries severely. The volatileIdleTime is the idle time in seconds after which the volatile index part is moved to a persistent index, even if minMergeDocs has not been reached.
Another interesting SearchIndex param is:
<param name="analyzer" value="org.hippoecm.repository.query.lucene.StandardHippoAnalyzer"/>
analyzer: By default, Hippo ships with org.hippoecm.repository.query.lucene.StandardHippoAnalyzer, which is used as the default text analyzer. If needed, this analyzer can be replaced, for example by org.apache.lucene.analysis.de.GermanAnalyzer for German texts.
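As an illustration of replacing the analyzer, the param for German texts might look like the sketch below. It assumes the Lucene analyzers library that provides the class is available on the repository's classpath. Note that in Lucene the German analyzer lives in the org.apache.lucene.analysis.de package.

```xml
<!-- Sketch: swap the default StandardHippoAnalyzer for Lucene's German analyzer. -->
<!-- Assumes org.apache.lucene.analysis.de.GermanAnalyzer is on the classpath. -->
<param name="analyzer" value="org.apache.lucene.analysis.de.GermanAnalyzer"/>
```

Because the analyzer determines how text is tokenized when it is indexed, content indexed before such a change would typically need to be re-indexed before queries behave consistently.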
The parameters forceConsistencyCheck, enableConsistencyCheck and autoRepair are all set to true by default. See Checking and fixing search index inconsistencies for more information.
<param name="forceConsistencyCheck" value="true"/>
<param name="enableConsistencyCheck" value="true"/>
<param name="autoRepair" value="true"/>
When highlighting is not used in search results, which is the default, it is best not to support highlighting at all, as this reduces the size of the Lucene indexes. This is done through:
<param name="supportHighlighting" value="false"/>
Similarity on text between documents is supported by default, whereas similarity on text between binaries is switched off by default. This is set through:
<param name="supportSimilarityOnStrings" value="true"/>
<param name="supportSimilarityOnBinaries" value="false"/>
Since Hippo Repository supports authorized queries, it can show exact query result sizes very fast, because the authorization is checked against Lucene. However, some authorization setups are possible in which not all checks can be done against Lucene. In that case, the #getSize of the QueryResult might be larger than the number of results that can actually be retrieved (note that an unauthorized node is never returned, as the standard authorization checks are still done when fetching the node for the query result). If you have an authorization setup that results in #getSize not being precise, and you prefer correctness of the size over performance, you can change the default below to true.
<param name="slowAlwaysExactSizedQueryResult" value="false"/>
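For a setup where an exact result size matters more than query speed, the override described above might look like the following sketch. Only the value differs from the default.

```xml
<!-- Sketch: prefer an exact #getSize over query performance. -->
<!-- Only worth considering when the authorization setup cannot be fully checked against Lucene. -->
<param name="slowAlwaysExactSizedQueryResult" value="true"/>
```

As the param name suggests, this trades query performance for correctness of the reported size, so it is best left at false unless imprecise sizes are actually observed.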