Configure the Lucene Analyzer

Introduction

Hippo Repository uses org.hippoecm.repository.query.lucene.StandardHippoAnalyzer as the default Lucene Analyzer for stored content. This analyzer strips stop words for English, German, Dutch, French, Spanish and Brazilian Portuguese. It also applies an ISO Latin 1 accent filter, which replaces a letter like ç with c, ï with i, and so on.
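To make the accent filtering concrete, here is a minimal sketch of the same idea in plain Java. It does not use the analyzer's actual filter: the class AccentFoldingSketch and the method foldAccents are hypothetical names, and Unicode decomposition is swapped in as an approximation that only handles letters which decompose into a base letter plus combining marks. The real filter may map further characters differently.

```java
import java.text.Normalizer;

// Hypothetical illustration only; this is NOT the filter used by StandardHippoAnalyzer.
class AccentFoldingSketch {

    // Decompose accented letters (NFD) and drop the combining marks,
    // so that "ç" becomes "c" and "ï" becomes "i".
    static String foldAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only: \u00e7 is ç, \u00ef is ï.
        System.out.println(foldAccents("gar\u00e7on na\u00efve")); // prints: garcon naive
    }
}
```

The practical consequence is that indexed text and query text are both folded, so a search for "garcon" can match content containing "garçon".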

Customizing Stop Words of StandardHippoAnalyzer

This feature is available since Bloomreach Experience Manager 12.0.

The stop words of org.hippoecm.repository.query.lucene.StandardHippoAnalyzer are stored in the following classpath resource files, one per language. If needed, you can customize the stop words by shadowing those resource files in the classpath:

Default: classpath:org/hippoecm/repository/query/lucene/StandardHippoAnalyzer.properties
English: classpath:org/hippoecm/repository/query/lucene/StandardHippoAnalyzer_en.properties
Spanish: classpath:org/hippoecm/repository/query/lucene/StandardHippoAnalyzer_es.properties
French: classpath:org/hippoecm/repository/query/lucene/StandardHippoAnalyzer_fr.properties
German: classpath:org/hippoecm/repository/query/lucene/StandardHippoAnalyzer_de.properties
Dutch: classpath:org/hippoecm/repository/query/lucene/StandardHippoAnalyzer_nl.properties
Brazilian Portuguese: classpath:org/hippoecm/repository/query/lucene/StandardHippoAnalyzer_pt_BR.properties
Czech: classpath:org/hippoecm/repository/query/lucene/StandardHippoAnalyzer_cs.properties

For example, the stop words file for English looks like the following:

# The delimiters to use when splitting stopwords.split.tokens value.
stopwords.split.delimiters=,
# Whether or not to preserve all the tokens including empty string token.
stopwords.split.preserveAllTokens=true
# Stopwords tokens.
stopwords.split.tokens=a,and,are,as,at,be,but,by,for,if,in,into,is,it,no,not,of,on,or,s,such,t,that,the,their,then,there,these,they,this,to,was,will,with,,www

For example, to add more stop words such as "etc" or "ie", append those two words, delimited by a comma, to the stopwords.split.tokens property. For instance, if cms/ is the only submodule containing the repository instance, you can add cms/src/main/resources/org/hippoecm/repository/query/lucene/StandardHippoAnalyzer_en.properties with your custom change.
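The following sketch shows how the three properties could combine into a set of stop words once "etc" and "ie" are appended. It is an illustration, not the analyzer's own parsing code: the class StopWordsSketch and the method parseStopWords are hypothetical, and the splitting behaviour is an assumption based only on the comments in the properties file (each character of stopwords.split.delimiters separates tokens, and empty tokens are kept only when stopwords.split.preserveAllTokens is true).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical illustration; the real parsing inside StandardHippoAnalyzer may differ.
class StopWordsSketch {

    // Reads the three stopwords.split.* properties and returns the resulting tokens.
    static Set<String> parseStopWords(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));

        String delimiters = props.getProperty("stopwords.split.delimiters", ",");
        boolean preserveAll =
                Boolean.parseBoolean(props.getProperty("stopwords.split.preserveAllTokens", "false"));
        String tokens = props.getProperty("stopwords.split.tokens", "");

        Set<String> result = new LinkedHashSet<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i <= tokens.length(); i++) {
            boolean atEnd = (i == tokens.length());
            if (atEnd || delimiters.indexOf(tokens.charAt(i)) >= 0) {
                // Assumption: empty tokens are kept only when preserveAllTokens is true.
                if (preserveAll || current.length() > 0) {
                    result.add(current.toString());
                }
                current.setLength(0);
            } else {
                current.append(tokens.charAt(i));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Shortened token list with the two custom words appended at the end.
        String customized =
                "stopwords.split.delimiters=,\n"
              + "stopwords.split.preserveAllTokens=true\n"
              + "stopwords.split.tokens=a,and,with,,www,etc,ie\n";
        System.out.println(parseStopWords(customized));
    }
}
```

Under these assumptions the empty entry between "with" and "www" survives as an empty-string token, which is why the default file's double comma should be left untouched when you append your own words.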

Custom Lucene Analyzer

You can configure custom language analyzers that, for example, also add stemming. The side effect is that stemming breaks wildcard searching. Explaining this is beyond the scope of this page, as it involves general concepts of inverted indexes such as Lucene's. We advise sticking to the StandardHippoAnalyzer if you want to avoid wildcard searching issues.

Modify the Analyzer class

The Analyzer class is configured in the repository.xml file.

Change the value of

<param name="analyzer" value="org.hippoecm.repository.query.lucene.StandardHippoAnalyzer"/>

to the fully qualified class name of your analyzer.

See Repository deployment settings for how to use your customized repository.xml.
