Stop words
Describes how to enable and use stop words.
🚧
This feature is deprecated.
Stop words are the words in a stop list (also called a stoplist or negative dictionary) that are filtered out (stopped) before or after natural language text is processed because they are considered insignificant.
Besides keeping unimportant words from being processed, stop words can also block words that are considered noise from a business or societal perspective. The search engine retrieves no matches for queried stop words. Optimizely Graph does not use stop words by default, but you can configure them.
Stop words with full-text search
The following English words are often considered stop words:
a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with.
A stop word is usually a single word that is used as a filter to stop a token from being indexed.
For example, if you have the field value "the dog is at the park" and use this stop list, then only the tokens ["dog", "park"] get indexed, and you can only match on these two tokens when doing a full-text search using the contains or like operators.
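The filtering described above can be sketched in a few lines of Python. This is only a toy model: the whitespace tokenizer and the `index_tokens` helper are assumptions made for illustration, not Optimizely Graph's actual analyzer.

```python
# Toy model of stop-word filtering at index time.
# Assumption: tokens are split on whitespace; the real analyzer may differ.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def index_tokens(text: str, stop_words: set[str]) -> list[str]:
    """Return the tokens that remain after removing stop words."""
    return [token for token in text.split() if token not in stop_words]


tokens = index_tokens("the dog is at the park", STOP_WORDS)
print(tokens)  # ['dog', 'park']
```

Because the comparison is an exact string match, the sketch is case-sensitive, so a capitalized "The" would not be removed by this list.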
❗️
Warning: Stop words are supported for searchable string fields but not for normal string, number, date, and Boolean fields.
In the latter field types, stop words are not applied, so results are found when querying with stop words. Optimizely Graph supports only single-token stop words; multi-word stop words are not applied.
Store custom stop words
Stop words are stored as a text file, each line being a single stop word.
🚧
Important: A line in the stop list cannot exceed 1,000 characters or bytes, and the maximum number of entries is 50,000.
Stop words are treated case-sensitively at index and query time. For example, "the" is different from "The". This can be useful to fully index and query on "The Guardian" (newspaper) but ignore "the" in "the guardian" with full-text search.
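Before uploading, it can help to check a stop list against the limits above. The following is a minimal sketch: the `validate_stop_list` helper is hypothetical, and because the limit is stated as "characters or bytes", it conservatively checks both the character count and the UTF-8 byte length of each entry. It also flags multi-word entries, since only single-token stop words are applied. Skipping blank lines is an assumption.

```python
MAX_LINE_LENGTH = 1_000   # limit per line, from the note above
MAX_ENTRIES = 50_000      # maximum number of entries, from the note above


def validate_stop_list(text: str) -> list[str]:
    """Return a list of problems found in a stop list (empty if none)."""
    problems = []
    # Assumption: blank lines are ignored rather than counted as entries.
    entries = [line for line in text.splitlines() if line.strip()]
    if len(entries) > MAX_ENTRIES:
        problems.append(f"too many entries: {len(entries)} > {MAX_ENTRIES}")
    for number, entry in enumerate(entries, start=1):
        # Check characters and UTF-8 bytes, whichever is larger.
        length = max(len(entry), len(entry.encode("utf-8")))
        if length > MAX_LINE_LENGTH:
            problems.append(f"entry {number} is too long ({length})")
        if len(entry.split()) > 1:
            problems.append(f"entry {number} is not a single token: {entry!r}")
    return problems


# Case is preserved on purpose: "the" and "The" are different stop words.
print(validate_stop_list("the\nSchwarzenegger\namy\nBob\n"))  # []
print(validate_stop_list("the guardian\n"))
```

The second call reports one problem, because "the guardian" contains two tokens.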
The following is an example of a list of stop words. They are used in the query examples below.
```
the
Schwarzenegger
amy
Bob
```

You can store stop words using the REST endpoint configured in the GraphQL gateway. It requires authorization using your HMAC key and secret.
```
PUT <GATEWAY_URL>/resources/stopwords
```

with the following optional query string:
- language_routing to store the custom stop words in the request body for a specific locale (default is standard, that is, no locale)
The body should contain stop words as previously described or can be empty if you do not want to configure any stop words (the default behavior). If you do not use a query parameter with this endpoint, then the custom stop list is applied to the NEUTRAL locale (index with no languages configured).
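As a rough sketch, the request could be prepared in Python as follows. The gateway URL, the `build_stopwords_request` helper, the `text/plain` content type, and the `en` locale are placeholders or assumptions, not documented values. The `Authorization` header must be derived from your HMAC key and secret, which is not shown here.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_stopwords_request(gateway_url: str, stop_words: list[str],
                            authorization: str,
                            language_routing: Optional[str] = None) -> Request:
    """Prepare (but do not send) the PUT request that stores a stop list."""
    url = gateway_url.rstrip("/") + "/resources/stopwords"
    if language_routing is not None:
        url += "?" + urlencode({"language_routing": language_routing})
    body = "\n".join(stop_words).encode("utf-8")  # one stop word per line
    headers = {
        "Authorization": authorization,  # placeholder: derive from HMAC key/secret
        "Content-Type": "text/plain",    # assumption, not from the documentation
    }
    return Request(url, data=body, headers=headers, method="PUT")


request = build_stopwords_request(
    "https://gateway.example.invalid",   # placeholder for <GATEWAY_URL>
    ["the", "Schwarzenegger", "amy", "Bob"],
    authorization="<HMAC authorization value>",
    language_routing="en",               # hypothetical locale
)
print(request.get_method(), request.full_url)
```

The request is only constructed here. Passing it to `urllib.request.urlopen` would send it once a valid authorization value and gateway URL are in place.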
After you store stop words, Optimizely Graph automatically applies them when synchronizing content and ignores them when querying.
❗️
Warning: You must store your stop words in Optimizely Graph before provisioning your account and synchronizing content. You cannot update stop lists after your account is provisioned. If you want to update your stop list, you must do the following, in order:
- Delete the account.
- Upload the updated stop list with the PUT endpoint as described above.
- Create the account.
- Synchronize content.
Query examples
For full-text search with the contains and like operators on searchable string fields, Optimizely only permits single-token stop words; multi-word stop words are not applied.
When "Schwarzenegger" is a stop word and occurs as "Schwarzenegger" (case-sensitive) in your content, the following query does not return any results.
```graphql
{
  BiographyPage(where: { Name: { contains: "Schwarzenegger" } }) {
    items {
      Name
      Die
      Born
      Language {
        DisplayName
        Name
      }
      _score
    }
  }
}
```

However, if the name "Amy Winehouse" occurs in your content but "amy" (note the lowercase) is defined as a stop word, the following GraphQL query still returns a result, because the term "Amy" (note the uppercase) was never stopped from being indexed.
```graphql
{
  BiographyPage(where: { Name: { contains: "Amy" } }) {
    items {
      Name
      Die
      Born
      Language {
        DisplayName
        Name
      }
      _score
    }
  }
}
```

The following query using the like operator is equivalent and also returns the result.
```graphql
{
  BiographyPage(where: { Name: { like: "%Amy%" } }) {
    items {
      Name
      Die
      Born
      Language {
        DisplayName
        Name
      }
      _score
    }
  }
}
```

Both examples return this result:
```json
{
  "data": {
    "BiographyPage": {
      "items": [
        {
          "Name": "Amy Winehouse",
          "Die": "2011-07-23T00:00:00Z",
          "Born": "1983-11-14T00:00:00Z",
          "Language": {
            "DisplayName": "English",
            "Name": "en"
          },
          "_score": 1.6928279
        }
      ]
    }
  }
}
```

Stop words are processed case-sensitively at indexing time. Therefore, the following query does not return any results because "amy" is a stop word.
```graphql
{
  BiographyPage(where: { Name: { contains: "amy" } }) {
    items {
      Name
      Die
      Born
      Language {
        DisplayName
        Name
      }
      _score
    }
  }
}
```
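The behavior of the three contains queries above can be reproduced with a small toy model. This is only a sketch under stated assumptions: whitespace tokenization, exact token matching, and the hypothetical `analyze`, `build_index`, and `contains` helpers stand in for the real search engine, and the "Arnold Schwarzenegger" entry is illustrative data.

```python
STOP_WORDS = {"the", "Schwarzenegger", "amy", "Bob"}  # the example stop list


def analyze(text: str) -> list[str]:
    """Split on whitespace and drop stop words, case-sensitively."""
    return [token for token in text.split() if token not in STOP_WORDS]


def build_index(names: list[str]) -> dict[str, list[str]]:
    """Map each name to the tokens that survive stop-word filtering."""
    return {name: analyze(name) for name in names}


def contains(index: dict[str, list[str]], query: str) -> list[str]:
    """Return names whose indexed tokens include every query token."""
    query_tokens = analyze(query)  # stop words are also removed from the query
    if not query_tokens:
        return []  # the query consisted only of stop words
    return [name for name, tokens in index.items()
            if all(token in tokens for token in query_tokens)]


index = build_index(["Amy Winehouse", "Arnold Schwarzenegger"])
print(contains(index, "Schwarzenegger"))  # []
print(contains(index, "Amy"))             # ['Amy Winehouse']
print(contains(index, "amy"))             # []
```

In this model, "Schwarzenegger" and "amy" are removed from both the index and the query, so neither can match, while "Amy" is not on the stop list and is indexed and matched normally.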
