ICU normalization token filter

Normalizes characters as explained here. It registers itself as the icu_normalizer token filter, which is available to all indices without any further configuration. The type of normalization can be specified with the name parameter, which accepts nfc, nfkc, and nfkc_cf (default).

Which letters are normalized can be controlled by specifying the unicode_set_filter parameter, which accepts a UnicodeSet.
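
As a rough sketch of how unicode_set_filter might be used, the request below defines a token filter that applies nfkc_cf normalization to everything except a handful of Swedish letters. The index name icu_filter_sample, the analyzer name nfkc_cf_except_swedish, the filter name swedish_safe_normalizer, and the set [^åäöÅÄÖ] are all illustrative assumptions, not values taken from this page.

```console
# Hypothetical index, analyzer, and filter names, chosen for illustration only.
PUT icu_filter_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "nfkc_cf_except_swedish": {
            "tokenizer": "icu_tokenizer",
            "filter": [
              "swedish_safe_normalizer"
            ]
          }
        },
        "filter": {
          "swedish_safe_normalizer": {
            "type": "icu_normalizer",
            "name": "nfkc_cf",
            "unicode_set_filter": "[^åäöÅÄÖ]"
          }
        }
      }
    }
  }
}
```

The leading ^ negates the set, so the listed letters are left untouched while all other characters are normalized.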

You should probably prefer the Normalization character filter.

Here are two examples, the default usage and a customized token filter:

PUT icu_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "nfkc_cf_normalized": {
            "tokenizer": "icu_tokenizer",
            "filter": [
              "icu_normalizer"
            ]
          },
          "nfc_normalized": {
            "tokenizer": "icu_tokenizer",
            "filter": [
              "nfc_normalizer"
            ]
          }
        },
        "filter": {
          "nfc_normalizer": {
            "type": "icu_normalizer",
            "name": "nfc"
          }
        }
      }
    }
  }
}

The nfkc_cf_normalized analyzer uses the default nfkc_cf normalization.
The nfc_normalized analyzer uses the customized nfc_normalizer token filter, which is set to use nfc normalization.
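
One way to compare the two analyzers, sketched here under the assumption that the icu_sample index has been created as shown, is to run the same text through each of them with the analyze API. The sample text is arbitrary and was picked only because it contains a ligature, an accented letter, and a Roman numeral character.

```console
# Assumes the icu_sample index defined on this page exists; the text is an arbitrary sample.
GET icu_sample/_analyze
{
  "analyzer": "nfkc_cf_normalized",
  "text": "ﬁancé Ⅷ"
}

GET icu_sample/_analyze
{
  "analyzer": "nfc_normalized",
  "text": "ﬁancé Ⅷ"
}
```

Comparing the tokens in the two responses should show the difference between the forms: nfkc_cf applies compatibility mappings and case folding in addition to composition, whereas nfc only composes characters.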
