PostgreSQL 9.4.1 Documentation

F.43. unaccent

unaccent is a text search dictionary that removes accents (diacritic signs) from lexemes. It's a filtering dictionary, which means its output is always passed to the next dictionary (if any), unlike the normal behavior of dictionaries. This allows accent-insensitive processing for full text search.

The current implementation of unaccent cannot be used as a normalizing dictionary for the thesaurus dictionary.

F.43.1. Configuration

An unaccent dictionary accepts the following options:

RULES is the base name of the file containing the list of translation rules. This file must be stored in $SHAREDIR/tsearch_data/ (where $SHAREDIR means the PostgreSQL installation's shared-data directory). Its name must end in .rules (which is not to be included in the RULES parameter).

The rules file has the following format:

Each line represents a pair, consisting of a character with accent followed by a character without accent. The first is translated into the second. For example,
```
À        A
Á        A
Â        A
Ã        A
Ä        A
Å        A
Æ        A
```

A more complete example, which is directly useful for most European languages, can be found in unaccent.rules, which is installed in $SHAREDIR/tsearch_data/ when the unaccent module is installed.

F.43.2. Usage

Installing the unaccent extension creates a text search template unaccent and a dictionary unaccent based on it. The unaccent dictionary has the default parameter setting RULES='unaccent', which makes it immediately usable with the standard unaccent.rules file. If you wish, you can alter the parameter, for example

mydb=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');

or create new dictionaries based on the template.
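
As a minimal sketch of the second option, a new dictionary can be created from the unaccent template with CREATE TEXT SEARCH DICTIONARY. The dictionary name my_unaccent is hypothetical, and the statement assumes that a file my_rules.rules already exists in $SHAREDIR/tsearch_data/:

```sql
-- Hypothetical dictionary name; assumes my_rules.rules is present
-- in $SHAREDIR/tsearch_data/ (the .rules suffix is omitted in RULES).
CREATE TEXT SEARCH DICTIONARY my_unaccent (
    TEMPLATE = unaccent,
    RULES = 'my_rules'
);
```

Creating a separate dictionary this way leaves the default unaccent dictionary and its standard rules untouched.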

To test the dictionary, you can try:

mydb=# select ts_lexize('unaccent','Hôtel');
 ts_lexize
-----------
 {Hotel}
(1 row)

Here is an example showing how to insert the unaccent dictionary into a text search configuration:

mydb=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
mydb=# ALTER TEXT SEARCH CONFIGURATION fr
        ALTER MAPPING FOR hword, hword_part, word
        WITH unaccent, french_stem;
mydb=# select to_tsvector('fr','Hôtels de la Mer');
    to_tsvector
-------------------
 'hotel':1 'mer':4
(1 row)

mydb=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
 ?column?
----------
 t
(1 row)

mydb=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
      ts_headline
------------------------
 <b>Hôtel</b> de la Mer
(1 row)
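
The same configuration can be applied to stored text. The following is only a sketch: the table places and its column name are hypothetical, and it assumes the fr configuration from the example above has already been created.

```sql
-- Hypothetical table used only for illustration.
CREATE TEMPORARY TABLE places (name text);
INSERT INTO places VALUES ('Hôtel de la Mer'), ('Café du Port');

-- Accent-insensitive match: the unaccented query term 'hotel'
-- is expected to match the row containing 'Hôtel'.
SELECT name
FROM places
WHERE to_tsvector('fr', name) @@ to_tsquery('fr', 'hotel');
```

Because both the document text and the query are processed with the same fr configuration, accents are removed on both sides before the lexemes are compared.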

F.43.3. Functions

The unaccent() function removes accents (diacritic signs) from a given string. Basically, it's a wrapper around the unaccent dictionary, but it can be used outside normal text search contexts.

unaccent([dictionary, ] string) returns text

For example:

SELECT unaccent('unaccent', 'Hôtel');
SELECT unaccent('Hôtel');
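
Since unaccent() works on plain strings, it can also be used for accent-insensitive comparisons outside of text search. A minimal sketch, assuming the standard unaccent.rules file is in use (the column alias is illustrative):

```sql
-- Both sides are reduced to unaccented text before comparison;
-- with the standard rules this is expected to be true.
SELECT unaccent('Hôtel') = unaccent('Hotel') AS same_without_accents;
```

Note that the comparison is still case-sensitive, since unaccent() only removes diacritics.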
