za-arthur/pg_multilingual
Sometimes text search needs to use different dictionaries within the same text. In this case users usually combine the necessary dictionaries into one dictionary file, which is a hard task.
This patch allows different text search dictionaries to be used on one text with a single text search configuration.
This patch can be applied to the master branch of PostgreSQL:
$ git clone git@github.com:select-artur/pg_multilingual.git
$ git clone git@github.com:select-artur/postgres.git
$ cd postgres
$ git apply ../pg_multilingual/pg_multilingual_join.patch
$ ./configure
$ make
$ make install
In a PostgreSQL text search configuration you can specify a list of dictionaries for each token type. PostgreSQL tries the dictionaries in order to normalize words into lexemes. Original PostgreSQL stops normalizing a word as soon as some dictionary recognizes it as a known word.
With the JOIN option, PostgreSQL returns all recognized lexemes and keeps going until it reaches the end of the dictionary list or a dictionary mapped without this option. This is useful if you don't know which language a document is written in.
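As a rough illustration of the behaviour described above, the following Python sketch models the dictionary loop. It is not the patch's code: the `normalize` function, the toy dictionaries and their entries are hypothetical. It also assumes that the loop stops at the first dictionary without JOIN that recognizes the word, and it leaves out stop-word handling.

```python
from typing import Dict, List, Tuple

# Toy dictionary: maps a lowercased word to its lexemes.
# A missing key means "word not recognized". Entries are invented.
ToyDict = Dict[str, List[str]]


def normalize(word: str, mapping: List[Tuple[ToyDict, bool]]) -> List[str]:
    """Model of the lookup loop.

    mapping is a list of (dictionary, join) pairs in configuration order;
    join=True stands for a dictionary mapped with the JOIN option.
    """
    lexemes: List[str] = []
    for dictionary, join in mapping:
        found = dictionary.get(word.lower())
        if found is None:
            continue  # not recognized: try the next dictionary
        for lexeme in found:
            if lexeme not in lexemes:  # keep order, drop duplicates
                lexemes.append(lexeme)
        if not join:
            break  # recognized without JOIN: stop (assumed semantics)
    return lexemes


# Hypothetical stand-ins for german_hunspell and english_hunspell.
german: ToyDict = {"galaxie": ["galaxie"], "rates": ["rat"]}
english: ToyDict = {"galaxy": ["galaxy"], "rates": ["rate"]}

if __name__ == "__main__":
    # Original behaviour: the first dictionary that knows the word wins.
    print(normalize("rates", [(german, False), (english, False)]))  # ['rat']
    # German mapped with JOIN: the English lexeme is collected as well.
    print(normalize("rates", [(german, True), (english, False)]))  # ['rat', 'rate']
```

In this model JOIN only matters for a word that more than one dictionary recognizes. A word known to a single dictionary yields the same lexemes either way, because an unrecognized word is passed on to the next dictionary in both cases.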
Let's suppose you already have german_hunspell and english_hunspell dictionaries. You need to execute the following queries to create a multilingual configuration:
=> CREATE TEXT SEARCH CONFIGURATION multi_conf (COPY=simple);
=> ALTER TEXT SEARCH CONFIGURATION multi_conf
     ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                       word, hword, hword_part
     WITH german_hunspell (JOIN), english_hunspell;
After this you can query documents containing German and English words.
apod_en_de.dump is a dump of a test database. To restore it, you need the German and English hunspell dictionaries from hunspell_dicts:
$ git clone git@github.com:postgrespro/hunspell_dicts.git
$ cd hunspell_dicts
$ make -C hunspell_en_us USE_PGXS=1 install
$ make -C hunspell_de_de USE_PGXS=1 install
After this, the dictionary files are copied to the PostgreSQL share directory.
You need to restore the dump:
$ psql apod < pg_multilingual/apod_en_de.dump
This command will create the following objects:
- hunspell_en_us and hunspell_de_de extensions
- english_hunspell and german_hunspell dictionaries
- apod_conf configuration
- apod table with English and German documents
Here are example queries:
=> SELECT title FROM apod WHERE fts @@ to_tsquery('apod_conf', 'galaxy') LIMIT 1;
        title
---------------------
 The UV SMC from UIT
(1 row)

=> SELECT title FROM apod WHERE fts @@ to_tsquery('apod_conf', 'Galaxie') LIMIT 1;
             title
--------------------------------
 Andromeda - ein Inseluniversum
(1 row)
About
Patch for PostgreSQL and dump for testing