Patch for PostgreSQL and dump for testing


za-arthur/pg_multilingual


Sometimes a single text needs to be searched using several dictionaries. In that case users usually combine the required dictionaries into one dictionary file, which is a hard task.

This patch allows different text search dictionaries to be used on one text with a single text search configuration.

Installation

This patch can be applied to the PostgreSQL master branch:

$ git clone git@github.com:select-artur/pg_multilingual.git
$ git clone git@github.com:select-artur/postgres.git
$ cd postgres
$ git apply ../pg_multilingual/pg_multilingual_join.patch
$ ./configure
$ make
$ make install

Changes

New option for text search configuration mapping: JOIN

In a PostgreSQL text search configuration, you can specify a list of dictionaries for each token type. PostgreSQL tries each dictionary in turn to normalize words into lexemes. Stock PostgreSQL stops normalizing a word as soon as some dictionary recognizes it as a known word.

With the JOIN option, PostgreSQL returns all recognized lexemes until it reaches the end of the dictionary list or a mapping entry without this option. This is useful if you don't know what language a document is written in.
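The difference between the two behaviours can be illustrated with a small Python model. This is only a sketch of the semantics described above, not the patch's actual C implementation: the `make_dict` and `normalize` helpers and the toy dictionary entries are hypothetical, and real dictionaries also handle stop words and affixes, which are ignored here.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A toy "dictionary": returns lexemes for a known word, or None if unknown.
ToyDict = Callable[[str], Optional[List[str]]]


def make_dict(entries: Dict[str, List[str]]) -> ToyDict:
    """Build a toy dictionary from a word -> lexemes table (hypothetical helper)."""
    def lookup(token: str) -> Optional[List[str]]:
        return entries.get(token.lower())
    return lookup


def normalize(token: str, mapping: List[Tuple[ToyDict, bool]]) -> List[str]:
    """Walk the dictionary list; each entry is (dictionary, join_flag).

    Assumed semantics: a dictionary that recognizes the token ends the
    loop unless it is marked with JOIN, in which case the loop continues
    and later lexemes are appended.
    """
    lexemes: List[str] = []
    for dictionary, join in mapping:
        result = dictionary(token)
        if result is None:
            continue  # unknown word: try the next dictionary
        for lexeme in result:
            if lexeme not in lexemes:
                lexemes.append(lexeme)
        if not join:
            break  # stock behaviour: first recognizing dictionary wins
    return lexemes


# Made-up entries: "gab" is a known word in both toy dictionaries.
german = make_dict({"gab": ["geben"], "galaxie": ["galaxie"]})
english = make_dict({"gab": ["gab"], "galaxy": ["galaxy"]})

stock = [(german, False), (english, False)]
joined = [(german, True), (english, False)]

print(normalize("gab", stock))   # ['geben']
print(normalize("gab", joined))  # ['geben', 'gab']
```

In this model the `joined` mapping corresponds to `WITH german_hunspell (JOIN), english_hunspell`: a word known to both dictionaries yields the lexemes from both, while a word known to only one behaves as before.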

Usage

Suppose you already have the german_hunspell and english_hunspell dictionaries. Execute the following queries to create a multilingual configuration:

=> CREATE TEXT SEARCH CONFIGURATION multi_conf (COPY=simple);
=> ALTER TEXT SEARCH CONFIGURATION multi_conf
     ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                       word, hword, hword_part
     WITH german_hunspell (JOIN), english_hunspell;

After this you can query documents containing German and English words.

Example

apod_en_de.dump is a dump of the test database. To restore it, you need the German and English hunspell dictionaries from hunspell_dicts:

$ git clone git@github.com:postgrespro/hunspell_dicts.git
$ cd hunspell_dicts
$ make -C hunspell_en_us USE_PGXS=1 install
$ make -C hunspell_de_de USE_PGXS=1 install

After this, the dictionary files are copied to the PostgreSQL share directory.

Then restore the dump:

$ psql apod < pg_multilingual/apod_en_de.dump

This command will create the following objects:

  • hunspell_en_us and hunspell_de_de extensions
  • english_hunspell and german_hunspell dictionaries
  • apod_conf configuration
  • apod table with English and German documents

Here are example queries:

=> SELECT title FROM apod WHERE fts @@ to_tsquery('apod_conf', 'galaxy') LIMIT 1;
        title
---------------------
 The UV SMC from UIT
(1 row)

=> SELECT title FROM apod WHERE fts @@ to_tsquery('apod_conf', 'Galaxie') LIMIT 1;
             title
--------------------------------
 Andromeda - ein Inseluniversum
(1 row)
