Shared ispell dictionary (stored in shared segment, used by multiple connections)


postgrespro/shared_ispell

 
 


This PostgreSQL extension provides a shared ispell dictionary, i.e. a dictionary that is stored in a shared memory segment. With the traditional ispell implementation, each session initializes and stores the dictionary on its own, which wastes a lot of CPU and RAM.

This extension allocates an area in the shared segment (you have to choose the size in advance) and then loads the dictionary into it when it is used for the first time.

If you need just snowball-type dictionaries, this extension is not really interesting for you. But if you really need an ispell dictionary, this may save you a lot of resources.

Install

Before building and installing shared_ispell you should ensure the following:

  • PostgreSQL version is 9.6 or later.

Installing the extension is quite simple. All you need to do is this:

```shell
$ git clone git@github.com:postgrespro/shared_ispell.git
$ cd shared_ispell
$ make USE_PGXS=1
$ make USE_PGXS=1 install
```

and then (after connecting to the database)

```sql
db=# CREATE EXTENSION shared_ispell;
```

Important: Don't forget to set the PG_CONFIG variable in case you want to test shared_ispell on a custom build of PostgreSQL.
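For example, a sketch of building against a custom installation (the path is a placeholder for wherever your build's pg_config lives):

```shell
$ make USE_PGXS=1 PG_CONFIG=/path/to/pg_config
$ make USE_PGXS=1 PG_CONFIG=/path/to/pg_config install
```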

Config

Now the functions are created, but you still need to load the shared module. This needs to be done from postgresql.conf, as the module needs to allocate space in the shared memory segment. So add this to the config file (or update the current values):

```
# libraries to load
shared_preload_libraries = 'shared_ispell'

# config of the shared memory
shared_ispell.max_size = 32MB
```

Yes, there's a single GUC variable that defines the maximum size of the shared segment. This is a hard limit; the shared segment is not extensible, so you need to set it so that all the dictionaries fit into it without wasting too much memory.

To find out how much memory you actually need, use a large value (e.g. 200MB) and load all the dictionaries you want to use. Then use the shared_ispell_mem_used() function to find out how much memory was actually used (and set the max_size GUC variable accordingly).

Don't set it exactly to that value; leave some free space, so that you can reload the dictionaries without changing the max_size GUC limit (which requires a restart of the DB). Something like 512kB of headroom should be just fine.
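The sizing procedure above can be sketched as a psql session (czech_shared is an assumed example dictionary built on the shared_ispell template; the test word is arbitrary):

```sql
-- start with a generous limit in postgresql.conf, e.g.:
--   shared_ispell.max_size = 200MB
-- then force every dictionary you plan to use to load:
SELECT ts_lexize('czech_shared', 'slovo');

-- how much of the segment is used and how much is still free
SELECT shared_ispell_mem_used();
SELECT shared_ispell_mem_available();
```

Set max_size to the used amount plus a little headroom, then restart.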

The shared segment can contain several dictionaries at the same time; the amount of memory is the only limit. There's no limit on the number of dictionaries / words etc., just the max_size GUC variable.

Using the dictionary

Technically, the extension defines a 'shared_ispell' template that you may use to define custom dictionaries. E.g. you may do this:

```sql
CREATE TEXT SEARCH DICTIONARY czech_shared (
    TEMPLATE = shared_ispell,
    DictFile = czech,
    AffFile = czech,
    StopWords = czech
);

CREATE TEXT SEARCH CONFIGURATION public.czech_shared
    ( COPY = pg_catalog.simple );

ALTER TEXT SEARCH CONFIGURATION czech_shared
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH czech_shared;
```

and then do the usual stuff, e.g.

```sql
db=# SELECT ts_lexize('czech_shared', 'automobile');
```

or whatever you want.
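For instance, a sketch of using the configuration for ordinary full-text search (the Czech sample strings are assumptions; any text in the dictionary's language works):

```sql
-- the czech_shared configuration routes word tokens through the
-- shared dictionary, so the usual full-text machinery can use it
SELECT to_tsvector('czech_shared', 'nejlepší automobil')
       @@ to_tsquery('czech_shared', 'automobil');
```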

Available functions

The extension provides five management functions that allow you to manage and get info about the preloaded dictionaries. The first two functions

```
shared_ispell_mem_used()
shared_ispell_mem_available()
```

allow you to get info about the shared segment (used and free memory), e.g. to properly size the segment (max_size). Then there are functions that return the list of dictionaries / stop lists loaded in the shared segment:

```
shared_ispell_dicts()
shared_ispell_stoplists()
```

e.g. like this

```
db=# SELECT * FROM shared_ispell_dicts();
 dict_name | affix_name | words | affixes |  bytes
-----------+------------+-------+---------+----------
 bulgarian | bulgarian  | 79267 |      12 |  7622128
 czech     | czech      | 96351 |    2544 | 12715000
(2 rows)

db=# SELECT * FROM shared_ispell_stoplists();
 stop_name | words | bytes
-----------+-------+-------
 czech     |   259 |  4552
(1 row)
```

The last function allows you to reset the dictionary (e.g. so that you can reload the updated files from disk). The sessions that already use the dictionaries will be forced to reinitialize them (the first one will rebuild and copy them into the shared segment, the other ones will use this prepared data).

```sql
db=# SELECT shared_ispell_reset();
```
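A sketch of the reload workflow this enables (the file names and location are assumptions; ispell files normally live in the tsearch_data directory under the PostgreSQL share directory):

```sql
-- after replacing e.g. czech.dict / czech.affix on disk
-- (typically in $SHAREDIR/tsearch_data), drop the cached copies:
SELECT shared_ispell_reset();

-- the next lookup reinitializes the dictionary; the first session
-- to do so rebuilds it and copies it back into the shared segment
SELECT ts_lexize('czech_shared', 'automobil');
```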

That's all for now ...

Changes from original version

The original version of this module is located in Tomas Vondra's GitHub. That version does not handle affixes that require full regular expressions (regex_t, implemented in regex.h).

This version of the module can handle affixes with full regular expressions. To do so, the module loads and stores the affix files in each session. The affix list is tiny and takes little time and memory to parse. This was actually Tomas's idea, but there is no related code in his GitHub repository.

Author

Tomas Vondra
