Shared ispell dictionary (stored in shared segment, used by multiple connections)

Shared ISpell Dictionary

This PostgreSQL extension provides a shared ispell dictionary, i.e. a dictionary that's stored in a shared memory segment. With the traditional ispell implementation, each session initializes and stores the dictionary on its own, which wastes a lot of CPU and RAM.

This extension allocates an area in the shared segment (you have to choose the size in advance) and then loads the dictionary into it when it's used for the first time.

If you need just snowball-type dictionaries, this extension is not really interesting for you. But if you really need an ispell dictionary, this may save you a lot of resources.

Install

Before building and installing shared_ispell, ensure the following:

PostgreSQL version is 9.6 or later.

Installing the extension is quite simple. All you need to do is this:

    $ git clone git@github.com:postgrespro/shared_ispell.git
    $ cd shared_ispell
    $ make USE_PGXS=1
    $ make USE_PGXS=1 install

and then (after connecting to the database)

db=# CREATE EXTENSION shared_ispell;

Important: Don't forget to set the PG_CONFIG variable if you want to test shared_ispell on a custom build of PostgreSQL (for example, make USE_PGXS=1 PG_CONFIG=/path/to/pg_config). Read more here.

Config

Now the functions are created, but you still need to load the shared module. This needs to be done from postgresql.conf, as the module needs to allocate space in the shared memory segment. So add this to the config file (or update the current values):

    # libraries to load
    shared_preload_libraries = 'shared_ispell'

    # config of the shared memory
    shared_ispell.max_size = 32MB

Yes, there's a single GUC variable that defines the maximum size of the shared segment. This is a hard limit: the shared segment is not extensible, so you need to set it so that all the dictionaries fit into it without wasting much memory.

To find out how much memory you actually need, use a large value (e.g. 200MB) and load all the dictionaries you want to use. Then use the shared_ispell_mem_used() function to find out how much memory was actually used (and set the max_size GUC variable accordingly).

Don't set it exactly to that value; leave some free space so that you can reload the dictionaries without changing the max_size GUC limit (which requires a restart of the DB). Something like 512kB should be just fine.
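The sizing arithmetic can be sketched in a few lines of shell. This is only an illustration: recommend_max_size is a hypothetical helper (not part of the extension), the input is assumed to be a byte count as reported by shared_ispell_mem_used(), and the kB unit is assumed to be accepted by the GUC in the same way as the MB unit used in the config example.

```shell
# Hypothetical helper: turn a byte count (assumed to come from
# shared_ispell_mem_used()) into a max_size setting with some headroom.
recommend_max_size() {
    used_bytes="$1"
    headroom_kb="${2:-512}"                      # free space suggested in the text
    used_kb=$(( (used_bytes + 1023) / 1024 ))    # round up to whole kB
    echo "shared_ispell.max_size = $(( used_kb + headroom_kb ))kB"
}

# Example input: 20341680 bytes is an assumed reading, not a measured one.
recommend_max_size 20341680
# prints: shared_ispell.max_size = 20377kB
```

With a running server, the byte count could be taken from psql -Atc "SELECT shared_ispell_mem_used()" instead of being typed in by hand.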

The shared segment can contain several dictionaries at the same time; the amount of memory is the only limit. There's no limit on the number of dictionaries / words etc. Just the max_size GUC variable.

Using the dictionary

Technically, the extension defines a 'shared_ispell' template that you may use to define custom dictionaries. E.g. you may do this

    CREATE TEXT SEARCH DICTIONARY czech_shared (
        TEMPLATE = shared_ispell,
        DictFile = czech,
        AffFile = czech,
        StopWords = czech
    );

    CREATE TEXT SEARCH CONFIGURATION public.czech_shared
        ( COPY = pg_catalog.simple );

    ALTER TEXT SEARCH CONFIGURATION czech_shared
        ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                          word, hword, hword_part
        WITH czech_shared;

and then do the usual stuff, e.g.

db=# SELECT ts_lexize('czech_shared', 'automobile');

or whatever you want.

Available functions

The extension provides five management functions that allow you to manage and get info about the preloaded dictionaries. The first two functions

    shared_ispell_mem_used()
    shared_ispell_mem_available()

allow you to get info about the shared segment (used and free memory), e.g. to properly size the segment (max_size). Then there are functions that return the list of dictionaries / stop lists loaded in the shared segment

    shared_ispell_dicts()
    shared_ispell_stoplists()

e.g. like this

    db=# SELECT * FROM shared_ispell_dicts();
     dict_name | affix_name | words | affixes |  bytes
    -----------+------------+-------+---------+----------
     bulgarian | bulgarian  | 79267 |      12 |  7622128
     czech     | czech      | 96351 |    2544 | 12715000
    (2 rows)

    db=# SELECT * FROM shared_ispell_stoplists();
     stop_name | words | bytes
    -----------+-------+-------
     czech     |   259 |  4552
    (1 row)

The last function allows you to reset the dictionary (e.g. so that you can reload the updated files from disk). The sessions that already use the dictionaries will be forced to reinitialize them (the first one will rebuild and copy them into the shared segment, the other ones will use this prepared data).

db=# SELECT shared_ispell_reset();
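The reset-and-reload cycle described above could be scripted roughly as follows. This is a sketch under assumptions: reload_shared_dicts and the PSQL override variable are hypothetical names introduced here, the dictionary name is spliced into the SQL unescaped (so only pass trusted names), and repeating the -c option relies on psql 9.6 or later.

```shell
# Hypothetical helper: reset the shared segment, trigger one reload by
# using the dictionary, then list what is loaded in shared memory.
# PSQL may be overridden, e.g. PSQL=echo for a dry run.
reload_shared_dicts() {
    db="$1"      # database name
    dict="$2"    # dictionary built on the shared_ispell template
    "${PSQL:-psql}" -X -d "$db" \
        -c "SELECT shared_ispell_reset();" \
        -c "SELECT ts_lexize('$dict', 'automobile');" \
        -c "SELECT * FROM shared_ispell_dicts();"
}
```

Called as reload_shared_dicts db czech_shared, the three statements run in one session, so the ts_lexize call is the one that rebuilds the dictionary from the updated files.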

That's all for now ...

Changes from original version

The original version of this module is located in Tomas Vondra's GitHub. That version does not handle affixes that require full regular expressions (regex_t, implemented in regex.h).

This version of the module can handle affixes that require full regular expressions. To do so, the module loads and stores affix files in each session. The affix list is tiny and takes little time and memory to parse. Actually this is Tomas' idea, but there is no related code in the GitHub.

Author

Tomas Vondra (GitHub)

Releases

v1.1.0 (latest), Apr 13, 2017
