Commit 3b9494b

shared_ispell module added

Author: Artur Zakirov
Parent: b82f06c


10 files changed (+1557, -0 lines)

contrib/shared_ispell/LICENSE

Lines changed: 25 additions & 0 deletions

@@ -0,0 +1,25 @@
Copyright 2012, Tomas Vondra (tv@fuzzy.cz). All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are
permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of
conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list
of conditions and the following disclaimer in the documentation and/or other materials
provided with the distribution.

THIS SOFTWARE IS PROVIDED BY TOMAS VONDRA ''AS IS'' AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL TOMAS VONDRA OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The views and conclusions contained in the software and documentation are those of the
authors and should not be interpreted as representing official policies, either expressed
or implied, of Tomas Vondra.

contrib/shared_ispell/META.json

Lines changed: 34 additions & 0 deletions

@@ -0,0 +1,34 @@
{
    "name": "shared_ispell",
    "abstract": "Provides a shared ispell dictionary - initialized once and stored in shared segment.",
    "description": "Allows you to allocate area within a shared segment and use it for ispell dictionaries.",
    "version": "1.0.0",
    "maintainer": "Tomas Vondra <tv@fuzzy.cz>",
    "license": "bsd",
    "prereqs": {
        "runtime": {
            "requires": {
                "PostgreSQL": "8.4.0"
            }
        }
    },
    "provides": {
        "shared_ispell": {
            "file": "shared_ispell--1.0.0.sql",
            "version": "1.0.0"
        }
    },
    "resources": {
        "repository": {
            "url": "https://github.com/tvondra/shared_ispell.git",
            "web": "http://github.com/tvondra/shared_ispell",
            "type": "git"
        }
    },
    "tags": ["ispell", "shared", "fulltext", "dictionary"],
    "meta-spec": {
        "version": "1.0.0",
        "url": "http://pgxn.org/meta/spec.txt"
    },
    "release_status": "testing"
}

contrib/shared_ispell/Makefile

Lines changed: 20 additions & 0 deletions

@@ -0,0 +1,20 @@
# contrib/shared_ispell/Makefile

MODULE_big = shared_ispell
OBJS = src/shared_ispell.o

EXTENSION = shared_ispell
DATA = sql/shared_ispell--1.1.0.sql

REGRESS = shared_ispell

ifdef USE_PGXS
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
else
subdir = contrib/shared_ispell
top_builddir = ../..
include $(top_builddir)/src/Makefile.global
include $(top_srcdir)/contrib/contrib-global.mk
endif

contrib/shared_ispell/README.md

Lines changed: 138 additions & 0 deletions

@@ -0,0 +1,138 @@
Shared ISpell Dictionary
========================
This PostgreSQL extension provides a shared ispell dictionary, i.e.
a dictionary that's stored in a shared memory segment. With the traditional
ispell implementation, each session initializes and stores the
dictionary on its own, which wastes a lot of CPU time and RAM.

This extension allocates an area in the shared memory segment (you have
to choose its size in advance) and then loads the dictionary into it
when it's used for the first time.
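To make the idea of a fixed-size area concrete, here is a minimal C sketch of a bump allocator over a region whose size is chosen up front and never grows. It is an illustration under stated assumptions, not the extension's code: the `seg_t`, `seg_init`, `seg_alloc`, `seg_used`, `seg_available`, `seg_reset` and `seg_destroy` names are hypothetical, an ordinary `malloc` buffer stands in for PostgreSQL shared memory, and the locking that concurrent backends would need is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-capacity area; a malloc'd buffer stands in for shared memory. */
typedef struct {
    char  *base;      /* start of the area */
    size_t capacity;  /* hard limit, fixed when the area is created */
    size_t used;      /* bytes handed out so far */
} seg_t;

/* Create an area of exactly `capacity` bytes; it never grows afterwards. */
static int seg_init(seg_t *seg, size_t capacity)
{
    seg->base = malloc(capacity);
    seg->capacity = (seg->base != NULL) ? capacity : 0;
    seg->used = 0;
    return seg->base != NULL;
}

/* Bump allocation: hand out the next `size` bytes, rounded up to a multiple
 * of 8 for alignment, or return NULL when the request no longer fits. */
static void *seg_alloc(seg_t *seg, size_t size)
{
    assert(seg->used <= seg->capacity);
    size_t remaining = seg->capacity - seg->used;
    if (size > remaining)
        return NULL;                      /* hard limit: the area never grows */
    size_t aligned = (size + 7) & ~(size_t) 7;
    if (aligned > remaining)
        return NULL;
    void *p = seg->base + seg->used;
    seg->used += aligned;
    return p;
}

/* Analogues of "used" and "free" memory for this area. */
static size_t seg_used(const seg_t *seg)      { return seg->used; }
static size_t seg_available(const seg_t *seg) { return seg->capacity - seg->used; }

/* Forget all allocations at once, e.g. before reloading dictionaries. */
static void seg_reset(seg_t *seg) { seg->used = 0; }

static void seg_destroy(seg_t *seg)
{
    free(seg->base);
    seg->base = NULL;
    seg->capacity = seg->used = 0;
}
```

Once `seg_alloc` returns `NULL`, the only ways forward in this sketch are a reset or a larger area, which is the same trade-off that makes the size of the shared area something you pick in advance.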

If you need just snowball-type dictionaries, this extension is not
really interesting for you. But if you really need an ispell
dictionary, this may save you a lot of resources.


Install
-------
Installing the extension is quite simple, especially if you're on 9.1
or newer. In that case all you need to do is this:

    $ make install

and then (after connecting to the database)

    db=# CREATE EXTENSION shared_ispell;

If you're on a pre-9.1 version, you'll have to do the second part manually
by running the SQL script (shared_ispell--x.y.sql) in the database. If
needed, replace MODULE_PATHNAME by $libdir.


Config
------
Now the functions are created, but you still need to load the shared
module. This needs to be done from postgresql.conf, as the module
needs to allocate space in the shared memory segment. So add this to
the config file (or update the current values):

    # libraries to load
    shared_preload_libraries = 'shared_ispell'

    # known GUC prefixes (needed only up to 9.1; the setting was removed in 9.2)
    custom_variable_classes = 'shared_ispell'

    # config of the shared memory
    shared_ispell.max_size = 32MB

Yes, there's a single GUC variable that defines the maximum size of
the shared segment. This is a hard limit: the shared segment is not
extensible, so you need to set it so that all the dictionaries fit
into it without wasting much memory.
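To relate a setting like `32MB` to the byte counts reported by the extension's functions, it helps to convert it to bytes. The helper below is hypothetical (`mem_setting_to_bytes` is not part of shared_ispell) and assumes the usual PostgreSQL memory units, where 1kB = 1024 bytes and each larger unit is 1024 times the previous one. It rejects a bare number, because the default unit depends on how the setting is declared, which this README does not say.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert "32MB", "512kB", "1GB" ... to bytes, assuming
 * PostgreSQL-style memory units (1kB = 1024 bytes, case-sensitive unit names).
 * Returns -1 for malformed input, a missing unit, or overflow. */
static long long mem_setting_to_bytes(const char *value)
{
    static const struct { const char *unit; long long mult; } units[] = {
        { "kB", 1024LL },
        { "MB", 1024LL * 1024 },
        { "GB", 1024LL * 1024 * 1024 },
        { "TB", 1024LL * 1024 * 1024 * 1024 },
    };
    char *end;

    assert(value != NULL);
    errno = 0;
    long long n = strtoll(value, &end, 10);
    if (end == value || errno == ERANGE || n < 0)
        return -1;
    while (isspace((unsigned char) *end))   /* this sketch also accepts "32 MB" */
        end++;
    for (size_t i = 0; i < sizeof units / sizeof units[0]; i++) {
        if (strcmp(end, units[i].unit) == 0) {
            if (n > LLONG_MAX / units[i].mult)
                return -1;                    /* result would overflow */
            return n * units[i].mult;
        }
    }
    return -1;                                /* missing or unknown unit */
}
```

Under these assumptions, `32MB` works out to 33554432 bytes and `512kB` to 524288 bytes.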

To find out how much memory you actually need, set a large value
(e.g. 200MB) and load all the dictionaries you want to use. Then use
the shared_ispell_mem_used() function to find out how much memory
was actually used (and set the max_size GUC variable accordingly).

Don't set it to exactly that value; leave some free space so that
you can reload the dictionaries without changing the max_size
limit (which requires a restart of the database). Something like
512kB should be just fine.
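The sizing rule above is simple arithmetic: measured usage plus some headroom, rounded up to a convenient unit. A minimal sketch, assuming `shared_ispell_mem_used()` reports bytes (the README does not state the unit) and rounding up to whole megabytes; `suggest_max_size_mb` is a hypothetical name, not an extension function.

```c
#include <assert.h>
#include <stdint.h>

#define KB_BYTES 1024ULL
#define MB_BYTES (1024ULL * KB_BYTES)

/* Hypothetical helper: suggest max_size in whole megabytes from the measured
 * usage (assumed to be in bytes) plus headroom for reloading dictionaries. */
static uint64_t suggest_max_size_mb(uint64_t used_bytes, uint64_t headroom_bytes)
{
    /* Guard against overflow in the additions below. */
    assert(headroom_bytes <= UINT64_MAX - MB_BYTES);
    assert(used_bytes <= UINT64_MAX - MB_BYTES - headroom_bytes);

    uint64_t needed = used_bytes + headroom_bytes;
    return (needed + MB_BYTES - 1) / MB_BYTES;   /* round up to a whole MB */
}
```

For example, with 20341680 bytes used and 512kB (524288 bytes) of headroom, the total is 20865968 bytes, just under 20MB (20971520 bytes), so the helper suggests 20.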

The shared segment can contain several dictionaries at the same time;
the amount of memory is the only limit. There's no limit on the number
of dictionaries, words, etc., only the max_size GUC variable.


Using the dictionary
--------------------
Technically, the extension defines a 'shared_ispell' template that
you may use to define custom dictionaries. For example, you may do this:

    CREATE TEXT SEARCH DICTIONARY czech_shared (
        TEMPLATE = shared_ispell,
        DictFile = czech,
        AffFile = czech,
        StopWords = czech
    );

    CREATE TEXT SEARCH CONFIGURATION public.czech_shared
        ( COPY = pg_catalog.simple );

    ALTER TEXT SEARCH CONFIGURATION czech_shared
        ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                          word, hword, hword_part
        WITH czech_shared;

and then do the usual stuff, e.g.

    db=# SELECT ts_lexize('czech_shared', 'automobile');

or whatever you want.


Available functions
-------------------
The extension provides five management functions that allow you to
manage and get info about the preloaded dictionaries. The first two
functions

    shared_ispell_mem_used()
    shared_ispell_mem_available()

allow you to get info about the shared segment (used and free memory),
e.g. to properly size the segment (max_size). Then there are functions
that return the list of dictionaries / stop lists loaded in the shared segment

    shared_ispell_dicts()
    shared_ispell_stoplists()

e.g. like this

    db=# SELECT * FROM shared_ispell_dicts();

     dict_name | affix_name | words | affixes |  bytes
    -----------+------------+-------+---------+----------
     bulgarian | bulgarian  | 79267 |      12 |  7622128
     czech     | czech      | 96351 |    2544 | 12715000
    (2 rows)


    db=# SELECT * FROM shared_ispell_stoplists();

     stop_name | words | bytes
    -----------+-------+-------
     czech     |   259 |  4552
    (1 row)
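Because all dictionaries and stop lists share one segment, a quick sanity check is to add up the `bytes` columns and compare the sum with `max_size`. The sketch below uses the numbers from the sample output above and the `32MB` value from the Config section. It assumes the extension's own bookkeeping overhead in the segment is small enough to ignore, which may not hold exactly; `total_bytes` and `fits_in_segment` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte counts from the sample output: the bulgarian and czech
 * dictionaries, then the czech stop list. */
static const uint64_t sample_bytes[] = { 7622128, 12715000, 4552 };

/* Hypothetical helper: sum per-item sizes. Assumption: any bookkeeping
 * overhead the extension keeps in the segment is not included here. */
static uint64_t total_bytes(const uint64_t *sizes, size_t n)
{
    assert(sizes != NULL || n == 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += sizes[i];
    return sum;
}

/* Hypothetical helper: non-zero when `total` fits under a max_size in bytes. */
static int fits_in_segment(uint64_t total, uint64_t max_size_bytes)
{
    return total <= max_size_bytes;
}
```

The three entries total 20341680 bytes (about 19.4MB), so they fit into 32MB but would not fit into 16MB.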

The last function allows you to reset the dictionaries (e.g. so that you
can reload the updated files from disk). The sessions that already use
the dictionaries will be forced to reinitialize them (the first one
will rebuild them and copy them into the shared segment, the other ones
will use this prepared data).

    db=# SELECT shared_ispell_reset();

That's all for now ...
