1
+ <!-- doc/src/sgml/shared-ispell.sgml -->
2
+
3
+ <sect1 id="shared-ispell" xreflabel="shared_ispell">
4
+ <title>shared_ispell</title>
5
+
6
+ <indexterm zone="shared-ispell">
7
+ <primary>shared_ispell</primary>
8
+ </indexterm>
9
+
10
+ <para>
11
+ The <filename>shared_ispell</filename> module provides a shared ispell
12
+ dictionary, i.e. a dictionary that is stored in a shared memory segment. With the
13
+ traditional ispell implementation, each session initializes and stores the
14
+ dictionary on its own, which wastes a lot of CPU time and memory.
15
+ </para>
16
+
17
+ <para>
18
+ This extension allocates an area in the shared memory segment (you have to choose
19
+ its size in advance) and then loads a dictionary into it when the dictionary is
20
+ used for the first time.
21
+ </para>
22
+
23
+ <sect2>
24
+ <title>Functions</title>
25
+
26
+ <para>
27
+ The functions provided by the <filename>shared_ispell</filename> module
28
+ are shown in <xref linkend="shared-ispell-func-table">.
29
+ </para>
30
+
31
+ <table id="shared-ispell-func-table">
32
+ <title><filename>shared_ispell</filename> Functions</title>
33
+ <tgroup cols="3">
34
+ <thead>
35
+ <row>
36
+ <entry>Function</entry>
37
+ <entry>Returns</entry>
38
+ <entry>Description</entry>
39
+ </row>
40
+ </thead>
41
+
42
+ <tbody>
43
+ <row>
44
+ <entry><function>shared_ispell_reset()</function><indexterm><primary>shared_ispell_reset</primary></indexterm></entry>
45
+ <entry><type>void</type></entry>
46
+ <entry>
47
+ Resets the dictionaries (e.g. so that you can reload the updated files
48
+ from disk). The sessions that already use the dictionaries will be forced
49
+ to reinitialize them.
50
+ </entry>
51
+ </row>
52
+ <row>
53
+ <entry><function>shared_ispell_mem_used()</function><indexterm><primary>shared_ispell_mem_used</primary></indexterm></entry>
54
+ <entry><type>int</type></entry>
55
+ <entry>
56
+ Returns the amount of memory in the shared segment, in bytes, that is used
57
+ by the loaded shared dictionaries.
58
+ </entry>
59
+ </row>
60
+ <row>
61
+ <entry><function>shared_ispell_mem_available()</function><indexterm><primary>shared_ispell_mem_available</primary></indexterm></entry>
62
+ <entry><type>int</type></entry>
63
+ <entry>
64
+ Returns the amount of memory still available in the shared segment.
65
+ </entry>
66
+ </row>
67
+ <row>
68
+ <entry><function>shared_ispell_dicts()</function><indexterm><primary>shared_ispell_dicts</primary></indexterm></entry>
69
+ <entry><type>setof(dict_name varchar, affix_name varchar, words int, affixes int, bytes int)</type></entry>
70
+ <entry>
71
+ Returns a list of dictionaries loaded in the shared segment.
72
+ </entry>
73
+ </row>
74
+ <row>
75
+ <entry><function>shared_ispell_stoplists()</function><indexterm><primary>shared_ispell_stoplists</primary></indexterm></entry>
76
+ <entry><type>setof(stop_name varchar, words int, bytes int)</type></entry>
77
+ <entry>
78
+ Returns a list of stop word lists loaded in the shared segment.
79
+ </entry>
80
+ </row>
81
+ </tbody>
82
+ </tgroup>
83
+ </table>
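+
+ <para>
+ A minimal sketch of how the functions above might be combined to inspect the
+ shared segment and to force a reload. Since dictionaries are loaded on first
+ use, the listings are presumably empty right after a reset until some session
+ uses a dictionary again.
+
+ <programlisting>
+ -- list the dictionaries and stop word lists currently loaded
+ SELECT dict_name, affix_name, words, affixes, bytes FROM shared_ispell_dicts();
+ SELECT stop_name, words, bytes FROM shared_ispell_stoplists();
+
+ -- discard the loaded data, e.g. after updating the dictionary files on disk
+ SELECT shared_ispell_reset();
+ </programlisting>
+ </para>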
84
+ </sect2>
85
+
86
+ <sect2>
87
+ <title>GUC Parameters</title>
88
+
89
+ <variablelist>
90
+ <varlistentry id="guc-shared-ispell-max-size" xreflabel="shared_ispell.max_size">
91
+ <term>
92
+ <varname>shared_ispell.max_size</> (<type>int</type>)
93
+ <indexterm>
94
+ <primary><varname>shared_ispell.max_size</> configuration parameter</primary>
95
+ </indexterm>
96
+ </term>
97
+ <listitem>
98
+ <para>
99
+ Defines the maximum size of the shared segment. This is a hard limit: the
100
+ shared segment cannot be extended, so set it large enough that all the
101
+ dictionaries fit into it, but not so large that much memory is wasted.
102
+ </para>
103
+ </listitem>
104
+ </varlistentry>
105
+ </variablelist>
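+
+ <para>
+ A minimal sketch for checking the value the running server actually uses,
+ assuming the module has been preloaded. The standard <command>SHOW</command>
+ command is used here; it is not specific to this module.
+
+ <programlisting>
+ SHOW shared_ispell.max_size;
+ </programlisting>
+ </para>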
106
+ </sect2>
107
+
108
+ <sect2>
109
+ <title>Using the dictionary</title>
110
+
111
+ <para>
112
+ The module needs to allocate space in the shared memory segment, so add the
113
+ following to <filename>postgresql.conf</filename> (or update the current values):
114
+
115
+ <programlisting>
116
+ # libraries to load
117
+ shared_preload_libraries = 'shared_ispell'
118
+
119
+ # config of the shared memory
120
+ shared_ispell.max_size = 32MB
121
+ </programlisting>
122
+ </para>
123
+
124
+ <para>
125
+ To find out how much memory you actually need, use a large value (e.g. 200MB)
126
+ and load all the dictionaries you want to use. Then use the
127
+ <function>shared_ispell_mem_used()</function> function to find out how much
128
+ memory was actually used (and set the <varname>shared_ispell.max_size</varname>
129
+ GUC variable accordingly).
130
+ </para>
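+
+ <para>
+ As a rough sketch of that measurement, once every dictionary has been used at
+ least once (so that it has been loaded into the shared segment), the two
+ memory functions can be queried together. The column aliases are only
+ illustrative.
+
+ <programlisting>
+ SELECT shared_ispell_mem_used() AS used_bytes,
+        shared_ispell_mem_available() AS available;
+ </programlisting>
+ </para>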
131
+
132
+ <para>
133
+ Don't set it to exactly that value; leave some free space so that you can
134
+ reload the dictionaries without changing the <varname>shared_ispell.max_size</varname>
135
+ limit (which requires a server restart). About 512kB of headroom should be fine.
136
+ </para>
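+
+ <para>
+ One way to turn this advice into a number, shown only as a sketch: add the
+ suggested 512kB to the measured usage and print the result in readable units
+ with the standard <function>pg_size_pretty()</function> function. The alias
+ <literal>suggested_max_size</literal> is hypothetical, and the result should
+ be rounded up before it is written to the config file.
+
+ <programlisting>
+ -- measured usage plus 512kB of headroom
+ SELECT pg_size_pretty(shared_ispell_mem_used()::bigint + 512 * 1024)
+        AS suggested_max_size;
+ </programlisting>
+ </para>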
137
+
138
+ <para>
139
+ The extension defines a <literal>shared_ispell</literal> template that you
140
+ may use to define custom dictionaries. For example:
141
+
142
+ <programlisting>
143
+ CREATE TEXT SEARCH DICTIONARY english_shared (
144
+ TEMPLATE = shared_ispell,
145
+ DictFile = en_us,
146
+ AffFile = en_us,
147
+ StopWords = english
148
+ );
149
+
150
+ CREATE TEXT SEARCH CONFIGURATION public.english_shared
151
+ ( COPY = pg_catalog.simple );
152
+
153
+ ALTER TEXT SEARCH CONFIGURATION english_shared
154
+ ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
155
+ word, hword, hword_part
156
+ WITH english_shared, english_stem;
157
+ </programlisting>
158
+ </para>
159
+
160
+ <para>
161
+ We can test the created configuration:
162
+
163
+ <programlisting>
164
+ SELECT * FROM ts_debug('english_shared', 'abilities');
165
+ alias | description | token | dictionaries | dictionary | lexemes
166
+ -----------+-----------------+-----------+-------------------------------+----------------+-----------
167
+ asciiword | Word, all ASCII | abilities | {english_shared,english_stem} | english_shared | {ability}
168
+ (1 row)
169
+ </programlisting>
170
+ </para>
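+
+ <para>
+ The dictionary can also be exercised on its own, without going through the
+ whole configuration. The following sketch uses the standard
+ <function>ts_lexize()</function> and <function>to_tsvector()</function>
+ functions and assumes the <literal>english_shared</literal> dictionary and
+ configuration created above.
+
+ <programlisting>
+ -- ask the dictionary directly for the lexemes of a single token
+ SELECT ts_lexize('english_shared', 'abilities');
+
+ -- run a document through the whole configuration
+ SELECT to_tsvector('english_shared', 'abilities');
+ </programlisting>
+ </para>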
171
+
172
+ <para>
173
+ Or you can update your own text search configuration. For example, if you have
174
+ the <literal>public.english</literal> configuration, you can update it to use the
175
+ <literal>english_shared</literal> dictionary, which is based on the <literal>shared_ispell</literal> template:
176
+
177
+ <programlisting>
178
+ ALTER TEXT SEARCH CONFIGURATION public.english
179
+ ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
180
+ word, hword, hword_part
181
+ WITH english_shared, english_stem;
182
+ </programlisting>
183
+ </para>
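+
+ <para>
+ To check that the updated mapping is in effect, the same kind of
+ <function>ts_debug()</function> query can be run against the altered
+ configuration. This is only a sketch and assumes that a
+ <literal>public.english</literal> configuration exists in your database.
+
+ <programlisting>
+ SELECT alias, token, dictionaries, dictionary, lexemes
+   FROM ts_debug('public.english', 'abilities');
+ </programlisting>
+ </para>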
184
+
185
+ </sect2>
186
+
187
+ <sect2>
188
+ <title>Author</title>
189
+
190
+ <para>
191
+ Tomas Vondra <email>tomas.vondra@2ndquadrant.com</email>, Prague, Czech Republic
192
+ </para>
193
+ </sect2>
194
+
195
+ </sect1>