|
1 |
| -<!-- $PostgreSQL: pgsql/doc/src/sgml/gin.sgml,v 2.3 2006/09/14 21:15:07 tgl Exp $ --> |
| 1 | +<!-- $PostgreSQL: pgsql/doc/src/sgml/gin.sgml,v 2.4 2006/09/18 12:11:36 teodor Exp $ --> |
2 | 2 |
|
3 | 3 | <chapter id="GIN">
|
4 | 4 | <title>GIN Indexes</title>
|
|
14 | 14 | <para>
|
15 | 15 | <acronym>GIN</acronym> stands for Generalized Inverted Index. It is
|
16 | 16 | an index structure storing a set of (key, posting list) pairs, where
|
17 |
| - 'posting list' is a set of rows in which the key occurs. The |
| 17 | + 'posting list' is a set of rows in which the key occurs. Each |
18 | 18 | row may contain many keys.
|
19 | 19 | </para>
|
20 | 20 |
|
|
104 | 104 | <listitem>
|
105 | 105 | <para>
|
106 | 106 | Returns an array of keys of the query to be executed. n contains
|
107 |
| - strategy number of operation (see <xref linkend="xindex-strategies">). |
| 107 | + the strategy number of the operation |
| 108 | + (see <xref linkend="xindex-strategies">). |
108 | 109 | Depending on n, query may be different type.
|
109 | 110 | </para>
|
110 | 111 | </listitem>
|
|
114 | 115 | <term>bool consistent( bool check[], StrategyNumber n, Datum query)</term>
|
115 | 116 | <listitem>
|
116 | 117 | <para>
|
117 |
| - Returns TRUE if indexed value satisfies query qualifier with strategy n |
118 |
| - (or may satisfy in case of RECHECK mark in operator class). |
119 |
| - Each element of the check array is TRUE if indexed value has a |
| 118 | + Returns TRUE if the indexed value satisfies the query qualifier with |
| 119 | + strategy n (or may satisfy in case of RECHECK mark in operator class). |
| 120 | + Each element of the check array is TRUE if the indexed value has a |
120 | 121 | corresponding key in the query: if (check[i] == TRUE ) the i-th key of
|
121 | 122 | the query is present in the indexed value.
|
122 | 123 | </para>
|
|
135 | 136 | <term>Create vs insert</term>
|
136 | 137 | <listitem>
|
137 | 138 | <para>
|
138 |
| - In most cases, insertion into <acronym>GIN</acronym> index is slow because |
139 |
| - many GIN keys may be inserted for each table row. So, when loading data |
140 |
| - in bulk it may be useful to drop index and recreate it |
141 |
| - after the data is loaded in the table. |
| 139 | + In most cases, insertion into <acronym>GIN</acronym> index is slow |
| 140 | + due to the likelihood of many keys being inserted for each value. |
| 141 | + So, for bulk insertions into a table it is advisable to to drop the GIN |
| 142 | + index and recreate it after finishing bulk insertion. |
142 | 143 | </para>
|
143 | 144 | </listitem>
|
144 | 145 | </varlistentry>
|
|
147 | 148 | <term>gin_fuzzy_search_limit</term>
|
148 | 149 | <listitem>
|
149 | 150 | <para>
|
150 |
| - The primary goal of development <acronym>GIN</acronym> indices was |
| 151 | + The primary goal of developing <acronym>GIN</acronym> indices was |
151 | 152 | support for highly scalable, full-text search in
|
152 | 153 | <productname>PostgreSQL</productname> and there are often situations when
|
153 | 154 | a full-text search returns a very large set of results. Since reading
|
|
158 | 159 | <para>
|
159 | 160 | Such queries usually contain very frequent words, so the results are not
|
160 | 161 | very helpful. To facilitate execution of such queries
|
161 |
| - <acronym>GIN</acronym> has a configurable soft upper limit of the size |
| 162 | + <acronym>GIN</acronym> has a configurable soft upper limit of the size |
162 | 163 | of the returned set, determined by the
|
163 | 164 | <varname>gin_fuzzy_search_limit</varname> GUC variable. It is set to 0 by
|
164 | 165 | default (no limit).
|
|
182 | 183 | <title>Limitations</title>
|
183 | 184 |
|
184 | 185 | <para>
|
185 |
| - <acronym>GIN</acronym> doesn't support full scan of index due to it's |
186 |
| - extremely inefficiency: because of a lot of keys per value, |
| 186 | + <acronym>GIN</acronym> doesn't support full index scans due to their |
| 187 | + extremely inefficiency: because there are often many keys per value, |
187 | 188 | each heap pointer will returned several times.
|
188 | 189 | </para>
|
189 | 190 |
|
190 | 191 | <para>
|
191 |
| - When extractQuery returns zero number of keys, <acronym>GIN</acronym> will |
192 |
| - emit a error: for different opclass and strategy semantic meaning of void |
193 |
| - query may be different (for example, any array contains void array, |
194 |
| - but they aren't overlapped with void one), and <acronym>GIN</acronym> can't |
| 192 | + When extractQuery returns zero keys, <acronym>GIN</acronym> will emit a |
| 193 | + error: for different opclasses and strategies the semantic meaning of a void |
| 194 | + query may be different (for example, any array contains the void array, |
| 195 | + but they don't overlap the void array), and <acronym>GIN</acronym> can't |
195 | 196 | suggest reasonable answer.
|
196 | 197 | </para>
|
197 | 198 |
|
|