Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit0ca9907

Browse files
committed
GIN documentation and slightly improving GiST docs.
Thanks to Christopher Kings-Lynne <chris.kingslynne@gmail.com> forinitial version and Jeff Davis <pgsql@j-davis.com> for inspection
1 parent4eef745 commit0ca9907

File tree

8 files changed

+395
-20
lines changed

8 files changed

+395
-20
lines changed

‎doc/src/sgml/config.sgml

Lines changed: 15 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.85 2006/09/08 15:55:52 tgl Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.86 2006/09/14 11:16:27 teodor Exp $ -->
22

33
<chapter Id="runtime-config">
44
<title>Server Configuration</title>
@@ -2172,7 +2172,20 @@ SELECT * FROM parent WHERE key = 2400;
21722172
</para>
21732173
</listitem>
21742174
</varlistentry>
2175-
2175+
2176+
<varlistentry id="guc-gin-fuzzy-search-limit" xreflabel="gin_fuzzy_search_limit">
2177+
<term><varname>gin_fuzzy_search_limit</varname> (<type>integer</type>)</term>
2178+
<indexterm>
2179+
<primary><varname>gin_fuzzy_search_limit</> configuration parameter</primary>
2180+
</indexterm>
2181+
<listitem>
2182+
<para>
2183+
Soft upper limit of the size of the returned set by GIN index. For more
2184+
information see <xref linkend="gin-tips">.
2185+
</para>
2186+
</listitem>
2187+
</varlistentry>
2188+
21762189
</variablelist>
21772190
</sect2>
21782191
</sect1>

‎doc/src/sgml/filelist.sgml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.46 2006/09/05 03:09:56 momjian Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.47 2006/09/14 11:16:27 teodor Exp $ -->
22

33
<!entity history SYSTEM "history.sgml">
44
<!entity info SYSTEM "info.sgml">
@@ -78,6 +78,7 @@
7878
<!entity catalogs SYSTEM "catalogs.sgml">
7979
<!entity geqo SYSTEM "geqo.sgml">
8080
<!entity gist SYSTEM "gist.sgml">
81+
<!entity gin SYSTEM "gin.sgml">
8182
<!entity planstats SYSTEM "planstats.sgml">
8283
<!entity indexam SYSTEM "indexam.sgml">
8384
<!entity nls SYSTEM "nls.sgml">

‎doc/src/sgml/geqo.sgml

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.36 2006/03/10 19:10:48 momjian Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.37 2006/09/14 11:16:27 teodor Exp $ -->
22

33
<chapter id="geqo">
44
<chapterinfo>
@@ -49,8 +49,8 @@
4949
methods</firstterm> (e.g., nested loop, hash join, merge join in
5050
<productname>PostgreSQL</productname>) to process individual joins
5151
and a diversity of <firstterm>indexes</firstterm> (e.g.,
52-
B-tree, hash, GiST in <productname>PostgreSQL</productname>) asaccess
53-
paths for relations.
52+
B-tree, hash, GiSTand GINin <productname>PostgreSQL</productname>) as
53+
access paths for relations.
5454
</para>
5555

5656
<para>

‎doc/src/sgml/gin.sgml

Lines changed: 231 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,231 @@
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/gin.sgml,v 2.1 2006/09/14 11:16:27 teodor Exp $ -->
2+
3+
<chapter id="GIN">
4+
<title>GIN Indexes</title>
5+
6+
<indexterm>
7+
<primary>index</primary>
8+
<secondary>GIN</secondary>
9+
</indexterm>
10+
11+
<sect1 id="gin-intro">
12+
<title>Introduction</title>
13+
14+
<para>
15+
<acronym>GIN</acronym> stands for Generalized Inverted Index. It is
16+
an index structure storing a set of (key, posting list) pairs, where
17+
'posting list' is a set of rows in which the key occurs. The
18+
row may contain many keys.
19+
</para>
20+
21+
<para>
22+
It is generalized in the sense that a <acronym>GIN</acronym> index
23+
does not need to be aware of the operation that it accelerates.
24+
Instead, it uses custom strategies defined for particular data types.
25+
</para>
26+
27+
<para>
28+
One advantage of <acronym>GIN</acronym> is that it allows the development
29+
of custom data types with the appropriate access methods, by
30+
an expert in the domain of the data type, rather than a database expert.
31+
This is much the same advantage as using <acronym>GiST</acronym>.
32+
</para>
33+
34+
<para>
35+
The <acronym>GIN</acronym>
36+
implementation in <productname>PostgreSQL</productname> is primarily
37+
maintained by Teodor Sigaev and Oleg Bartunov, and there is more
38+
information on their
39+
<ulink url="http://www.sai.msu.su/~megera/oddmuse/index.cgi/Gin">website</ulink>.
40+
</para>
41+
42+
</sect1>
43+
44+
<sect1 id="gin-extensibility">
45+
<title>Extensibility</title>
46+
47+
<para>
48+
The <acronym>GIN</acronym> interface has a high level of abstraction,
49+
requiring the access method implementer to only implement the semantics of
50+
the data type being accessed. The <acronym>GIN</acronym> layer itself
51+
takes care of concurrency, logging and searching the tree structure.
52+
</para>
53+
54+
<para>
55+
All it takes to get a <acronym>GIN</acronym> access method working
56+
is to implement four user-defined methods, which define the behavior of
57+
keys in the tree. In short, <acronym>GIN</acronym> combines extensibility
58+
along with generality, code reuse, and a clean interface.
59+
</para>
60+
61+
</sect1>
62+
63+
<sect1 id="gin-implementation">
64+
<title>Implementation</title>
65+
66+
<para>
67+
Internally, <acronym>GIN</acronym> consists of a B-tree index constructed
68+
over keys, where each key is an element of the indexed value
69+
(element of array, for example) and where each tuple in a leaf page is
70+
either a pointer to a B-tree over heap pointers (PT, posting tree), or a
71+
list of heap pointers (PL, posting list) if the tuple is small enough.
72+
</para>
73+
74+
<para>
75+
There are four methods that an index operator class for
76+
<acronym>GIN</acronym> must provide (prototypes are in pseudocode):
77+
</para>
78+
79+
<variablelist>
80+
<varlistentry>
81+
<term>int compare( Datum a, Datum b )</term>
82+
<listitem>
83+
<para>
84+
Compares keys (not indexed values!) and returns an integer less than
85+
zero, zero, or greater than zero, indicating whether the first key is
86+
less than, equal to, or greater than the second.
87+
</para>
88+
</listitem>
89+
</varlistentry>
90+
91+
<varlistentry>
92+
<term>Datum* extractValue(Datum inputValue, uint32 *nkeys)</term>
93+
<listitem>
94+
<para>
95+
Returns an array of keys of value to be indexed, nkeys should
96+
contain the number of returned keys.
97+
</para>
98+
</listitem>
99+
</varlistentry>
100+
101+
<varlistentry>
102+
<term>Datum* extractQuery(Datum query, uint32 nkeys,
103+
StrategyNumber n)</term>
104+
<listitem>
105+
<para>
106+
Returns an array of keys of the query to be executed. n contains
107+
strategy number of operation (see <xref linkend="xindex-strategies">).
108+
Depending on n, query may be different type.
109+
</para>
110+
</listitem>
111+
</varlistentry>
112+
113+
<varlistentry>
114+
<term>bool consistent( bool check[], StrategyNumber n, Datum query)</term>
115+
<listitem>
116+
<para>
117+
Returns TRUE if indexed value satisfies query qualifier with strategy n
118+
(or may satisfy in case of RECHECK mark in operator class).
119+
Each element of the check array is TRUE if indexed value has a
120+
corresponding key in the query: if (check[i] == TRUE ) the i-th key of
121+
the query is present in the indexed value.
122+
</para>
123+
</listitem>
124+
</varlistentry>
125+
126+
</variablelist>
127+
128+
</sect1>
129+
130+
<sect1 id="gin-tips">
131+
<title>GIN tips and trics</title>
132+
133+
<variablelist>
134+
<varlistentry>
135+
<term>Create vs insert</term>
136+
<listitem>
137+
<para>
138+
In most cases, insertion into <acronym>GIN</acronym> index is slow because
139+
many GIN keys may be inserted for each table row. So, when loading data
140+
in bulk it may be useful to drop index and recreate it
141+
after the data is loaded in the table.
142+
</para>
143+
</listitem>
144+
</varlistentry>
145+
146+
<varlistentry>
147+
<term>gin_fuzzy_search_limit</term>
148+
<listitem>
149+
<para>
150+
The primary goal of development <acronym>GIN</acronym> indices was
151+
support for highly scalable, full-text search in
152+
<productname>PostgreSQL</productname> and there are often situations when
153+
a full-text search returns a very large set of results. Since reading
154+
tuples from the disk and sorting them could take a lot of time, this is
155+
unacceptable for production. (Note that the index search itself is very
156+
fast.)
157+
</para>
158+
<para>
159+
Such queries usually contain very frequent words, so the results are not
160+
very helpful. To facilitate execution of such queries
161+
<acronym>GIN</acronym> has a configurable soft upper limit of the size
162+
of the returned set, determined by the
163+
<varname>gin_fuzzy_search_limit</varname> GUC variable. It is set to 0 by
164+
default (no limit).
165+
</para>
166+
<para>
167+
If a non-zero search limit is set, then the returned set is a subset of
168+
the whole result set, chosen at random.
169+
</para>
170+
<para>
171+
"Soft" means that the actual number of returned results could slightly
172+
differ from the specified limit, depending on the query and the quality
173+
of the system's random number generator.
174+
</para>
175+
</listitem>
176+
</varlistentry>
177+
<variablelist>
178+
179+
</sect1>
180+
181+
<sect1 id="gin-limit">
182+
<title>Limitations</title>
183+
184+
<para>
185+
<acronym>GIN</acronym> doesn't support full scan of index due to it's
186+
extremely inefficiency: because of a lot of keys per value,
187+
each heap pointer will returned several times.
188+
</para>
189+
190+
<para>
191+
When extractQuery returns zero number of keys, <acronym>GIN</acronym> will
192+
emit a error: for different opclass and strategy semantic meaning of void
193+
query may be different (for example, any array contains void array,
194+
but they aren't overlapped with void one), and <acronym>GIN</acronym> can't
195+
suggest reasonable answer.
196+
</para>
197+
198+
<para>
199+
<acronym>GIN</acronym> searches keys only by equality matching. This may
200+
be improved in future.
201+
</para>
202+
</sect1>
203+
<sect1 id="gin-examples">
204+
<title>Examples</title>
205+
206+
<para>
207+
The <productname>PostgreSQL</productname> source distribution includes
208+
<acronym>GIN</acronym> classes for one-dimensional arrays of all internal
209+
types. The following
210+
<filename>contrib</> modules also contain <acronym>GIN</acronym>
211+
operator classes:
212+
</para>
213+
214+
<variablelist>
215+
<varlistentry>
216+
<term>intarray</term>
217+
<listitem>
218+
<para>Enhanced support for int4[]</para>
219+
</listitem>
220+
</varlistentry>
221+
222+
<varlistentry>
223+
<term>tsearch2</term>
224+
<listitem>
225+
<para>Support for inverted text indexing. This is much faster for very
226+
large, mostly-static sets of documents.
227+
</para>
228+
</listitem>
229+
</varlistentry>
230+
231+
</chapter>

‎doc/src/sgml/indices.sgml

Lines changed: 33 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.61 2006/09/13 23:42:26 tgl Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.62 2006/09/14 11:16:27 teodor Exp $ -->
22

33
<chapter id="indexes">
44
<title id="indexes-title">Indexes</title>
@@ -116,7 +116,7 @@ CREATE INDEX test1_id_index ON test1 (id);
116116

117117
<para>
118118
<productname>PostgreSQL</productname> provides several index types:
119-
B-tree, Hash, and GiST. Each index type uses a different
119+
B-tree, Hash,GINand GiST. Each index type uses a different
120120
algorithm that is best suited to different types of queries.
121121
By default, the <command>CREATE INDEX</command> command will create a
122122
B-tree index, which fits the most common situations.
@@ -238,6 +238,37 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
238238
classes are available in the <literal>contrib</> collection or as separate
239239
projects. For more information see <xref linkend="GiST">.
240240
</para>
241+
<para>
242+
<indexterm>
243+
<primary>index</primary>
244+
<secondary>GIN</secondary>
245+
</indexterm>
246+
<indexterm>
247+
<primary>GIN</primary>
248+
<see>index</see>
249+
</indexterm>
250+
GIN is a inverted index and it's usable for values which have more
251+
than one key, arrays for example. Like to GiST, GIN may support
252+
many different user-defined indexing strategies and the particular
253+
operators with which a GIN index can be used vary depending on the
254+
indexing strategy.
255+
As an example, the standard distribution of
256+
<productname>PostgreSQL</productname> includes GIN operator classes
257+
for one-dimentional arrays, which support indexed
258+
queries using these operators:
259+
260+
<simplelist>
261+
<member><literal>&lt;@</literal></member>
262+
<member><literal>@&gt;</literal></member>
263+
<member><literal>=</literal></member>
264+
<member><literal>&amp;&amp;</literal></member>
265+
</simplelist>
266+
267+
(See <xref linkend="functions-array"> for the meaning of
268+
these operators.)
269+
Another GIN operator classes are available in the <literal>contrib</>
270+
tsearch2 and intarray modules. For more information see <xref linkend="GIN">.
271+
</para>
241272
</sect1>
242273

243274

‎doc/src/sgml/mvcc.sgml

Lines changed: 16 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/mvcc.sgml,v 2.58 2006/09/03 01:59:09 momjian Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/mvcc.sgml,v 2.59 2006/09/14 11:16:27 teodor Exp $ -->
22

33
<chapter id="mvcc">
44
<title>Concurrency Control</title>
@@ -987,6 +987,20 @@ UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222;
987987
</para>
988988
</listitem>
989989
</varlistentry>
990+
991+
<varlistentry>
992+
<term>
993+
<acronym>GIN</acronym> indexes
994+
</term>
995+
<listitem>
996+
<para>
997+
Short-term share/exclusive page-level locks are used for
998+
read/write access. Locks are released immediately after each
999+
index row is fetched or inserted. However, note that GIN index
1000+
usually requires several inserts per one table row.
1001+
</para>
1002+
</listitem>
1003+
</varlistentry>
9901004
</variablelist>
9911005
</para>
9921006

@@ -995,7 +1009,7 @@ UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222;
9951009
applications; since they also have more features than hash
9961010
indexes, they are the recommended index type for concurrent
9971011
applications that need to index scalar data. When dealing with
998-
non-scalar data, B-trees are not useful, and GiST indexes should
1012+
non-scalar data, B-trees are not useful, and GiSTor GINindexes should
9991013
be used instead.
10001014
</para>
10011015
</sect1>

‎doc/src/sgml/ref/create_opclass.sgml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
<!--
2-
$PostgreSQL: pgsql/doc/src/sgml/ref/create_opclass.sgml,v 1.15 2006/09/10 17:36:52 tgl Exp $
2+
$PostgreSQL: pgsql/doc/src/sgml/ref/create_opclass.sgml,v 1.16 2006/09/14 11:16:27 teodor Exp $
33
PostgreSQL documentation
44
-->
55

@@ -192,7 +192,7 @@ CREATE OPERATOR CLASS <replaceable class="parameter">name</replaceable> [ DEFAUL
192192
<para>
193193
The data type actually stored in the index. Normally this is
194194
the same as the column data type, but some index methods
195-
(onlyGiSTat this writing) allow it to be different. The
195+
(GIN andGiSTfor now) allow it to be different. The
196196
<literal>STORAGE</> clause must be omitted unless the index
197197
method allows a different type to be used.
198198
</para>

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp