Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitcbfa409

Browse files
committed
trgm - Trigram matching for PostgreSQL
--------------------------------------The pg_trgm contrib module provides functions and index classesfor determining the similarity of text based on trigrammatching.
1 parent553bc41 commitcbfa409

File tree

9 files changed

+4529
-0
lines changed

9 files changed

+4529
-0
lines changed

‎contrib/pg_trgm/Makefile

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
subdir = contrib/pg_trgm
2+
top_builddir = ../..
3+
include$(top_builddir)/src/Makefile.global
4+
5+
6+
overrideCPPFLAGS := -I.$(CPPFLAGS)
7+
8+
MODULE_big = pg_trgm
9+
OBJS = trgm_op.o trgm_gist.o
10+
11+
DATA_built = pg_trgm.sql
12+
DOCS = README.pg_trgm
13+
REGRESS = pg_trgm
14+
15+
include$(top_srcdir)/contrib/contrib-global.mk
16+
# DO NOT DELETE
17+

‎contrib/pg_trgm/README.pg_trgm

Lines changed: 138 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,138 @@
1+
trgm - Trigram matching for PostgreSQL
2+
--------------------------------------
3+
4+
Introduction
5+
6+
This module is sponsored by Delta-Soft Ltd., Moscow, Russia.
7+
8+
The pg_trgm contrib module provides functions and index classes
9+
for determining the similarity of text based on trigram
10+
matching.
11+
12+
Definitions
13+
14+
Trigram (or Trigraph)
15+
16+
A trigram is a set of three consecutive characters taken
17+
from a string. A string is considered to have two spaces
18+
prefixed and one space suffixed when determining the set
19+
of trigrams that comprise the string.
20+
21+
eg. The set of trigrams in the word "cat" is " c", " ca",
22+
"at " and "cat".
23+
24+
Public Functions
25+
26+
real similarity(text, text)
27+
28+
Returns a number that indicates how closely matches the two
29+
arguments are. A zero result indicates that the two words
30+
are completely dissimilar, and a result of one indicates that
31+
the two words are identical.
32+
33+
real show_limit()
34+
35+
Returns the current similarity threshold used by the '%'
36+
operator. This in effect sets the minimum similarity between
37+
two words in order that they be considered similar enough to
38+
be misspellings of each other, for example.
39+
40+
real set_limit(real)
41+
42+
Sets the current similarity threshold that is used by the '%'
43+
operator, and is returned by the show_limit() function.
44+
45+
text[] show_trgm(text)
46+
47+
Returns an array of all the trigrams of the supplied text
48+
parameter.
49+
50+
Public Operators
51+
52+
text % text (returns boolean)
53+
54+
The '%' operator returns TRUE if its two arguments have a similarity
55+
that is greater than the similarity threshold set by set_limit(). It
56+
will return FALSE if the similarity is less than the current
57+
threshold.
58+
59+
Public Index Operator Classes
60+
61+
gist_trgm_ops
62+
63+
The pg_trgm module comes with an index operator class that allows a
64+
developer to create an index over a text column for the purpose
65+
of very fast similarity searches.
66+
67+
To use this index, the '%' operator must be used and an appropriate
68+
similarity threshold for the application must be set.
69+
70+
eg.
71+
72+
CREATE TABLE test_trgm (t text);
73+
CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);
74+
75+
At this point, you will have an index on the t text column that you
76+
can use for similarity searching.
77+
78+
eg.
79+
80+
SELECT
81+
t,
82+
similarity(t, 'word') AS sml
83+
FROM
84+
test_trgm
85+
WHERE
86+
t % 'word'
87+
ORDER BY
88+
sml DESC, t;
89+
90+
This will return all values in the text column that are sufficiently
91+
similar to 'word', sorted from best match to worst. The index will
92+
be used to make this a fast operation over very large data sets.
93+
94+
Tsearch2 Integration
95+
96+
Trigram matching is a very useful tool when used in conjunction
97+
with a text index created by the Tsearch2 contrib module. (See
98+
contrib/tsearch2)
99+
100+
The first step is to generate an auxiliary table containing all
101+
the unique words in the Tsearch2 index:
102+
103+
CREATE TABLE words AS
104+
SELECT word FROM stat('SELECT vector FROM documents');
105+
106+
Where 'documents' is the table that contains the Tsearch2 index
107+
column 'vector', of type 'tsvector'.
108+
109+
Next, create a trigram index on the word column:
110+
111+
CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);
112+
113+
Now, a SELECT query similar to the example above can be used to
114+
suggest spellings for misspelled words in user search terms. A
115+
useful extra clause is to ensure that the similar words are also
116+
of similar length to the misspelled word.
117+
118+
Note: Since the 'words' table has been generated as a separate,
119+
static table, it will need to be periodically regenerated so that
120+
it remains up to date with the word list in the Tsearch2 index.
121+
122+
Authors
123+
124+
Oleg Bartunov <oleg@sai.msu.su>, Moscow, Moscow University, Russia
125+
Teodor Sigaev <teodor@sigaev.ru>, Moscow, Delta-Soft Ltd.,Russia
126+
127+
Contributors
128+
129+
Christopher Kings-Lynne wrote this README file
130+
131+
References
132+
133+
Tsearch2 Development Site
134+
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
135+
136+
GiST Development Site
137+
http://www.sai.msu.su/~megera/postgres/gist/
138+

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp