Commit aea7c17
Rework word_similarity documentation, make it close to actual algorithm.
word_similarity was previously documented as returning the similarity of the closest word in a string, but it actually returns the similarity of a substring. Also fix mistyped comments.

Author: Alexander Korotkov
Review by: David Steele, Liudmila Mantrova
Discussion: https://www.postgresql.org/message-id/flat/CY4PR17MB13207ED8310F847CF117EED0D85A0@CY4PR17MB1320.namprd17.prod.outlook.com
https://www.postgresql.org/message-id/flat/f43b242d-000c-f4c8-cb8b-d37e9752cd93%40postgrespro.ru
1 parent d652e35, commit aea7c17

2 files changed, +44 -16 lines changed

‎contrib/pg_trgm/trgm_op.c

Lines changed: 2 additions & 2 deletions
@@ -456,7 +456,7 @@ iterate_word_similarity(int *trg2indexes,
 			lastpos[trgindex] = i;
 		}
 
-		/* Adjust lower bound if this trigram is present in required substring */
+		/* Adjust upper bound if this trigram is present in required substring */
 		if (found[trgindex])
 		{
 			int			prev_lower,
@@ -473,7 +473,7 @@ iterate_word_similarity(int *trg2indexes,
 
 			smlr_cur = CALCSML(count, ulen1, ulen2);
 
-			/* Also try to adjust upper bound for greater similarity */
+			/* Also try to adjust lower bound for greater similarity */
 			tmp_count = count;
 			tmp_ulen2 = ulen2;
 			prev_lower = lower;

‎doc/src/sgml/pgtrgm.sgml

Lines changed: 42 additions & 14 deletions
@@ -99,12 +99,10 @@
        </entry>
        <entry><type>real</type></entry>
        <entry>
-        Returns a number that indicates how similar the first string
-        to the most similar word of the second string. The function searches in
-        the second string a most similar word not a most similar substring. The
-        range of the result is zero (indicating that the two strings are
-        completely dissimilar) to one (indicating that the first string is
-        identical to one of the words of the second string).
+        Returns a number that indicates the greatest similarity between
+        the set of trigrams in the first string and any continuous extent
+        of an ordered set of trigrams in the second string. For details, see
+        the explanation below.
        </entry>
       </row>
       <row>
@@ -131,6 +129,34 @@
    </tgroup>
   </table>
 
+  <para>
+   Consider the following example:
+
+<programlisting>
+# SELECT word_similarity('word', 'two words');
+ word_similarity
+-----------------
+             0.8
+(1 row)
+</programlisting>
+
+   In the first string, the set of trigrams is
+   <literal>{"  w"," wo","ord","wor","rd "}</literal>.
+   In the second string, the ordered set of trigrams is
+   <literal>{"  t"," tw",two,"wo ","  w"," wo","wor","ord","rds", ds "}</literal>.
+   The most similar extent of an ordered set of trigrams in the second string
+   is <literal>{"  w"," wo","wor","ord"}</literal>, and the similarity is
+   <literal>0.8</literal>.
+  </para>
+
+  <para>
+   This function returns a value that can be approximately understood as the
+   greatest similarity between the first string and any substring of the second
+   string. However, this function does not add padding to the boundaries of
+   the extent. Thus, a whole word match gets a higher score than a match with
+   a part of the word.
+  </para>
+
   <table id="pgtrgm-op-table">
    <title><filename>pg_trgm</filename> Operators</title>
    <tgroup cols="3">
@@ -156,10 +182,11 @@
       <entry><type>text</type> <literal>&lt;%</literal> <type>text</type></entry>
       <entry><type>boolean</type></entry>
       <entry>
-       Returns <literal>true</literal> if its first argument has the similar word in
-       the second argument and they have a similarity that is greater than the
-       current word similarity threshold set by
-       <varname>pg_trgm.word_similarity_threshold</varname> parameter.
+       Returns <literal>true</literal> if the similarity between the trigram
+       set in the first argument and a continuous extent of an ordered trigram
+       set in the second argument is greater than the current word similarity
+       threshold set by <varname>pg_trgm.word_similarity_threshold</varname>
+       parameter.
       </entry>
      </row>
      <row>
@@ -302,10 +329,11 @@ SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml
   WHERE '<replaceable>word</replaceable>' &lt;% t
   ORDER BY sml DESC, t;
 </programlisting>
-   This will return all values in the text column that have a word
-   which sufficiently similar to <replaceable>word</replaceable>, sorted from best
-   match to worst. The index will be used to make this a fast operation
-   even over very large data sets.
+   This will return all values in the text column for which there is a
+   continuous extent in the corresponding ordered trigram set that is
+   sufficiently similar to the trigram set of <replaceable>word</replaceable>,
+   sorted from best match to worst. The index will be used to make this
+   a fast operation even over very large data sets.
   </para>
 
   <para>
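
The reworded documentation describes word_similarity as the greatest similarity between the trigram set of the first string and any continuous extent of the ordered trigram list of the second string. The following is a minimal brute-force sketch of that definition, not the incremental bound-adjusting loop in iterate_word_similarity. It rests on several assumptions: the names word_similarity_sketch, extract_trigrams, TrgmList and MAX_TRGM are hypothetical, input is ASCII only, words are runs of alphanumeric characters padded with two leading blanks and one trailing blank, and the similarity is taken to be count / (ulen1 + ulen2 - count).

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAX_TRGM 256			/* illustrative cap, not a pg_trgm limit */

typedef struct
{
	char		t[MAX_TRGM][4];	/* NUL-terminated 3-character trigrams */
	int			n;
} TrgmList;

/*
 * Collect the trigrams of every alphanumeric word in s, in order of
 * appearance.  Each word is lowercased and padded as "  word ".
 * Sketch limitation: words longer than 60 characters are split.
 */
static void
extract_trigrams(const char *s, TrgmList *out)
{
	char		buf[64];

	out->n = 0;
	while (*s)
	{
		int			len = 2;

		while (*s && !isalnum((unsigned char) *s))
			s++;
		if (!*s)
			break;
		buf[0] = buf[1] = ' ';
		while (*s && isalnum((unsigned char) *s) && len < 62)
			buf[len++] = (char) tolower((unsigned char) *s++);
		buf[len++] = ' ';
		for (int i = 0; i + 3 <= len && out->n < MAX_TRGM; i++)
		{
			memcpy(out->t[out->n], buf + i, 3);
			out->t[out->n++][3] = '\0';
		}
	}
}

/* Does trigram x occur in l at a position in [lo, hi)? */
static bool
in_range(const TrgmList *l, int lo, int hi, const char *x)
{
	for (int i = lo; i < hi; i++)
		if (strcmp(l->t[i], x) == 0)
			return true;
	return false;
}

/* Brute force: best similarity over every continuous extent of b's trigrams. */
double
word_similarity_sketch(const char *a, const char *b)
{
	TrgmList	ta, tb;
	int			ulen1 = 0;
	double		best = 0.0;

	extract_trigrams(a, &ta);
	extract_trigrams(b, &tb);

	/* ulen1: number of distinct trigrams in the first string */
	for (int i = 0; i < ta.n; i++)
		if (!in_range(&ta, 0, i, ta.t[i]))
			ulen1++;

	for (int lo = 0; lo < tb.n; lo++)
	{
		int			ulen2 = 0;	/* distinct trigrams in extent lo..hi */
		int			count = 0;	/* those that also occur in a */

		for (int hi = lo; hi < tb.n; hi++)
		{
			if (!in_range(&tb, lo, hi, tb.t[hi]))
			{
				ulen2++;
				if (in_range(&ta, 0, ta.n, tb.t[hi]))
					count++;
			}
			double		sml = (double) count / (ulen1 + ulen2 - count);

			if (sml > best)
				best = sml;
		}
	}
	return best;
}
```

Under these assumptions, word_similarity_sketch("word", "two words") picks the extent {"  w"," wo","wor","ord"} and yields 4 / (5 + 4 - 4) = 0.8, which agrees with the documented example. Because no padding is added at the extent boundaries, the trailing trigram "rd " of the first string has no counterpart inside "words", so only a whole-word match such as word_similarity_sketch("words", "two words") reaches 1.0.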
