
Commit aea7c17


Rework word_similarity documentation, make it close to actual algorithm.

word_similarity was previously documented as returning the similarity of the closest word in the string, but it actually returns the similarity of a substring. Also fix mistyped comments.

Author: Alexander Korotkov

Review by: David Steele, Liudmila Mantrova

Discussion: https://www.postgresql.org/message-id/flat/CY4PR17MB13207ED8310F847CF117EED0D85A0@CY4PR17MB1320.namprd17.prod.outlook.com https://www.postgresql.org/message-id/flat/f43b242d-000c-f4c8-cb8b-d37e9752cd93%40postgrespro.ru

1 parent d652e35, commit aea7c17


2 files changed: 44 additions, 16 deletions

contrib/pg_trgm
- trgm_op.c
doc/src/sgml
- pgtrgm.sgml


`contrib/pg_trgm/trgm_op.c`

Lines changed: 2 additions & 2 deletions

| Original file line number | Diff line number | Diff line change |
| --- | --- | --- |
| | | `@@ -456,7 +456,7 @@ iterate_word_similarity(int *trg2indexes,` |
| 456 | 456 | `lastpos[trgindex] = i;` |
| 457 | 457 | `}` |
| 458 | 458 | |
| 459 | | `-/* Adjust lower bound if this trigram is present in required substring */` |
| | 459 | `+/* Adjust upper bound if this trigram is present in required substring */` |
| 460 | 460 | `if (found[trgindex])` |
| 461 | 461 | `{` |
| 462 | 462 | `int prev_lower,` |
| | | `@@ -473,7 +473,7 @@ iterate_word_similarity(int *trg2indexes,` |
| 473 | 473 | |
| 474 | 474 | `smlr_cur = CALCSML(count, ulen1, ulen2);` |
| 475 | 475 | |
| 476 | | `-/* Also try to adjust upper bound for greater similarity */` |
| | 476 | `+/* Also try to adjust lower bound for greater similarity */` |
| 477 | 477 | `tmp_count = count;` |
| 478 | 478 | `tmp_ulen2 = ulen2;` |
| 479 | 479 | `prev_lower = lower;` |
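The `smlr_cur = CALCSML(count, ulen1, ulen2)` line scores each candidate extent. Assuming `CALCSML` is the Jaccard-style ratio (shared trigrams over the size of the union), with `ulen1` the number of unique trigrams in the first string, `ulen2` the number of unique trigrams in the current extent, and `count` the number of those that also occur in the first string, the `word_similarity('word', 'two words')` example from the documentation change works out as below. These definitions are an assumption of this note, not quoted from the commit.

$$
\mathrm{CALCSML}(\mathit{count}, \mathit{ulen}_1, \mathit{ulen}_2) = \frac{\mathit{count}}{\mathit{ulen}_1 + \mathit{ulen}_2 - \mathit{count}}
$$

$$
\text{extent } \{\texttt{"  w"}, \texttt{" wo"}, \texttt{"wor"}, \texttt{"ord"}\}: \quad \frac{4}{5 + 4 - 4} = 0.8,
\qquad
\text{extended by } \texttt{"rds"}: \quad \frac{4}{5 + 5 - 4} \approx 0.67
$$

Growing the extent past `"ord"` only adds trigrams that are absent from the first string, so the union grows while the overlap does not, and 0.8 remains the maximum.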

`doc/src/sgml/pgtrgm.sgml`

Lines changed: 42 additions & 14 deletions

| Original file line number | Diff line number | Diff line change |
| --- | --- | --- |
| | | `@@ -99,12 +99,10 @@` |
| 99 | 99 | `</entry>` |
| 100 | 100 | `<entry><type>real</type></entry>` |
| 101 | 101 | `<entry>` |
| 102 | | `- Returns a number that indicates how similar the first string` |
| 103 | | `- to the most similar word of the second string. The function searches in` |
| 104 | | `- the second string a most similar word not a most similar substring. The` |
| 105 | | `- range of the result is zero (indicating that the two strings are` |
| 106 | | `- completely dissimilar) to one (indicating that the first string is` |
| 107 | | `- identical to one of the words of the second string).` |
| | 102 | `+ Returns a number that indicates the greatest similarity between` |
| | 103 | `+ the set of trigrams in the first string and any continuous extent` |
| | 104 | `+ of an ordered set of trigrams in the second string. For details, see` |
| | 105 | `+ the explanation below.` |
| 108 | 106 | `</entry>` |
| 109 | 107 | `</row>` |
| 110 | 108 | `<row>` |
| | | `@@ -131,6 +129,34 @@` |
| 131 | 129 | `</tgroup>` |
| 132 | 130 | `</table>` |
| 133 | 131 | |
| | 132 | `+ <para>` |
| | 133 | `+ Consider the following example:` |
| | 134 | `+` |
| | 135 | `+<programlisting>` |
| | 136 | `+# SELECT word_similarity('word', 'two words');` |
| | 137 | `+ word_similarity` |
| | 138 | `+-----------------` |
| | 139 | `+ 0.8` |
| | 140 | `+(1 row)` |
| | 141 | `+</programlisting>` |
| | 142 | `+` |
| | 143 | `+ In the first string, the set of trigrams is` |
| | 144 | `+ <literal>{"  w"," wo","ord","wor","rd "}</literal>.` |
| | 145 | `+ In the second string, the ordered set of trigrams is` |
| | 146 | `+ <literal>{"  t"," tw",two,"wo ","  w"," wo","wor","ord","rds", ds "}</literal>.` |
| | 147 | `+ The most similar extent of an ordered set of trigrams in the second string` |
| | 148 | `+ is <literal>{"  w"," wo","wor","ord"}</literal>, and the similarity is` |
| | 149 | `+ <literal>0.8</literal>.` |
| | 150 | `+ </para>` |
| | 151 | `+` |
| | 152 | `+ <para>` |
| | 153 | `+ This function returns a value that can be approximately understood as the` |
| | 154 | `+ greatest similarity between the first string and any substring of the second` |
| | 155 | `+ string. However, this function does not add padding to the boundaries of` |
| | 156 | `+ the extent. Thus, a whole word match gets a higher score than a match with` |
| | 157 | `+ a part of the word.` |
| | 158 | `+ </para>` |
| | 159 | `+` |
| 134 | 160 | `<table id="pgtrgm-op-table">` |
| 135 | 161 | `<title><filename>pg_trgm</filename> Operators</title>` |
| 136 | 162 | `<tgroup cols="3">` |
| | | `@@ -156,10 +182,11 @@` |
| 156 | 182 | `<entry><type>text</type> <literal><%</literal> <type>text</type></entry>` |
| 157 | 183 | `<entry><type>boolean</type></entry>` |
| 158 | 184 | `<entry>` |
| 159 | | `- Returns <literal>true</literal> if its first argument has the similar word in` |
| 160 | | `- the second argument and they have a similarity that is greater than the` |
| 161 | | `- current word similarity threshold set by` |
| 162 | | `- <varname>pg_trgm.word_similarity_threshold</varname> parameter.` |
| | 185 | `+ Returns <literal>true</literal> if the similarity between the trigram` |
| | 186 | `+ set in the first argument and a continuous extent of an ordered trigram` |
| | 187 | `+ set in the second argument is greater than the current word similarity` |
| | 188 | `+ threshold set by <varname>pg_trgm.word_similarity_threshold</varname>` |
| | 189 | `+ parameter.` |
| 163 | 190 | `</entry>` |
| 164 | 191 | `</row>` |
| 165 | 192 | `<row>` |
| | | `@@ -302,10 +329,11 @@ SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml` |
| 302 | 329 | `WHERE '<replaceable>word</replaceable>' <% t` |
| 303 | 330 | `ORDER BY sml DESC, t;` |
| 304 | 331 | `</programlisting>` |
| 305 | | `- This will return all values in the text column that have a word` |
| 306 | | `- which sufficiently similar to <replaceable>word</replaceable>, sorted from best` |
| 307 | | `- match to worst. The index will be used to make this a fast operation` |
| 308 | | `- even over very large data sets.` |
| | 332 | `+ This will return all values in the text column for which there is a` |
| | 333 | `+ continuous extent in the corresponding ordered trigram set that is` |
| | 334 | `+ sufficiently similar to the trigram set of <replaceable>word</replaceable>,` |
| | 335 | `+ sorted from best match to worst. The index will be used to make this` |
| | 336 | `+ a fast operation even over very large data sets.` |
| 309 | 337 | `</para>` |
| 310 | 338 | |
| 311 | 339 | `<para>` |
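The revised documentation defines `word_similarity` as the greatest similarity between the trigram set of the first string and any continuous extent of the ordered trigram list of the second string. The following is a minimal brute-force sketch of that definition, not pg_trgm's `iterate_word_similarity`, which adjusts lower and upper bounds incrementally instead of re-scanning every extent. It assumes a simplified trigram extraction (words are runs of ASCII letters and digits, lower-cased and padded with two spaces in front and one behind) and scores extents with the Jaccard ratio `count / (ulen1 + ulen2 - count)`. The `MAX_TRGM` limit and every function name here (`extract_trigrams`, `unique_trigrams`, `word_similarity_sketch`) are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical capacity limit for this sketch (not a pg_trgm constant). */
#define MAX_TRGM 256

/* One trigram: exactly three bytes, not NUL-terminated. */
typedef struct { char c[3]; } Trgm;

/*
 * Write the trigrams of str to out, in order, and return how many were
 * written. Assumed simplified rule: a word is a run of ASCII letters/digits,
 * lower-cased and padded as "  word ".
 */
static int extract_trigrams(const char *str, Trgm *out)
{
    int n = 0;

    while (*str) {
        char buf[MAX_TRGM + 3];
        int len = 0;

        /* Skip separators between words. */
        while (*str && !isalnum((unsigned char) *str))
            str++;
        if (!*str)
            break;

        buf[len++] = ' ';
        buf[len++] = ' ';
        while (*str && isalnum((unsigned char) *str)) {
            assert(len < MAX_TRGM);
            buf[len++] = (char) tolower((unsigned char) *str++);
        }
        buf[len++] = ' ';

        /* Every 3-byte window of the padded word is one trigram. */
        for (int i = 0; i + 3 <= len; i++) {
            assert(n < MAX_TRGM);
            memcpy(out[n++].c, buf + i, 3);
        }
    }
    return n;
}

/* Index of t in arr[0..n-1], or -1 if absent (linear search). */
static int trgm_index(const Trgm *arr, int n, const Trgm *t)
{
    for (int i = 0; i < n; i++)
        if (memcmp(arr[i].c, t->c, 3) == 0)
            return i;
    return -1;
}

/* Remove duplicates in place, keeping first occurrences; return new size. */
static int unique_trigrams(Trgm *arr, int n)
{
    int m = 0;

    for (int i = 0; i < n; i++)
        if (trgm_index(arr, m, &arr[i]) < 0)
            arr[m++] = arr[i];
    return m;
}

/*
 * Largest count / (ulen1 + ulen2 - count) over every continuous extent
 * seq2[lo..hi], where ulen2 is the number of unique trigrams in the extent
 * and count is how many of those also occur in the first string's set.
 */
double word_similarity_sketch(const char *str1, const char *str2)
{
    Trgm set1[MAX_TRGM], seq2[MAX_TRGM];
    int ulen1 = unique_trigrams(set1, extract_trigrams(str1, set1));
    int n2 = extract_trigrams(str2, seq2);
    double best = 0.0;

    for (int lo = 0; lo < n2; lo++) {
        Trgm seen[MAX_TRGM];    /* unique trigrams of the current extent */
        int ulen2 = 0, count = 0;

        for (int hi = lo; hi < n2; hi++) {
            /* A repeated trigram leaves count and ulen2 unchanged. */
            if (trgm_index(seen, ulen2, &seq2[hi]) >= 0)
                continue;
            seen[ulen2++] = seq2[hi];
            if (trgm_index(set1, ulen1, &seq2[hi]) >= 0)
                count++;

            double sml = (double) count / (ulen1 + ulen2 - count);
            if (sml > best)
                best = sml;
        }
    }
    return best;
}
```

Under these assumptions, `word_similarity_sketch("word", "two words")` returns 0.8, matching the `SELECT` output in the documentation change, with the best extent being `"  w"`, `" wo"`, `"wor"`, `"ord"`. Checking every extent with linear lookups is far slower than an incremental search, so this is only meant to make the definition concrete.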
