NotificationsYou must be signed in to change notification settings
Fork6
Star31

Commitcbfa409

committed

trgm - Trigram matching for PostgreSQL

--------------------------------------The pg_trgm contrib module provides functions and index classesfor determining the similarity of text based on trigrammatching.

1 parent553bc41 commitcbfa409Copy full SHA for cbfa409

File tree

9 files changed

+4529

-0

lines changed

contrib/pg_trgm
- Makefile
- README.pg_trgm
- data
  - trgm.data
- expected
  - pg_trgm.out
- pg_trgm.sql.in
- sql
  - pg_trgm.sql
- trgm.h
- trgm_gist.c
- trgm_op.c

9 files changed

+4529

-0

lines changed

`‎contrib/pg_trgm/Makefile`

Lines changed: 17 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,17 @@`
	`1`	`+subdir = contrib/pg_trgm`
	`2`	`+top_builddir = ../..`
	`3`	`+include$(top_builddir)/src/Makefile.global`
	`4`	`+`
	`5`	`+`
	`6`	`+overrideCPPFLAGS := -I.$(CPPFLAGS)`
	`7`	`+`
	`8`	`+MODULE_big = pg_trgm`
	`9`	`+OBJS = trgm_op.o trgm_gist.o`
	`10`	`+`
	`11`	`+DATA_built = pg_trgm.sql`
	`12`	`+DOCS = README.pg_trgm`
	`13`	`+REGRESS = pg_trgm`
	`14`	`+`
	`15`	`+include$(top_srcdir)/contrib/contrib-global.mk`
	`16`	`+# DO NOT DELETE`
	`17`	`+`

`‎contrib/pg_trgm/README.pg_trgm`

Lines changed: 138 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,138 @@`
	`1`	`+trgm - Trigram matching for PostgreSQL`
	`2`	`+--------------------------------------`
	`3`	`+`
	`4`	`+Introduction`
	`5`	`+`
	`6`	`+This module is sponsored by Delta-Soft Ltd., Moscow, Russia.`
	`7`	`+`
	`8`	`+The pg_trgm contrib module provides functions and index classes`
	`9`	`+for determining the similarity of text based on trigram`
	`10`	`+matching.`
	`11`	`+`
	`12`	`+Definitions`
	`13`	`+`
	`14`	`+Trigram (or Trigraph)`
	`15`	`+`
	`16`	`+A trigram is a set of three consecutive characters taken`
	`17`	`+from a string. A string is considered to have two spaces`
	`18`	`+prefixed and one space suffixed when determining the set`
	`19`	`+of trigrams that comprise the string.`
	`20`	`+`
	`21`	`+eg. The set of trigrams in the word "cat" is " c", " ca",`
	`22`	`+"at " and "cat".`
	`23`	`+`
	`24`	`+Public Functions`
	`25`	`+`
	`26`	`+real similarity(text, text)`
	`27`	`+`
	`28`	`+Returns a number that indicates how closely matches the two`
	`29`	`+arguments are. A zero result indicates that the two words`
	`30`	`+are completely dissimilar, and a result of one indicates that`
	`31`	`+the two words are identical.`
	`32`	`+`
	`33`	`+real show_limit()`
	`34`	`+`
	`35`	`+Returns the current similarity threshold used by the '%'`
	`36`	`+operator. This in effect sets the minimum similarity between`
	`37`	`+two words in order that they be considered similar enough to`
	`38`	`+be misspellings of each other, for example.`
	`39`	`+`
	`40`	`+real set_limit(real)`
	`41`	`+`
	`42`	`+Sets the current similarity threshold that is used by the '%'`
	`43`	`+operator, and is returned by the show_limit() function.`
	`44`	`+`
	`45`	`+text[] show_trgm(text)`
	`46`	`+`
	`47`	`+Returns an array of all the trigrams of the supplied text`
	`48`	`+parameter.`
	`49`	`+`
	`50`	`+Public Operators`
	`51`	`+`
	`52`	`+text % text (returns boolean)`
	`53`	`+`
	`54`	`+The '%' operator returns TRUE if its two arguments have a similarity`
	`55`	`+that is greater than the similarity threshold set by set_limit(). It`
	`56`	`+will return FALSE if the similarity is less than the current`
	`57`	`+threshold.`
	`58`	`+`
	`59`	`+Public Index Operator Classes`
	`60`	`+`
	`61`	`+gist_trgm_ops`
	`62`	`+`
	`63`	`+The pg_trgm module comes with an index operator class that allows a`
	`64`	`+developer to create an index over a text column for the purpose`
	`65`	`+of very fast similarity searches.`
	`66`	`+`
	`67`	`+To use this index, the '%' operator must be used and an appropriate`
	`68`	`+similarity threshold for the application must be set.`
	`69`	`+`
	`70`	`+eg.`
	`71`	`+`
	`72`	`+CREATE TABLE test_trgm (t text);`
	`73`	`+CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);`
	`74`	`+`
	`75`	`+At this point, you will have an index on the t text column that you`
	`76`	`+can use for similarity searching.`
	`77`	`+`
	`78`	`+eg.`
	`79`	`+`
	`80`	`+SELECT`
	`81`	`+t,`
	`82`	`+similarity(t, 'word') AS sml`
	`83`	`+FROM`
	`84`	`+test_trgm`
	`85`	`+WHERE`
	`86`	`+t % 'word'`
	`87`	`+ORDER BY`
	`88`	`+sml DESC, t;`
	`89`	`+`
	`90`	`+This will return all values in the text column that are sufficiently`
	`91`	`+similar to 'word', sorted from best match to worst. The index will`
	`92`	`+be used to make this a fast operation over very large data sets.`
	`93`	`+`
	`94`	`+Tsearch2 Integration`
	`95`	`+`
	`96`	`+Trigram matching is a very useful tool when used in conjunction`
	`97`	`+with a text index created by the Tsearch2 contrib module. (See`
	`98`	`+contrib/tsearch2)`
	`99`	`+`
	`100`	`+The first step is to generate an auxiliary table containing all`
	`101`	`+the unique words in the Tsearch2 index:`
	`102`	`+`
	`103`	`+CREATE TABLE words AS`
	`104`	`+SELECT word FROM stat('SELECT vector FROM documents');`
	`105`	`+`
	`106`	`+Where 'documents' is the table that contains the Tsearch2 index`
	`107`	`+column 'vector', of type 'tsvector'.`
	`108`	`+`
	`109`	`+Next, create a trigram index on the word column:`
	`110`	`+`
	`111`	`+CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);`
	`112`	`+`
	`113`	`+Now, a SELECT query similar to the example above can be used to`
	`114`	`+suggest spellings for misspelled words in user search terms. A`
	`115`	`+useful extra clause is to ensure that the similar words are also`
	`116`	`+of similar length to the misspelled word.`
	`117`	`+`
	`118`	`+Note: Since the 'words' table has been generated as a separate,`
	`119`	`+static table, it will need to be periodically regenerated so that`
	`120`	`+it remains up to date with the word list in the Tsearch2 index.`
	`121`	`+`
	`122`	`+Authors`
	`123`	`+`
	`124`	`+Oleg Bartunov <oleg@sai.msu.su>, Moscow, Moscow University, Russia`
	`125`	`+Teodor Sigaev <teodor@sigaev.ru>, Moscow, Delta-Soft Ltd.,Russia`
	`126`	`+`
	`127`	`+Contributors`
	`128`	`+`
	`129`	`+Christopher Kings-Lynne wrote this README file`
	`130`	`+`
	`131`	`+References`
	`132`	`+`
	`133`	`+Tsearch2 Development Site`
	`134`	`+http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/`
	`135`	`+`
	`136`	`+GiST Development Site`
	`137`	`+http://www.sai.msu.su/~megera/postgres/gist/`
	`138`	`+`

0 commit comments

Comments

(0)

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commitcbfa409

File tree

9 files changed

9 files changed

`‎contrib/pg_trgm/Makefile`

`‎contrib/pg_trgm/README.pg_trgm`

0 commit comments