11<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
22< html >
33< head >
4- < link type ="text/css "rel ="stylesheet "href ="/~megera/postgres/gist/tsearch/tsearch.css ">
54< title > tsearch2 guide</ title >
65</ head >
76< body >
87< h1 align =center > The tsearch2 Guide</ h1 >
98
109< p align =center >
1110Brandon Craig Rhodes< br > 30 June 2003
11+ < br > Updated for the 8.2 release by Oleg Bartunov, October 2006
1212< p >
1313This Guide introduces the reader to the PostgreSQL tsearch2 module,
1414version 2.
1515More formal descriptions of the module's types and functions
1616are provided in the< a href ="tsearch2-ref.html "> tsearch2 Reference</ a > ,
1717which is a companion to this document.
18- You can retrieve a beta copy of the tsearch2 module from the
19- < a href ="http://www.sai.msu.su/~megera/postgres/gist/ "> GiST for PostgreSQL</ a >
20- page — look under the section entitled< i > Development History</ i >
21- for the current version.
2218< p >
2319First we will examine the< tt > tsvector</ tt > and< tt > tsquery</ tt > types
2420and how they are used to search documents;
@@ -32,15 +28,40 @@ <h1 align=center>The tsearch2 Guide</h1>
3228< hr >
3329< h2 > Table of Contents</ h2 >
3430< blockquote >
31+ < a href ="#intro "> Introduction to FTS with tsearch2</ a > < br >
3532< a href ="#vectors_queries "> Vectors and Queries</ a > < br >
3633< a href ="#simple_search "> A Simple Search Engine</ a > < br >
3734< a href ="#weights "> Ranking and Position Weights</ a > < br >
3835< a href ="#casting "> Casting Vectors and Queries</ a > < br >
3936< a href ="#parsing_lexing "> Parsing and Lexing</ a > < br >
37+ < a href ="#ref "> Additional information</ a >
4038</ blockquote >
4139
4240< hr >
4341
42+
43+ < h2 > < a name ="intro "> Introduction to FTS with tsearch2</ a > </ h2 >
44+ The purpose of full-text search (FTS) is to
45+ find< b > documents</ b > that satisfy a< b > query</ b > and, optionally, to return
46+ them in some< b > order</ b > .
47+ The most common case is to find documents containing all query terms and return them in order
48+ of their similarity to the query. A document in a database can be
49+ any text attribute, or a combination of text attributes from one or many tables
50+ (using joins).
51+ Text search operators have existed for years; in PostgreSQL they are
52+ < tt > < b > ~, ~*, LIKE, ILIKE</ b > </ tt > , but they lack linguistic support,
53+ tend to be slow, and have no relevance ranking. The idea behind tsearch2
54+ is rather simple: preprocess the document at index time to save time at the search stage.
55+ Preprocessing includes:
56+ < ul >
57+ < li > parsing the document into words
58+ < li > linguistic processing: normalizing words to obtain lexemes
59+ < li > storing the document in a form optimized for searching
60+ </ ul >
61+ In a nutshell, tsearch2 provides an FTS operator (contains) for two new data types
62+ that represent the document and the query:< tt > tsquery @@ tsvector</ tt > .
63+
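The three preprocessing steps listed above can be illustrated with a small Python sketch. This is not how tsearch2 is implemented: the stop-word list, the suffix rules, and the names `normalize`, `to_vector`, and `matches` are all assumptions made up for this example, whereas tsearch2 relies on configurable parsers and dictionaries. The sketch only shows the general idea of doing the expensive work once, at index time, so that matching a query becomes a cheap lookup.

```python
import re

# Hypothetical stop-word list and suffix rules, chosen only for illustration;
# tsearch2 uses configurable dictionaries instead.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}
SUFFIXES = ("ing", "es", "s")


def normalize(word):
    """Reduce a word to a crude lexeme: lowercase, then strip one common suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_vector(document):
    """Parse a document into words and map each lexeme to its word positions."""
    vector = {}
    for position, word in enumerate(re.findall(r"[A-Za-z0-9]+", document), start=1):
        if word.lower() in STOP_WORDS:
            continue  # stop words still consume a position but are not stored
        vector.setdefault(normalize(word), []).append(position)
    return vector


def matches(query_terms, vector):
    """Return True when every normalized query term occurs in the vector."""
    lexemes = [normalize(t) for t in query_terms if t.lower() not in STOP_WORDS]
    return bool(lexemes) and all(lexeme in vector for lexeme in lexemes)
```

Here `to_vector` plays a role loosely comparable to building a `tsvector`, and `matches` to the contains operator. A query made up only of stop words matches nothing in this sketch.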
64+ < P >
4465< h2 > < a name =vectors_queries > Vectors and Queries</ a > </ h2 >
4566
4667< blockquote >
@@ -79,6 +100,8 @@ <h2><a name=vectors_queries>Vectors and Queries</a></h2>
79100 on the< tt > tsvector</ tt > column of a table,
80101 which implements a form of the Berkeley
81102< a href ="http://gist.cs.berkeley.edu/ "> < i > Generalized Search Tree</ i > </ a > .
103+ Since PostgreSQL 8.2, tsearch2 also supports the< a href ="http://www.sigaev.ru/gin/ "> GIN</ a > index,
104+ an inverted index of the kind commonly used in search engines, which makes tsearch2 more scalable.
82105</ ul >
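As a rough illustration of what "inverted index" means here, the following Python sketch maps each lexeme to the set of documents that contain it. It is a toy model under stated assumptions, not a description of how GIN stores or searches data; the function names `build_inverted_index` and `search_all` are hypothetical, and the input is assumed to be already normalized into lexemes.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lexeme to the set of ids of the documents that contain it.

    `documents` maps a document id to an iterable of already-normalized lexemes.
    """
    index = defaultdict(set)
    for doc_id, lexemes in documents.items():
        for lexeme in lexemes:
            index[lexeme].add(doc_id)
    return index


def search_all(index, lexemes):
    """Return the sorted ids of documents containing every query lexeme."""
    postings = [index.get(lexeme, set()) for lexeme in lexemes]
    if not postings:
        return []
    # An AND query is the intersection of the per-lexeme document sets.
    return sorted(set.intersection(*postings))
```

Because each query lexeme is looked up directly, the cost of a search in this model depends on the sizes of the matching sets rather than on the total number of documents, which is the intuition behind the scalability claim.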
83106Once your documents are indexed,
84107performing a search involves:
@@ -251,7 +274,7 @@ <h2><a name=vectors_queries>Vectors and Queries</a></h2>
251274
252275< pre >
253276=#< b > SELECT to_tsquery('the')</ b >
254- NOTICE: Query contains only stopword(s) or doesn't contain lexeme(s), ignored
277+ NOTICE: Query contains only stopword(s) or doesn't contain lexem(s), ignored
255278 to_tsquery
256279------------
257280
@@ -483,8 +506,8 @@ <h2><a name=weights>Ranking and Position Weights</a></h2>
483506and has the feature that you can assign different weights
484507to words from different sections of your document.
485508The< tt > rank_cd()</ tt > uses a recent technique for weighting results
486- but does not allow different weight to be given
487- to different sections of your document.
509+ and also allows different weights to be given
510+ to different sections of your document (since 8.2).
488511< p >
489512Both ranking functions allow you to specify,
490513as an optional last argument,
@@ -511,9 +534,6 @@ <h2><a name=weights>Ranking and Position Weights</a></h2>
511534see the< a href ="tsearch2-ref.html#ranking "> section on ranking</ a >
512535in the Reference.
513536< p >
514- The< tt > rank()</ tt > function offers more flexibility
515- because it pays attention to the< i > weights</ i >
516- with which you have labelled lexeme positions.
517537Currently tsearch2 supports four different weight labels:
518538< tt > 'D'</ tt > , the default weight;
519539and< tt > 'A'</ tt > ,< tt > 'B'</ tt > , and< tt > 'C'</ tt > .
@@ -730,7 +750,7 @@ <h2><a name=casting>Casting Vectors and Queries</a></h2>
730750are important< i > both</ i > to PostgreSQL when it is interpreting a string,
731751< i > and</ i > to the< tt > tsvector</ tt > conversion function.
732752You may want to review section
733- < a href ="http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=sql-syntax.html#SQL-SYNTAX-STRINGS "> 1.1.2.1,
753+ < a href ="http://www.postgresql.org/docs/current/static/sql-syntax.html#SQL-SYNTAX-STRINGS ">
734754“String Constants”</ a >
735755in the PostgreSQL documentation before proceeding.
736756< p >
@@ -1051,6 +1071,14 @@ <h2><a name=parsing_lexing>Parsing and Lexing</a></h2>
10511071with the difference that the query parser recognizes as special
10521072the boolean operators that separate query words.
10531073
1074+
1075+ < h2 > < a name ="ref "> Additional information</ a > </ h2 >
1076+ More information about tsearch2 is available on the
1077+ < a href ="http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2 "> tsearch2</ a > page.
1078+ It is also worth checking the
1079+ < a href ="http://www.sai.msu.su/~megera/wiki/Tsearch2 "> tsearch2 wiki</ a > pages.
1080+
1081+
10541082</ body >
10551083</ html >
10561084