8.11. Text Search Types

PostgreSQL provides two data types that are designed to support full text search, which is the activity of searching through a collection of natural-language documents to locate those that best match a query. The tsvector type represents a document in a form optimized for text search; the tsquery type similarly represents a text query. Chapter 12 provides a detailed explanation of this facility, and Section 9.13 summarizes the related functions and operators.

8.11.1. tsvector

A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details). Sorting and duplicate-elimination are done automatically during input, as shown in this example:

SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
                      tsvector
----------------------------------------------------
 'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat'

To represent lexemes containing whitespace or punctuation, surround them with quotes:

SELECT $$the lexeme '    ' contains spaces$$::tsvector;
                 tsvector
-------------------------------------------
 '    ' 'contains' 'lexeme' 'spaces' 'the'

(We use dollar-quoted string literals in this example and the next one to avoid the confusion of having to double quote marks within the literals.) Embedded quotes and backslashes must be doubled:

SELECT $$the lexeme 'Joe''s' contains a quote$$::tsvector;
                    tsvector
------------------------------------------------
 'Joe''s' 'a' 'contains' 'lexeme' 'quote' 'the'

Optionally, integer positions can be attached to lexemes:

SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;
                                  tsvector
-------------------------------------------------------------------------------
 'a':1,6,10 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'on':5 'rat':12 'sat':4

A position normally indicates the source word's location in the document. Positional information can be used for proximity ranking. Position values can range from 1 to 16383; larger numbers are silently set to 16383. Duplicate positions for the same lexeme are discarded.
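
As a small illustration of the last two rules, the following query repeats one position and uses one position above the limit. The lexemes and position values are made up for this sketch:

```sql
-- Illustrative input only: 'fat' repeats position 2, and 'cat' uses a
-- position larger than the documented maximum of 16383.
SELECT 'fat:2 fat:2 cat:20000'::tsvector;
-- Per the rules above, 'cat' should be stored at position 16383 and
-- 'fat' should keep a single position 2.
```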

Lexemes that have positions can further be labeled with a weight, which can be A, B, C, or D. D is the default and hence is not shown on output:

SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;
          tsvector
----------------------------
 'a':1A 'cat':5 'fat':2B,4C

Weights are typically used to reflect document structure, for example by marking title words differently from body words. Text search ranking functions can assign different priorities to the different weight markers.
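
A minimal sketch of that idea, assuming a document with a separate title and body (the sample strings are invented for illustration). It relies on the setweight function and the || concatenation operator for tsvector values, both summarized in Section 9.13, and on to_tsvector, which is introduced later in this section:

```sql
-- Title words get weight A; body words keep the default weight D.
SELECT setweight(to_tsvector('english', 'The Fat Rats'), 'A') ||
       setweight(to_tsvector('english', 'A fat cat ate the rats'), 'D');
```

Lexemes that came from the title part should then carry the A marker on their positions, so a ranking function can treat a title hit differently from a body hit.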

It is important to understand that the tsvector type itself does not perform any normalization; it assumes the words it is given are normalized appropriately for the application. For example,

SELECT 'The Fat Rats'::tsvector;
      tsvector
--------------------
 'Fat' 'Rats' 'The'

For most English-text-searching applications the above words would be considered non-normalized, but tsvector doesn't care. Raw document text should usually be passed through to_tsvector to normalize the words appropriately for searching:

SELECT to_tsvector('english', 'The Fat Rats');
   to_tsvector
-----------------
 'fat':2 'rat':3

Again, see Chapter 12 for more detail.

8.11.2. tsquery

A tsquery value stores lexemes that are to be searched for, and combines them honoring the Boolean operators & (AND), | (OR), and ! (NOT). Parentheses can be used to enforce grouping of the operators:

SELECT 'fat & rat'::tsquery;
    tsquery
---------------
 'fat' & 'rat'

SELECT 'fat & (rat | cat)'::tsquery;
          tsquery
---------------------------
 'fat' & ( 'rat' | 'cat' )

SELECT 'fat & rat & ! cat'::tsquery;
        tsquery
------------------------
 'fat' & 'rat' & !'cat'

In the absence of parentheses, ! (NOT) binds most tightly, and & (AND) binds more tightly than | (OR).
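
One way to observe these precedence rules is to test queries against a tiny hand-written tsvector with the @@ match operator (see Section 9.13). The one-word document below is an assumption chosen so that each grouping gives a different answer:

```sql
-- No parentheses: parsed as ('fat' & 'rat') | 'cat', so 'cat' alone matches.
SELECT 'cat'::tsvector @@ 'fat & rat | cat'::tsquery;    -- t

-- Explicit grouping changes the meaning: 'fat' is now required.
SELECT 'cat'::tsvector @@ 'fat & (rat | cat)'::tsquery;  -- f

-- NOT applies to 'fat' only: (!'fat') & 'rat' fails because 'rat' is absent.
SELECT 'cat'::tsvector @@ '! fat & rat'::tsquery;        -- f
```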

Optionally, lexemes in a tsquery can be labeled with one or more weight letters, which restricts them to match only tsvector lexemes with matching weights:

SELECT 'fat:ab & cat'::tsquery;
    tsquery
------------------
 'fat':AB & 'cat'
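
The effect of the weight restriction can be sketched by matching that query against two hand-written vectors that differ only in the weight attached to 'fat' (the vectors are illustrative, not taken from a real document):

```sql
-- 'fat' carries weight A in the vector, which is among the allowed A or B.
SELECT 'fat:2A cat:3'::tsvector @@ 'fat:ab & cat'::tsquery;  -- t

-- Here 'fat' carries weight C, so the restricted lexeme does not match.
SELECT 'fat:2C cat:3'::tsvector @@ 'fat:ab & cat'::tsquery;  -- f
```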

Also, lexemes in a tsquery can be labeled with * to specify prefix matching:

SELECT 'super:*'::tsquery;
  tsquery
-----------
 'super':*

This query will match any word in a tsvector that begins with "super". Note that prefixes are first processed by text search configurations, which means this comparison returns true:

SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
 ?column?
----------
 t
(1 row)

because postgres gets stemmed to postgr:

SELECT to_tsquery('postgres:*');
 to_tsquery
------------
 'postgr':*
(1 row)

which then matches postgraduate.
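
For contrast, a prefix query cast directly to tsquery involves no text search configuration, so the prefix is compared literally against the stored lexemes. A short sketch with invented lexemes:

```sql
-- Raw lexemes, no stemming: 'supernova' begins with 'super'.
SELECT 'supernova star'::tsvector @@ 'super:*'::tsquery;  -- t

-- 'sup' is shorter than the prefix 'super', so it does not match.
SELECT 'sup star'::tsvector @@ 'super:*'::tsquery;        -- f
```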

Quoting rules for lexemes are the same as described previously for lexemes in tsvector; and, as with tsvector, any required normalization of words must be done before converting to the tsquery type. The to_tsquery function is convenient for performing such normalization:

SELECT to_tsquery('Fat:ab & Cats');
    to_tsquery
------------------
 'fat':AB & 'cat'
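
To see why normalization matters on both sides of a match, the sketch below compares a raw cast with the to_tsvector/to_tsquery pair. It assumes the 'english' configuration is available, as in the earlier to_tsvector example:

```sql
-- Raw casts compare lexemes literally: 'Fat' and 'Rats' are not 'fat' and 'rat'.
SELECT 'The Fat Rats'::tsvector @@ 'fat & rat'::tsquery;  -- f

-- Normalizing both sides with the same configuration makes them match.
SELECT to_tsvector('english', 'The Fat Rats')
       @@ to_tsquery('english', 'Fat & Rats');            -- t
```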

