
Commit 9389ac8


Document filtering dictionaries in textsearch.sgml.

While at it, copy-edit the description of prefix-match marker support in synonym dictionaries, and clarify the description of the default unaccent dictionary a bit more.
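The dictionary-chain behavior this commit documents can be modeled in a few lines. Each dictionary either declines a token, reports a stop word, returns lexemes, or, as a filtering dictionary, hands a replacement token on to the next dictionary. The Python below is a hypothetical sketch, not PostgreSQL's C API: `Filter` stands in for a lexeme carrying the `TSL_FILTER` flag, and `run_chain`, `unaccent_filter`, `stop_words`, and `known_words` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class Filter:
    """Hypothetical stand-in for one lexeme returned with TSL_FILTER set."""
    token: str


# Possible dictionary outputs in this model:
#   None         -> token not recognized, consult the next dictionary
#   []           -> token recognized as a stop word
#   ["lexeme"]   -> token recognized and normalized
#   Filter("x")  -> replace the token with "x" and keep going
Output = Optional[Union[List[str], Filter]]
Dictionary = Callable[[str], Output]


def run_chain(token: str, dictionaries: List[Dictionary]) -> List[str]:
    """Consult dictionaries in order; return the lexemes to index (may be empty)."""
    for lexize in dictionaries:
        result = lexize(token)
        if result is None:
            continue                 # unknown: fall through to the next dictionary
        if isinstance(result, Filter):
            token = result.token     # filtering dictionary: pass the new token on
            continue
        return result                # first non-NULL output decides the result
    return []                        # no dictionary recognized it: discard


# Toy dictionaries, all hypothetical.
ACCENTS = str.maketrans({"é": "e", "è": "e", "à": "a"})


def unaccent_filter(token: str) -> Output:
    return Filter(token.translate(ACCENTS))


def stop_words(token: str) -> Output:
    return [] if token in {"the", "a", "an"} else None


def known_words(token: str) -> Output:
    return [token] if token in {"cafe", "hotel"} else None


print(run_chain("café", [unaccent_filter, stop_words, known_words]))  # ['cafe']
print(run_chain("café", [known_words, unaccent_filter]))              # []
```

Placing the filter last, as in the second call, leaves the replacement token with no later dictionary to consult, so this model simply drops it; that mirrors the documentation's point that a filtering dictionary at the end of the list is useless.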

1 parent acac35a, commit 9389ac8

2 files changed: 79 additions, 55 deletions

doc/src/sgml
- textsearch.sgml
- unaccent.sgml


`doc/src/sgml/textsearch.sgml`

Lines changed: 74 additions & 52 deletions

Original file line number	Diff line number	Diff line change
`@@ -1,4 +1,4 @@`
`1`		`-<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.58 2010/08/20 13:59:45 tgl Exp $ -->`
	`1`	`+<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.59 2010/08/25 21:42:55 tgl Exp $ -->`
`2`	`2`
`3`	`3`	`<chapter id="textsearch">`
`4`	`4`	`<title>Full Text Search</title>`
`@@ -112,7 +112,7 @@`
`112`	`112`	`as a sorted array of normalized lexemes. Along with the lexemes it is`
`113`	`113`	`often desirable to store positional information to use for`
`114`	`114`	`<firstterm>proximity ranking</firstterm>, so that a document that`
`115`		`- contains a more <quote>dense</> region of query words is`
	`115`	`+ contains a more <quote>dense</> region of query words is`
`116`	`116`	`assigned a higher rank than one with scattered query words.`
`117`	`117`	`</para>`
`118`	`118`	`</listitem>`
`@@ -1151,13 +1151,13 @@ MaxFragments=0, FragmentDelimiter=" ... "`
`1151`	`1151`	`<screen>`
`1152`	`1152`	`SELECT ts_headline('english',`
`1153`	`1153`	`'The most common type of search`
`1154`		`-is to find all documents containing given query terms`
	`1154`	`+is to find all documents containing given query terms`
`1155`	`1155`	`and return them in order of their similarity to the`
`1156`	`1156`	`query.',`
`1157`	`1157`	`to_tsquery('query & similarity'));`
`1158`	`1158`	`ts_headline`
`1159`	`1159`	`------------------------------------------------------------`
`1160`		`- containing given <b>query</b> terms`
	`1160`	`+ containing given <b>query</b> terms`
`1161`	`1161`	`and return them in order of their <b>similarity</b> to the`
`1162`	`1162`	`<b>query</b>.`
`1163`	`1163`
`@@ -1166,7 +1166,7 @@ SELECT ts_headline('english',`
`1166`	`1166`	`is to find all documents containing given query terms`
`1167`	`1167`	`and return them in order of their similarity to the`
`1168`	`1168`	`query.',`
`1169`		`- to_tsquery('query & similarity'),`
	`1169`	`+ to_tsquery('query & similarity'),`
`1170`	`1170`	`'StartSel = <, StopSel = >');`
`1171`	`1171`	`ts_headline`
`1172`	`1172`	`-------------------------------------------------------`
`@@ -2064,6 +2064,14 @@ SELECT alias, description, token FROM ts_debug('http://example.com/stuff/index.h`
`2064`	`2064`	`(notice that one token can produce more than one lexeme)`
`2065`	`2065`	`</para>`
`2066`	`2066`	`</listitem>`
	`2067`	`+ <listitem>`
	`2068`	`+ <para>`
	`2069`	`+ a single lexeme with the <literal>TSL_FILTER</> flag set, to replace`
	`2070`	`+ the original token with a new token to be passed to subsequent`
	`2071`	`+ dictionaries (a dictionary that does this is called a`
	`2072`	`+ <firstterm>filtering dictionary</>)`
	`2073`	`+ </para>`
	`2074`	`+ </listitem>`
`2067`	`2075`	`<listitem>`
`2068`	`2076`	`<para>`
`2069`	`2077`	`an empty array if the dictionary knows the token, but it is a stop word`
`@@ -2096,6 +2104,13 @@ SELECT alias, description, token FROM ts_debug('http://example.com/stuff/index.h`
`2096`	`2104`	`until some dictionary recognizes it as a known word. If it is identified`
`2097`	`2105`	`as a stop word, or if no dictionary recognizes the token, it will be`
`2098`	`2106`	`discarded and not indexed or searched for.`
	`2107`	`+ Normally, the first dictionary that returns a non-<literal>NULL</>`
	`2108`	`+ output determines the result, and any remaining dictionaries are not`
	`2109`	`+ consulted; but a filtering dictionary can replace the given word`
	`2110`	`+ with a modified word, which is then passed to subsequent dictionaries.`
	`2111`	`+ </para>`
	`2112`	`+`
	`2113`	`+ <para>`
`2099`	`2114`	`The general rule for configuring a list of dictionaries`
`2100`	`2115`	`is to place first the most narrow, most specific dictionary, then the more`
`2101`	`2116`	`general dictionaries, finishing with a very general dictionary, like`
`@@ -2112,6 +2127,16 @@ ALTER TEXT SEARCH CONFIGURATION astro_en`
`2112`	`2127`	`</programlisting>`
`2113`	`2128`	`</para>`
`2114`	`2129`
	`2130`	`+ <para>`
	`2131`	`+ A filtering dictionary can be placed anywhere in the list, except at the`
	`2132`	`+ end where it'd be useless. Filtering dictionaries are useful to partially`
	`2133`	`+ normalize words to simplify the task of later dictionaries. For example,`
	`2134`	`+ a filtering dictionary could be used to remove accents from accented`
	`2135`	`+ letters, as is done by the`
	`2136`	`+ <link linkend="unaccent"><filename>contrib/unaccent</></link>`
	`2137`	`+ extension module.`
	`2138`	`+ </para>`
	`2139`	`+`
`2115`	`2140`	`<sect2 id="textsearch-stopwords">`
`2116`	`2141`	`<title>Stop Words</title>`
`2117`	`2142`
`@@ -2184,7 +2209,7 @@ CREATE TEXT SEARCH DICTIONARY public.simple_dict (`
`2184`	`2209`	`Here, <literal>english</literal> is the base name of a file of stop words.`
`2185`	`2210`	`The file's full name will be`
`2186`	`2211`	`<filename>$SHAREDIR/tsearch_data/english.stop</>,`
`2187`		`- where <literal>$SHAREDIR</> means the`
	`2212`	`+ where <literal>$SHAREDIR</> means the`
`2188`	`2213`	`<productname>PostgreSQL</productname> installation's shared-data directory,`
`2189`	`2214`	`often <filename>/usr/local/share/postgresql</> (use <command>pg_config`
`2190`	`2215`	`--sharedir</> to determine it if you're not sure).`
`@@ -2295,85 +2320,82 @@ SELECT * FROM ts_debug('english', 'Paris');`
`2295`	`2320`	`asciiword \| Word, all ASCII \| Paris \| {my_synonym,english_stem} \| my_synonym \| {paris}`
`2296`	`2321`	`</screen>`
`2297`	`2322`	`</para>`
`2298`		`-`
	`2323`	`+`
`2299`	`2324`	`<para>`
`2300`		`- An asterisk (<literal>*</literal>) at the end of definition word indicates`
`2301`		`- that definition word is a prefix, and <function>to_tsquery()</function>`
`2302`		`- function will transform that definition to the prefix search format (see`
`2303`		`- <xref linkend="textsearch-parsing-queries">).`
`2304`		`- Notice that it is ignored in <function>to_tsvector()</function>.`
	`2325`	`+ The only parameter required by the <literal>synonym</> template is`
	`2326`	`+ <literal>SYNONYMS</>, which is the base name of its configuration file`
	`2327`	`+ — <literal>my_synonyms</> in the above example.`
	`2328`	`+ The file's full name will be`
	`2329`	`+ <filename>$SHAREDIR/tsearch_data/my_synonyms.syn</>`
	`2330`	`+ (where <literal>$SHAREDIR</> means the`
	`2331`	`+ <productname>PostgreSQL</> installation's shared-data directory).`
	`2332`	`+ The file format is just one line`
	`2333`	`+ per word to be substituted, with the word followed by its synonym,`
	`2334`	`+ separated by white space. Blank lines and trailing spaces are ignored.`
	`2335`	`+ </para>`
	`2336`	`+`
	`2337`	`+ <para>`
	`2338`	`+ The <literal>synonym</> template also has an optional parameter`
	`2339`	`+ <literal>CaseSensitive</>, which defaults to <literal>false</>. When`
	`2340`	`+ <literal>CaseSensitive</> is <literal>false</>, words in the synonym file`
	`2341`	`+ are folded to lower case, as are input tokens. When it is`
	`2342`	`+ <literal>true</>, words and tokens are not folded to lower case,`
	`2343`	`+ but are compared as-is.`
`2305`	`2344`	`</para>`
`2306`	`2345`
`2307`	`2346`	`<para>`
`2308`		`- Contents of <filename>$SHAREDIR/tsearch_data/synonym_sample.syn</>:`
	`2347`	`+ An asterisk (<literal>*</literal>) can be placed at the end of a synonym`
	`2348`	`+ in the configuration file. This indicates that the synonym is a prefix.`
	`2349`	`+ The asterisk is ignored when the entry is used in`
	`2350`	`+ <function>to_tsvector()</function>, but when it is used in`
	`2351`	`+ <function>to_tsquery()</function>, the result will be a query item with`
	`2352`	`+ the prefix match marker (see`
	`2353`	`+ <xref linkend="textsearch-parsing-queries">).`
	`2354`	`+ For example, suppose we have these entries in`
	`2355`	`+ <filename>$SHAREDIR/tsearch_data/synonym_sample.syn</>:`
`2309`	`2356`	`<programlisting>`
`2310`	`2357`	`postgres pgsql`
`2311`	`2358`	`postgresql pgsql`
`2312`	`2359`	`postgre pgsql`
`2313`	`2360`	`gogle googl`
`2314`	`2361`	`indices index*`
`2315`	`2362`	`</programlisting>`
`2316`		`- </para>`
`2317`		`-`
`2318`		`- <para>`
`2319`		`- Results:`
	`2363`	`+ Then we will get these results:`
`2320`	`2364`	`<screen>`
`2321`		`-=# CREATE TEXT SEARCH DICTIONARY syn (template=synonym, synonyms='synonym_sample');`
`2322`		`-=# SELECT ts_lexize('syn','indices');`
	`2365`	`+mydb=# CREATE TEXT SEARCH DICTIONARY syn (template=synonym, synonyms='synonym_sample');`
	`2366`	`+mydb=# SELECT ts_lexize('syn','indices');`
`2323`	`2367`	`ts_lexize`
`2324`	`2368`	`-----------`
`2325`	`2369`	`{index}`
`2326`	`2370`	`(1 row)`
`2327`	`2371`
`2328`		`-=# CREATE TEXT SEARCH CONFIGURATION tst (copy=simple);`
`2329`		`-=# ALTER TEXT SEARCH CONFIGURATION tst ALTER MAPPING FOR asciiword WITH syn;`
`2330`		`-=# SELECT to_tsquery('tst','indices');`
	`2372`	`+mydb=# CREATE TEXT SEARCH CONFIGURATION tst (copy=simple);`
	`2373`	`+mydb=# ALTER TEXT SEARCH CONFIGURATION tst ALTER MAPPING FOR asciiword WITH syn;`
	`2374`	`+mydb=# SELECT to_tsvector('tst','indices');`
	`2375`	`+ to_tsvector`
	`2376`	`+-------------`
	`2377`	`+ 'index':1`
	`2378`	`+(1 row)`
	`2379`	`+`
	`2380`	`+mydb=# SELECT to_tsquery('tst','indices');`
`2331`	`2381`	`to_tsquery`
`2332`	`2382`	`------------`
`2333`	`2383`	`'index':*`
`2334`	`2384`	`(1 row)`
`2335`	`2385`
`2336`		`-=# SELECT 'indexes are very useful'::tsvector;`
	`2386`	`+mydb=# SELECT 'indexes are very useful'::tsvector;`
`2337`	`2387`	`tsvector`
`2338`	`2388`	`---------------------------------`
`2339`	`2389`	`'are' 'indexes' 'useful' 'very'`
`2340`	`2390`	`(1 row)`
`2341`	`2391`
`2342`		`-=# SELECT 'indexes are very useful'::tsvector @@ to_tsquery('tst','indices');`
	`2392`	`+mydb=# SELECT 'indexes are very useful'::tsvector @@ to_tsquery('tst','indices');`
`2343`	`2393`	`?column?`
`2344`	`2394`	`----------`
`2345`	`2395`	`t`
`2346`	`2396`	`(1 row)`
`2347`		`-`
`2348`		`-=# SELECT to_tsvector('tst','indices');`
`2349`		`- to_tsvector`
`2350`		`--------------`
`2351`		`- 'index':1`
`2352`		`-(1 row)`
`2353`	`2397`	`</screen>`
`2354`	`2398`	`</para>`
`2355`		`-`
`2356`		`- <para>`
`2357`		`- The only parameter required by the <literal>synonym</> template is`
`2358`		`- <literal>SYNONYMS</>, which is the base name of its configuration file`
`2359`		`- — <literal>my_synonyms</> in the above example.`
`2360`		`- The file's full name will be`
`2361`		`- <filename>$SHAREDIR/tsearch_data/my_synonyms.syn</>`
`2362`		`- (where <literal>$SHAREDIR</> means the`
`2363`		`- <productname>PostgreSQL</> installation's shared-data directory).`
`2364`		`- The file format is just one line`
`2365`		`- per word to be substituted, with the word followed by its synonym,`
`2366`		`- separated by white space. Blank lines and trailing spaces are ignored.`
`2367`		`- </para>`
`2368`		`-`
`2369`		`- <para>`
`2370`		`- The <literal>synonym</> template also has an optional parameter`
`2371`		`- <literal>CaseSensitive</>, which defaults to <literal>false</>. When`
`2372`		`- <literal>CaseSensitive</> is <literal>false</>, words in the synonym file`
`2373`		`- are folded to lower case, as are input tokens. When it is`
`2374`		`- <literal>true</>, words and tokens are not folded to lower case,`
`2375`		`- but are compared as-is.`
`2376`		`- </para>`
`2377`	`2399`	`</sect2>`
`2378`	`2400`
`2379`	`2401`	`<sect2 id="textsearch-thesaurus">`
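The synonym-file rules rewritten above (one white-space-separated `word synonym` pair per line, blank lines and trailing spaces ignored, an optional trailing `*` marking the synonym as a prefix, and `CaseSensitive` defaulting to false) can be sketched the same way. This Python model is hypothetical: `SynonymDict`, `lexize`, `tsvector_item`, and `tsquery_item` are illustrative names, and the returned strings only imitate the `to_tsvector` and `to_tsquery` output shown in the psql session.

```python
from typing import Dict, Optional, Tuple


class SynonymDict:
    """Hypothetical model of a synonym-template dictionary."""

    def __init__(self, file_text: str, case_sensitive: bool = False) -> None:
        self.case_sensitive = case_sensitive
        # word -> (synonym, is_prefix)
        self.entries: Dict[str, Tuple[str, bool]] = {}
        for line in file_text.splitlines():
            fields = line.split()          # white-space separated; trailing spaces vanish
            if len(fields) < 2:
                continue                   # blank line (single-word lines also skipped here)
            word, synonym = fields[0], fields[1]
            is_prefix = synonym.endswith("*")
            if is_prefix:
                synonym = synonym[:-1]     # strip the prefix-match marker
            self.entries[self._fold(word)] = (self._fold(synonym), is_prefix)

    def _fold(self, s: str) -> str:
        """Fold to lower case unless CaseSensitive is true."""
        return s if self.case_sensitive else s.lower()

    def lexize(self, token: str) -> Optional[str]:
        hit = self.entries.get(self._fold(token))
        return None if hit is None else hit[0]

    def tsvector_item(self, token: str, position: int) -> Optional[str]:
        """to_tsvector-style item: the prefix marker is ignored."""
        synonym = self.lexize(token)
        return None if synonym is None else f"'{synonym}':{position}"

    def tsquery_item(self, token: str) -> Optional[str]:
        """to_tsquery-style item: a prefix entry yields a :* marker."""
        hit = self.entries.get(self._fold(token))
        if hit is None:
            return None
        synonym, is_prefix = hit
        return f"'{synonym}':*" if is_prefix else f"'{synonym}'"


SAMPLE = """postgres pgsql
postgresql pgsql
postgre pgsql
gogle googl
indices index*
"""

syn = SynonymDict(SAMPLE)
print(syn.lexize("indices"))            # index
print(syn.tsvector_item("indices", 1))  # 'index':1
print(syn.tsquery_item("indices"))      # 'index':*
```

Keeping the prefix flag next to each synonym lets one parsed table serve both paths: the `to_tsvector` side discards the flag, while the `to_tsquery` side turns it into the prefix-match marker.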

`doc/src/sgml/unaccent.sgml`

Lines changed: 5 additions & 3 deletions

Original file line number	Diff line number	Diff line change
`@@ -1,4 +1,4 @@`
`1`		`-<!-- $PostgreSQL: pgsql/doc/src/sgml/unaccent.sgml,v 1.6 2010/08/25 02:12:00 tgl Exp $ -->`
	`1`	`+<!-- $PostgreSQL: pgsql/doc/src/sgml/unaccent.sgml,v 1.7 2010/08/25 21:42:55 tgl Exp $ -->`
`2`	`2`
`3`	`3`	`<sect1 id="unaccent">`
`4`	`4`	`<title>unaccent</title>`
`@@ -75,8 +75,10 @@`
`75`	`75`	`<para>`
`76`	`76`	`Running the installation script <filename>unaccent.sql</> creates a text`
`77`	`77`	`search template <literal>unaccent</> and a dictionary <literal>unaccent</>`
`78`		`- based on it, with default parameters. You can alter the`
`79`		`- parameters, for example`
	`78`	`+ based on it. The <literal>unaccent</> dictionary has the default`
	`79`	`+ parameter setting <literal>RULES='unaccent'</>, which makes it immediately`
	`80`	`+ usable with the standard <filename>unaccent.rules</> file.`
	`81`	`+ If you wish, you can alter the parameter, for example`
`80`	`82`
`81`	`83`	`<programlisting>`
`82`	`84`	`mydb=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');`
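The unaccent change states that the dictionary ships with `RULES='unaccent'`, so the base name resolves to the standard `unaccent.rules` file under `$SHAREDIR/tsearch_data`, and `RULES='my_rules'` would point at `my_rules.rules` instead. The Python sketch below models that name resolution and a rules-driven replacement; it assumes each rules line holds an accented character and its replacement separated by white space, and `rules_path`, `load_rules`, and `unaccent` are hypothetical names rather than part of the module.

```python
import posixpath
from typing import Dict


def rules_path(sharedir: str, rules: str = "unaccent") -> str:
    """Map a RULES base name to its file under $SHAREDIR/tsearch_data."""
    return posixpath.join(sharedir, "tsearch_data", rules + ".rules")


def load_rules(file_text: str) -> Dict[str, str]:
    """Parse 'accented replacement' pairs, one per line; skip other lines."""
    table: Dict[str, str] = {}
    for line in file_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            table[fields[0]] = fields[1]
    return table


def unaccent(token: str, table: Dict[str, str]) -> str:
    """Replace each character that has a rule; single-character sources only."""
    return "".join(table.get(ch, ch) for ch in token)


print(rules_path("/usr/local/share/postgresql"))
# /usr/local/share/postgresql/tsearch_data/unaccent.rules
print(unaccent("Hôtel", load_rules("ô o\né e\n")))  # Hotel
```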

