Commit 3e17ef1

committed

Adjust ts_debug's output as per my proposal of yesterday: show the active dictionary and its output lexemes as separate columns, instead of smashing them into one text column, and lowercase the column names. Also, define the output rowtype using OUT parameters instead of a composite type, to be consistent with the other built-in functions.

1 parent 7ec280e commit 3e17ef1
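
As an illustration of what the separate columns allow (this is not part of the commit itself): with `dictionary` and `lexemes` split out, the result can be filtered with ordinary SQL predicates instead of by parsing a combined text value. The queries below are a minimal sketch, assuming a server that includes this change and the `english` configuration used in the commit's documentation examples; the sample sentence is taken from those examples.

```sql
-- Stop words: a dictionary recognized the token but produced no lexemes,
-- so lexemes is an empty array rather than NULL.
SELECT token, dictionary
FROM ts_debug('english', 'a fat cat sat on a mat - it ate a fat rats')
WHERE lexemes = '{}';

-- Tokens that no dictionary recognized: dictionary (and lexemes) are NULL.
-- This also matches token types, such as blank, that have no dictionaries mapped.
SELECT alias, token, dictionaries
FROM ts_debug('english', 'a fat cat sat on a mat - it ate a fat rats')
WHERE dictionary IS NULL;
```

Going by the sample output documented in this commit, the first query should pick out tokens such as `a`, `on`, and `it`, and the second should return only the `blank` rows.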

File tree

4 files changed: +153 -123 lines changed

doc/src/sgml
- func.sgml
- textsearch.sgml
src
- backend/catalog
  - system_views.sql
- include/catalog
  - catversion.h

`‎doc/src/sgml/func.sgml‎`

Lines changed: 4 additions & 4 deletions

Original file line number	Diff line number	Diff line change
`@@ -1,4 +1,4 @@`
`1`		`-<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.402 2007/10/21 20:04:37 tgl Exp $ -->`
	`1`	`+<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.403 2007/10/22 20:13:37 tgl Exp $ -->`
`2`	`2`
`3`	`3`	`<chapter id="functions">`
`4`	`4`	`<title>Functions and Operators</title>`
`@@ -7857,11 +7857,11 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple`
`7857`	`7857`	`</thead>`
`7858`	`7858`	`<tbody>`
`7859`	`7859`	`<row>`
`7860`		`- <entry><literal><function>ts_debug</function>(<optional> <replaceable class="PARAMETER">config</replaceable> <type>regconfig</>, </optional> <replaceable class="PARAMETER">document</replaceable> <type>text</>)</literal></entry>`
`7861`		`- <entry><type>setof ts_debug</type></entry>`
	`7860`	`+ <entry><literal><function>ts_debug</function>(<optional> <replaceable class="PARAMETER">config</replaceable> <type>regconfig</>, </optional> <replaceable class="PARAMETER">document</replaceable> <type>text</>, OUT <replaceable class="PARAMETER">alias</> <type>text</>, OUT <replaceable class="PARAMETER">description</> <type>text</>, OUT <replaceable class="PARAMETER">token</> <type>text</>, OUT <replaceable class="PARAMETER">dictionaries</> <type>regdictionary[]</>, OUT <replaceable class="PARAMETER">dictionary</> <type>regdictionary</>, OUT <replaceable class="PARAMETER">lexemes</> <type>text[]</>)</literal></entry>`
	`7861`	`+ <entry><type>setof record</type></entry>`
`7862`	`7862`	`<entry>test a configuration</entry>`
`7863`	`7863`	`<entry><literal>ts_debug('english', 'The Brightest supernovaes')</literal></entry>`
`7864`		`- <entry><literal>(lword,"Latin word",The,{english_stem},"english_stem: {}") ...</literal></entry>`
	`7864`	`+ <entry><literal>(lword,"Latin word",The,{english_stem},english_stem,{}) ...</literal></entry>`
`7865`	`7865`	`</row>`
`7866`	`7866`	`<row>`
`7867`	`7867`	`<entry><literal><function>ts_lexize</function>(<replaceable class="PARAMETER">dict</replaceable> <type>regdictionary</>, <replaceable class="PARAMETER">token</replaceable> <type>text</>)</literal></entry>`

`‎doc/src/sgml/textsearch.sgml‎`

Lines changed: 113 additions & 87 deletions

Original file line number	Diff line number	Diff line change
`@@ -1,4 +1,4 @@`
`1`		`-<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.22 2007/10/22 03:37:04 tgl Exp $ -->`
	`1`	`+<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.23 2007/10/22 20:13:37 tgl Exp $ -->`
`2`	`2`
`3`	`3`	`<chapter id="textsearch">`
`4`	`4`	`<title id="textsearch-title">Full Text Search</title>`
`@@ -1699,18 +1699,18 @@ ON messages FOR EACH ROW EXECUTE PROCEDURE messages_trigger();`
`1699`	`1699`	`<itemizedlist spacing="compact" mark="bullet">`
`1700`	`1700`	`<listitem>`
`1701`	`1701`	`<para>`
`1702`		`- <structname>word</> <type>text</> — the value of a lexeme`
	`1702`	`+ <replaceable>word</> <type>text</> — the value of a lexeme`
`1703`	`1703`	`</para>`
`1704`	`1704`	`</listitem>`
`1705`	`1705`	`<listitem>`
`1706`	`1706`	`<para>`
`1707`		`- <structname>ndoc</> <type>integer</> — number of documents`
	`1707`	`+ <replaceable>ndoc</> <type>integer</> — number of documents`
`1708`	`1708`	`(<type>tsvector</>s) the word occurred in`
`1709`	`1709`	`</para>`
`1710`	`1710`	`</listitem>`
`1711`	`1711`	`<listitem>`
`1712`	`1712`	`<para>`
`1713`		`- <structname>nentry</> <type>integer</> — total number of`
	`1713`	`+ <replaceable>nentry</> <type>integer</> — total number of`
`1714`	`1714`	`occurrences of the word`
`1715`	`1715`	`</para>`
`1716`	`1716`	`</listitem>`
`@@ -1901,8 +1901,8 @@ LIMIT 10;`
`1901`	`1901`	`as the entire word and as each component:`
`1902`	`1902`
`1903`	`1903`	`<programlisting>`
`1904`		`-SELECT "Alias", "Description", "Token" FROM ts_debug('foo-bar-beta1');`
`1905`		`- Alias \| Description \| Token`
	`1904`	`+SELECT alias, description, token FROM ts_debug('foo-bar-beta1');`
	`1905`	`+ alias \| description \| token`
`1906`	`1906`	`-------------+-------------------------------+---------------`
`1907`	`1907`	`hword \| Hyphenated word \| foo-bar-beta1`
`1908`	`1908`	`lpart_hword \| Latin part of hyphenated word \| foo`
`@@ -1917,8 +1917,8 @@ SELECT "Alias", "Description", "Token" FROM ts_debug('foo-bar-beta1');`
`1917`	`1917`	`instructive example:`
`1918`	`1918`
`1919`	`1919`	`<programlisting>`
`1920`		`-SELECT "Alias", "Description", "Token" FROM ts_debug('http://foo.com/stuff/index.html');`
`1921`		`- Alias \| Description \| Token`
	`1920`	`+SELECT alias, description, token FROM ts_debug('http://foo.com/stuff/index.html');`
	`1921`	`+ alias \| description \| token`
`1922`	`1922`	`----------+---------------+--------------------------`
`1923`	`1923`	`protocol \| Protocol head \| http://`
`1924`	`1924`	`url \| URL \| foo.com/stuff/index.html`
`@@ -2186,25 +2186,23 @@ SELECT ts_lexize('public.simple_dict','The');`
`2186`	`2186`	`synonym dictionary and put it before the <literal>english_stem</> dictionary:`
`2187`	`2187`
`2188`	`2188`	`<programlisting>`
`2189`		`-SELECT * FROM ts_debug('english','Paris');`
`2190`		`- Alias \| Description \| Token \| Dictionaries \| Lexized token`
`2191`		`--------+-------------+-------+----------------+----------------------`
`2192`		`- lword \| Latin word \| Paris \| {english_stem} \| english_stem: {pari}`
`2193`		`-(1 row)`
	`2189`	`+SELECT * FROM ts_debug('english', 'Paris');`
	`2190`	`+ alias \| description \| token \| dictionaries \| dictionary \| lexemes`
	`2191`	`+-------+-------------+-------+----------------+--------------+---------`
	`2192`	`+ lword \| Latin word \| Paris \| {english_stem} \| english_stem \| {pari}`
`2194`	`2193`
`2195`		`-CREATE TEXT SEARCH DICTIONARY synonym (`
	`2194`	`+CREATE TEXT SEARCH DICTIONARY my_synonym (`
`2196`	`2195`	`TEMPLATE = synonym,`
`2197`	`2196`	`SYNONYMS = my_synonyms`
`2198`	`2197`	`);`
`2199`	`2198`
`2200`	`2199`	`ALTER TEXT SEARCH CONFIGURATION english`
`2201`		`- ALTER MAPPING FOR lword WITH synonym, english_stem;`
	`2200`	`+ ALTER MAPPING FOR lword WITH my_synonym, english_stem;`
`2202`	`2201`
`2203`		`-SELECT * FROM ts_debug('english','Paris');`
`2204`		`- Alias \| Description \| Token \| Dictionaries \| Lexized token`
`2205`		`--------+-------------+-------+------------------------+------------------`
`2206`		`- lword \| Latin word \| Paris \| {synonym,english_stem} \| synonym: {paris}`
`2207`		`-(1 row)`
	`2202`	`+SELECT * FROM ts_debug('english', 'Paris');`
	`2203`	`+ alias \| description \| token \| dictionaries \| dictionary \| lexemes`
	`2204`	`+-------+-------------+-------+---------------------------+------------+---------`
	`2205`	`+ lword \| Latin word \| Paris \| {my_synonym,english_stem} \| my_synonym \| {paris}`
`2208`	`2206`	`</programlisting>`
`2209`	`2207`	`</para>`
`2210`	`2208`
`@@ -2711,7 +2709,14 @@ SHOW default_text_search_config;`
`2711`	`2709`	`</indexterm>`
`2712`	`2710`
`2713`	`2711`	`<synopsis>`
`2714`		`- ts_debug(<optional> <replaceable class="PARAMETER">config</replaceable> <type>regconfig</>, </optional> <replaceable class="PARAMETER">document</replaceable> <type>text</>) returns <type>setof ts_debug</>`
	`2712`	`+ ts_debug(<optional> <replaceable class="PARAMETER">config</replaceable> <type>regconfig</>, </optional> <replaceable class="PARAMETER">document</replaceable> <type>text</>,`
	`2713`	`+ OUT <replaceable class="PARAMETER">alias</> <type>text</>,`
	`2714`	`+ OUT <replaceable class="PARAMETER">description</> <type>text</>,`
	`2715`	`+ OUT <replaceable class="PARAMETER">token</> <type>text</>,`
	`2716`	`+ OUT <replaceable class="PARAMETER">dictionaries</> <type>regdictionary[]</>,`
	`2717`	`+ OUT <replaceable class="PARAMETER">dictionary</> <type>regdictionary</>,`
	`2718`	`+ OUT <replaceable class="PARAMETER">lexemes</> <type>text[]</>)`
	`2719`	`+ returns setof record`
`2715`	`2720`	`</synopsis>`
`2716`	`2721`
`2717`	`2722`	`<para>`
`@@ -2725,57 +2730,80 @@ SHOW default_text_search_config;`
`2725`	`2730`	`</para>`
`2726`	`2731`
`2727`	`2732`	`<para>`
`2728`		`- <function>ts_debug</>'s result row type is defined as:`
	`2733`	`+ <function>ts_debug</> returns one row for each token identified in the text`
	`2734`	`+ by the parser. The columns returned are`
`2729`	`2735`
`2730`		`-<programlisting>`
`2731`		`-CREATE TYPE ts_debug AS (`
`2732`		`- "Alias" text,`
`2733`		`- "Description" text,`
`2734`		`- "Token" text,`
`2735`		`- "Dictionaries" regdictionary[],`
`2736`		`- "Lexized token" text`
`2737`		`-);`
`2738`		`-</programlisting>`
`2739`		`-`
`2740`		`- One row is produced for each token identified by the parser.`
`2741`		`- The first three columns describe the token, and the fourth lists`
`2742`		`- the dictionaries selected by the configuration for that token's type.`
`2743`		`- The last column shows the result of dictionary processing: which`
`2744`		`- dictionary (if any) recognized the token, and what it produced.`
	`2736`	`+ <itemizedlist spacing="compact" mark="bullet">`
	`2737`	`+ <listitem>`
	`2738`	`+ <para>`
	`2739`	`+ <replaceable>alias</> <type>text</> — short name of the token type`
	`2740`	`+ </para>`
	`2741`	`+ </listitem>`
	`2742`	`+ <listitem>`
	`2743`	`+ <para>`
	`2744`	`+ <replaceable>description</> <type>text</> — description of the`
	`2745`	`+ token type`
	`2746`	`+ </para>`
	`2747`	`+ </listitem>`
	`2748`	`+ <listitem>`
	`2749`	`+ <para>`
	`2750`	`+ <replaceable>token</> <type>text</> — text of the token`
	`2751`	`+ </para>`
	`2752`	`+ </listitem>`
	`2753`	`+ <listitem>`
	`2754`	`+ <para>`
	`2755`	`+ <replaceable>dictionaries</> <type>regdictionary[]</> — the`
	`2756`	`+ dictionaries selected by the configuration for this token type`
	`2757`	`+ </para>`
	`2758`	`+ </listitem>`
	`2759`	`+ <listitem>`
	`2760`	`+ <para>`
	`2761`	`+ <replaceable>dictionary</> <type>regdictionary</> — the dictionary`
	`2762`	`+ that recognized the token, or <literal>NULL</> if none did`
	`2763`	`+ </para>`
	`2764`	`+ </listitem>`
	`2765`	`+ <listitem>`
	`2766`	`+ <para>`
	`2767`	`+ <replaceable>lexemes</> <type>text[]</> — the lexeme(s) produced`
	`2768`	`+ by the dictionary that recognized the token, or <literal>NULL</> if`
	`2769`	`+ none did; an empty array (<literal>{}</>) means it was recognized as a`
	`2770`	`+ stop word`
	`2771`	`+ </para>`
	`2772`	`+ </listitem>`
	`2773`	`+ </itemizedlist>`
`2745`	`2774`	`</para>`
`2746`	`2775`
`2747`	`2776`	`<para>`
`2748`	`2777`	`Here is a simple example:`
`2749`	`2778`
`2750`	`2779`	`<programlisting>`
`2751`	`2780`	`SELECT * FROM ts_debug('english','a fat cat sat on a mat - it ate a fat rats');`
`2752`		`- Alias \| Description \| Token \| Dictionaries \| Lexized token`
`2753`		`--------+---------------+-------+--------------+----------------`
`2754`		`- lword \| Latin word \| a \| {english} \| english: {}`
`2755`		`- blank \| Space symbols \| \| \|`
`2756`		`- lword \| Latin word \| fat \| {english} \| english: {fat}`
`2757`		`- blank \| Space symbols \| \| \|`
`2758`		`- lword \| Latin word \| cat \| {english} \| english: {cat}`
`2759`		`- blank \| Space symbols \| \| \|`
`2760`		`- lword \| Latin word \| sat \| {english} \| english: {sat}`
`2761`		`- blank \| Space symbols \| \| \|`
`2762`		`- lword \| Latin word \| on \| {english} \| english: {}`
`2763`		`- blank \| Space symbols \| \| \|`
`2764`		`- lword \| Latin word \| a \| {english} \| english: {}`
`2765`		`- blank \| Space symbols \| \| \|`
`2766`		`- lword \| Latin word \| mat \| {english} \| english: {mat}`
`2767`		`- blank \| Space symbols \| \| \|`
`2768`		`- blank \| Space symbols \| - \| \|`
`2769`		`- lword \| Latin word \| it \| {english} \| english: {}`
`2770`		`- blank \| Space symbols \| \| \|`
`2771`		`- lword \| Latin word \| ate \| {english} \| english: {ate}`
`2772`		`- blank \| Space symbols \| \| \|`
`2773`		`- lword \| Latin word \| a \| {english} \| english: {}`
`2774`		`- blank \| Space symbols \| \| \|`
`2775`		`- lword \| Latin word \| fat \| {english} \| english: {fat}`
`2776`		`- blank \| Space symbols \| \| \|`
`2777`		`- lword \| Latin word \| rats \| {english} \| english: {rat}`
`2778`		`- (24 rows)`
	`2781`	`+ alias \| description \| token \| dictionaries \| dictionary \| lexemes`
	`2782`	`+-------+---------------+-------+----------------+--------------+---------`
	`2783`	`+ lword \| Latin word \| a \| {english_stem} \| english_stem \| {}`
	`2784`	`+ blank \| Space symbols \| \| {} \| \|`
	`2785`	`+ lword \| Latin word \| fat \| {english_stem} \| english_stem \| {fat}`
	`2786`	`+ blank \| Space symbols \| \| {} \| \|`
	`2787`	`+ lword \| Latin word \| cat \| {english_stem} \| english_stem \| {cat}`
	`2788`	`+ blank \| Space symbols \| \| {} \| \|`
	`2789`	`+ lword \| Latin word \| sat \| {english_stem} \| english_stem \| {sat}`
	`2790`	`+ blank \| Space symbols \| \| {} \| \|`
	`2791`	`+ lword \| Latin word \| on \| {english_stem} \| english_stem \| {}`
	`2792`	`+ blank \| Space symbols \| \| {} \| \|`
	`2793`	`+ lword \| Latin word \| a \| {english_stem} \| english_stem \| {}`
	`2794`	`+ blank \| Space symbols \| \| {} \| \|`
	`2795`	`+ lword \| Latin word \| mat \| {english_stem} \| english_stem \| {mat}`
	`2796`	`+ blank \| Space symbols \| \| {} \| \|`
	`2797`	`+ blank \| Space symbols \| - \| {} \| \|`
	`2798`	`+ lword \| Latin word \| it \| {english_stem} \| english_stem \| {}`
	`2799`	`+ blank \| Space symbols \| \| {} \| \|`
	`2800`	`+ lword \| Latin word \| ate \| {english_stem} \| english_stem \| {ate}`
	`2801`	`+ blank \| Space symbols \| \| {} \| \|`
	`2802`	`+ lword \| Latin word \| a \| {english_stem} \| english_stem \| {}`
	`2803`	`+ blank \| Space symbols \| \| {} \| \|`
	`2804`	`+ lword \| Latin word \| fat \| {english_stem} \| english_stem \| {fat}`
	`2805`	`+ blank \| Space symbols \| \| {} \| \|`
	`2806`	`+ lword \| Latin word \| rats \| {english_stem} \| english_stem \| {rat}`
`2779`	`2807`	`</programlisting>`
`2780`	`2808`	`</para>`
`2781`	`2809`
`@@ -2801,34 +2829,33 @@ ALTER TEXT SEARCH CONFIGURATION public.english`
`2801`	`2829`
`2802`	`2830`	`<programlisting>`
`2803`	`2831`	`SELECT * FROM ts_debug('public.english','The Brightest supernovaes');`
`2804`		`- Alias \| Description \| Token \| Dictionaries \| Lexized token`
`2805`		`--------+---------------+-------------+-------------------------------------------------+-------------------------------------`
`2806`		`- lword \| Latin word \| The \| {public.english_ispell,pg_catalog.english_stem} \| public.english_ispell: {}`
`2807`		`- blank \| Space symbols \| \| \|`
`2808`		`- lword \| Latin word \| Brightest \| {public.english_ispell,pg_catalog.english_stem} \| public.english_ispell: {bright}`
`2809`		`- blank \| Space symbols \| \| \|`
`2810`		`- lword \| Latin word \| supernovaes \| {public.english_ispell,pg_catalog.english_stem} \| pg_catalog.english_stem: {supernova}`
`2811`		`-(5 rows)`
	`2832`	`+ alias \| description \| token \| dictionaries \| dictionary \| lexemes`
	`2833`	`+-------+---------------+-------------+-------------------------------+----------------+-------------`
	`2834`	`+ lword \| Latin word \| The \| {english_ispell,english_stem} \| english_ispell \| {}`
	`2835`	`+ blank \| Space symbols \| \| {} \| \|`
	`2836`	`+ lword \| Latin word \| Brightest \| {english_ispell,english_stem} \| english_ispell \| {bright}`
	`2837`	`+ blank \| Space symbols \| \| {} \| \|`
	`2838`	`+ lword \| Latin word \| supernovaes \| {english_ispell,english_stem} \| english_stem \| {supernova}`
`2812`	`2839`	`</programlisting>`
`2813`	`2840`
`2814`	`2841`	`<para>`
`2815`	`2842`	`In this example, the word <literal>Brightest</> was recognized by the`
`2816`	`2843`	`parser as a <literal>Latin word</literal> (alias <literal>lword</literal>).`
`2817`	`2844`	`For this token type the dictionary list is`
`2818`		`- <literal>public.english_ispell</> and`
`2819`		`- <literal>pg_catalog.english_stem</literal>. The word was recognized by`
`2820`		`- <literal>public.english_ispell</literal>, which reduced it to the noun`
	`2845`	`+ <literal>english_ispell</> and`
	`2846`	`+ <literal>english_stem</literal>. The word was recognized by`
	`2847`	`+ <literal>english_ispell</literal>, which reduced it to the noun`
`2821`	`2848`	`<literal>bright</literal>. The word <literal>supernovaes</literal> is`
`2822`		`- unknown to the <literal>public.english_ispell</literal> dictionary so it`
	`2849`	`+ unknown to the <literal>english_ispell</literal> dictionary so it`
`2823`	`2850`	`was passed to the next dictionary, and, fortunately, was recognized (in`
`2824`		`- fact, <literal>public.english_stem</literal> is a Snowball dictionary which`
	`2851`	`+ fact, <literal>english_stem</literal> is a Snowball dictionary which`
`2825`	`2852`	`recognizes everything; that is why it was placed at the end of the`
`2826`	`2853`	`dictionary list).`
`2827`	`2854`	`</para>`
`2828`	`2855`
`2829`	`2856`	`<para>`
`2830`	`2857`	`The word <literal>The</literal> was recognized by the`
`2831`		`- <literal>public.english_ispell</literal> dictionary as a stop word (<xref`
	`2858`	`+ <literal>english_ispell</literal> dictionary as a stop word (<xref`
`2832`	`2859`	`linkend="textsearch-stopwords">) and will not be indexed.`
`2833`	`2860`	`The spaces are discarded too, since the configuration provides no`
`2834`	`2861`	`dictionaries at all for them.`
`@@ -2839,16 +2866,15 @@ SELECT * FROM ts_debug('public.english','The Brightest supernovaes');`
`2839`	`2866`	`you want to see:`
`2840`	`2867`
`2841`	`2868`	`<programlisting>`
`2842`		`-SELECT "Alias", "Token", "Lexized token"`
	`2869`	`+SELECT alias, token, dictionary, lexemes`
`2843`	`2870`	`FROM ts_debug('public.english','The Brightest supernovaes');`
`2844`		`- Alias \| Token \| Lexized token`
`2845`		`--------+-------------+--------------------------------------`
`2846`		`- lword \| The \| public.english_ispell: {}`
`2847`		`- blank \| \|`
`2848`		`- lword \| Brightest \| public.english_ispell: {bright}`
`2849`		`- blank \| \|`
`2850`		`- lword \| supernovaes \| pg_catalog.english_stem: {supernova}`
`2851`		`-(5 rows)`
	`2871`	`+ alias \| token \| dictionary \| lexemes`
	`2872`	`+-------+-------------+----------------+-------------`
	`2873`	`+ lword \| The \| english_ispell \| {}`
	`2874`	`+ blank \| \| \|`
	`2875`	`+ lword \| Brightest \| english_ispell \| {bright}`
	`2876`	`+ blank \| \| \|`
	`2877`	`+ lword \| supernovaes \| english_stem \| {supernova}`
`2852`	`2878`	`</programlisting>`
`2853`	`2879`	`</para>`
`2854`	`2880`
