|
1 | | -<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.31 2007/11/10 15:39:34 momjian Exp $ --> |
| 1 | +<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.32 2007/11/14 03:26:24 tgl Exp $ --> |
2 | 2 |
|
3 | 3 | <chapter id="textsearch"> |
4 | 4 | <title id="textsearch-title">Full Text Search</title> |
@@ -3489,99 +3489,77 @@ Parser: "pg_catalog.default" |
3489 | 3489 | <title>Migration from Pre-8.3 Text Search</title> |
3490 | 3490 |
|
3491 | 3491 | <para> |
3492 | | - This area needs lots of work. Here is a quick list of known issues: |
| 3492 | + Applications that used the <filename>contrib/tsearch2</> add-on module |
| 3493 | + for text searching will need some adjustments to work with the |
| 3494 | + built-in features: |
3493 | 3495 | </para> |
3494 | 3496 |
|
3495 | | - <itemizedlist mark="bullet"> |
| 3497 | + <itemizedlist> |
3496 | 3498 | <listitem> |
3497 | 3499 | <para> |
3498 | | - The old contrib/tsearch2 objects <emphasis>must</> be removed from |
3499 | | - the pg_dump output from a pre-8.3 database. While many of them won't |
3500 | | - load for lack of a tsearch2.so library, some do and cause problems. |
3501 | | - We have a working perl script for doing this with a custom- or tar-format |
3502 | | - backup, but there is a proposal to incorporate the functionality directly |
3503 | | - into pg_restore. Neither approach will help for pg_dumpall output. |
| 3500 | + Some functions have been renamed or had small adjustments in their |
| 3501 | + argument lists, and all of them are now in the <literal>pg_catalog</> |
| 3502 | + schema, whereas in a previous installation they would have been in |
| 3503 | + <literal>public</> or another non-system schema. There is a new |
| 3504 | + version of <filename>contrib/tsearch2</> (see <xref linkend="tsearch2">) |
| 3505 | + that provides a compatibility layer to solve most problems in this |
| 3506 | + area. |
3504 | 3507 | </para> |
3505 | 3508 | </listitem> |
3506 | 3509 |
|
3507 | 3510 | <listitem> |
3508 | 3511 | <para> |
3509 | | - The old dump may include schema-qualified references to the old |
3510 | | - contrib/tsearch2 objects; for example <literal>public.tsvector</> |
3511 | | - columns in table definitions. These will fail since the objects |
3512 | | - are now in the pg_catalog schema. Given current pg_dump behavior |
3513 | | - this will happen only for tables that are in a different schema |
3514 | | - from the tsearch2 objects; which makes it more likely to bite |
3515 | | - people who carefully put their tsearch2 objects in a |
3516 | | - non-<literal>public</> schema. |
3517 | | - </para> |
3518 | | - |
3519 | | - <para> |
3520 | | - Question: will restore-time failures of this type happen for |
3521 | | - any objects other than the tsvector and tsquery datatypes? |
3522 | | - </para> |
3523 | | - |
3524 | | - <para> |
3525 | | - The basic alternatives for fixing this seem to involve creating |
3526 | | - a dummy linkage, such as a public.tsvector domain linking to the |
3527 | | - base pg_catalog.tsvector type (which only helps for the datatypes); |
3528 | | - or stripping the schema references out of the dump. We could |
3529 | | - just recommend that users do this manually, or try to provide |
3530 | | - some tools to help. |
3531 | | - </para> |
3532 | | - </listitem> |
3533 | | - |
3534 | | - <listitem> |
3535 | | - <para> |
3536 | | - We have renamed the built-in tsvector update triggers, and changed |
3537 | | - their arguments too. This will result in CREATE TRIGGER commands |
3538 | | - failing during load, which can be ignored, but users will need to |
3539 | | - re-issue them with suitable argument adjustment. We probably |
3540 | | - can't automate that for them. Also, the old tsearch2 trigger |
3541 | | - function offered an option to invoke functions, which was removed |
3542 | | - as being a security hole. Users who were relying on that will need to |
3543 | | - write custom trigger functions as a substitute. I think all we |
3544 | | - can do here is document what to do to fix it. |
| 3512 | + The old <filename>contrib/tsearch2</> functions and other objects |
| 3513 | + <emphasis>must</> be suppressed when loading <application>pg_dump</> |
| 3514 | + output from a pre-8.3 database. While many of them won't load anyway, |
| 3515 | + a few will and then cause problems. One simple way to deal with this |
| 3516 | + is to load the new <filename>contrib/tsearch2</> module before restoring |
| 3517 | + the dump; the module then blocks the old objects from being loaded. |
3545 | 3518 | </para> |
3546 | 3519 | </listitem> |
3547 | 3520 |
|
3548 | 3521 | <listitem> |
3549 | 3522 | <para> |
3550 | | - We have renamed a number of other functions besides the triggers, |
3551 | | - compared to the tsearch2 versions. This seems unlikely to cause |
3552 | | - any problems during dump/reload but it will require adjustments in |
3553 | | - the bodies of stored procedures and in client application code. |
3554 | | - Again, not much to do except document it. |
| 3523 | + Text search configuration setup is completely different now. |
| 3524 | + Instead of manually inserting rows into configuration tables, |
| 3525 | + search is configured through the specialized SQL commands shown |
| 3526 | + earlier in this chapter. There is currently no automated |
| 3527 | + support for converting an existing custom configuration to 8.3; |
| 3528 | + such a configuration must be converted manually. |
3555 | 3529 | </para> |
3556 | 3530 | </listitem> |
3557 | 3531 |
|
3558 | 3532 | <listitem> |
3559 | 3533 | <para> |
3560 | | - Configuration setup is completely different now. Can we provide |
3561 | | - any automated assistance for translating an old custom setup? |
3562 | | - It probably can't be 100% automatic in any case, so maybe documentation |
3563 | | - is the best we can do here too. Aside from the inside-the-database |
3564 | | - differences, outside-the-database configuration files now have |
3565 | | - prescribed location and extensions, which was not true before. |
3566 | | - </para> |
3567 | | - </listitem> |
| 3534 | + Most types of dictionaries rely on configuration files stored |
| 3535 | + outside the database. These are largely compatible with pre-8.3 |
| 3536 | + usage, but note the following differences: |
3568 | 3537 |
|
3569 | | - <listitem> |
3570 | | - <para> |
3571 | | - Relocation of configuration from add-on tables into core system catalogs |
3572 | | - will break client queries that looked at the add-on tables. |
3573 | | - </para> |
3574 | | - </listitem> |
| 3538 | + <itemizedlist spacing="compact" mark="bullet"> |
| 3539 | + <listitem> |
| 3540 | + <para> |
| 3541 | + Configuration files now must be placed in a single specified |
| 3542 | + directory (<filename>$SHAREDIR/tsearch_data</>), and must have |
| 3543 | + a specific extension depending on the type of file, as noted |
| 3544 | + previously in the descriptions of the various dictionary types. |
| 3545 | + This restriction was added to forestall security problems. |
| 3546 | + </para> |
| 3547 | + </listitem> |
3575 | 3548 |
|
3576 | | - <listitem> |
3577 | | - <para> |
3578 | | - Thesaurus files now use <literal>?</> for stop words. |
3579 | | - </para> |
3580 | | - </listitem> |
| 3549 | + <listitem> |
| 3550 | + <para> |
| 3551 | + Configuration files must be encoded in UTF-8, |
| 3552 | + regardless of what database encoding is used. |
| 3553 | + </para> |
| 3554 | + </listitem> |
3581 | 3555 |
|
3582 | | - <listitem> |
3583 | | - <para> |
3584 | | - What else? |
| 3556 | + <listitem> |
| 3557 | + <para> |
| 3558 | + In thesaurus configuration files, stop words must be marked with |
| 3559 | + <literal>?</>. |
| 3560 | + </para> |
| 3561 | + </listitem> |
| 3562 | + </itemizedlist> |
3585 | 3563 | </para> |
3586 | 3564 | </listitem> |
3587 | 3565 |
|
|