|
1 | | -<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.192 2007/04/0203:49:36 tgl Exp $ --> |
| 1 | +<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.193 2007/04/0215:27:02 petere Exp $ --> |
2 | 2 |
|
3 | 3 | <chapter id="datatype"> |
4 | 4 | <title id="datatype-title">Data Types</title> |
@@ -3190,6 +3190,144 @@ SELECT * FROM test; |
3190 | 3190 |
|
3191 | 3191 | </sect1> |
3192 | 3192 |
|
| 3193 | + <sect1 id="datatype-xml"> |
| 3194 | + <title><acronym>XML</> Type</title> |
| 3195 | + |
| 3196 | + <indexterm zone="datatype-xml"> |
| 3197 | + <primary>XML</primary> |
| 3198 | + </indexterm> |
| 3199 | + |
| 3200 | + <para> |
| 3201 | + The data type <type>xml</type> can be used to store XML data. Its |
| 3202 | + advantage over storing XML data in a <type>text</type> field is that it |
| 3203 | + checks the input values for well-formedness, and there are support |
| 3204 | + functions to perform type-safe operations on it; see <xref |
| 3205 | + linkend="functions-xml">. |
| 3206 | + </para> |
| 3207 | + |
| 3208 | + <para> |
| 3209 | + In particular, the <type>xml</type> type can store well-formed |
| 3210 | + <quote>documents</quote>, as defined by the XML standard, as well |
| 3211 | + as <quote>content</quote> fragments, which are defined by the |
| 3212 | + production <literal>XMLDecl? content</literal> in the XML |
| 3213 | + standard. Roughly, this means that content fragments can have |
| 3214 | + more than one top-level element or character node. The expression |
| 3215 | + <literal><replaceable>xmlvalue</replaceable> IS DOCUMENT</literal> |
| 3216 | + can be used to evaluate whether a particular <type>xml</type> |
| 3217 | + value is a full document or only a content fragment. |
| 3218 | + </para> |
| 3219 | + |
| 3220 | + <para> |
| 3221 | + To produce a value of type <type>xml</type> from character data, |
| 3222 | + use the function |
| 3223 | + <function>xmlparse</function>:<indexterm><primary>xmlparse</primary></indexterm> |
| 3224 | +<synopsis> |
| 3225 | +XMLPARSE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>) |
| 3226 | +</synopsis> |
| 3227 | + Examples: |
| 3228 | +<programlisting><![CDATA[ |
| 3229 | +XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter><book>') |
| 3230 | +XMLPARSE (CONTENT 'abc<foo>bar</bar><bar>foo</foo>') |
| 3231 | +]]></programlisting> |
| 3232 | + While this is the only way to convert character strings into XML |
| 3233 | + values according to the SQL standard, the PostgreSQL-specific |
| 3234 | + syntaxes: |
| 3235 | +<programlisting><![CDATA[ |
| 3236 | +xml '<foo>bar</foo>' |
| 3237 | +'<foo>bar</foo>'::xml |
| 3238 | +]]></programlisting> |
| 3239 | + can also be used. |
| 3240 | + </para> |
| 3241 | + |
| 3242 | + <para> |
| 3243 | + The <type>xml</type> type does not validate its input values |
| 3244 | + against a possibly included document type declaration |
| 3245 | + (DTD).<indexterm><primary>DTD</primary></indexterm> |
| 3246 | + </para> |
| 3247 | + |
| 3248 | + <para> |
| 3249 | + The inverse operation, producing character string type values from |
| 3250 | + <type>xml</type>, uses the function |
| 3251 | + <function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm> |
| 3252 | +<synopsis> |
| 3253 | +XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> ) |
| 3254 | +</synopsis> |
| 3255 | + <replaceable>type</replaceable> can be one of |
| 3256 | + <type>character</type>, <type>character varying</type>, or |
| 3257 | + <type>text</type> (or an alias name for those). Again, according |
| 3258 | + to the SQL standard, this is the only way to convert between type |
| 3259 | + <type>xml</type> and character types, but PostgreSQL also allows |
| 3260 | + you to simply cast the value. |
| 3261 | + </para> |
| 3262 | + |
| 3263 | + <para> |
| 3264 | + When character string values are cast to or from type |
| 3265 | + <type>xml</type> without going through <type>XMLPARSE</type> or |
| 3266 | + <type>XMLSERIALIZE</type>, respectively, the choice of |
| 3267 | + <literal>DOCUMENT</literal> versus <literal>CONTENT</literal> is |
| 3268 | + determined by the <quote>XML option</quote> |
| 3269 | + <indexterm><primary>XML option</primary></indexterm> |
| 3270 | + session configuration parameter, which can be set using the |
| 3271 | + standard command |
| 3272 | +<synopsis> |
| 3273 | +SET XML OPTION { DOCUMENT | CONTENT }; |
| 3274 | +</synopsis> |
| 3275 | + or the more PostgreSQL-like syntax |
| 3276 | +<synopsis> |
| 3277 | +SET xmloption TO { DOCUMENT | CONTENT }; |
| 3278 | +</synopsis> |
| 3279 | + The default is <literal>CONTENT</literal>, so all forms of XML |
| 3280 | + data are allowed. |
| 3281 | + </para> |
| 3282 | + |
| 3283 | + <para> |
| 3284 | + Care must be taken when dealing with multiple character encodings |
| 3285 | + on the client, server, and in the XML data passed through them. |
| 3286 | + When using the text mode to pass queries to the server and query |
| 3287 | + results to the client (which is the normal mode), PostgreSQL |
| 3288 | + converts all character data passed between the client and the |
| 3289 | + server and vice versa to the character encoding of the respective |
| 3290 | + end; see <xref linkend="multibyte">. This includes string |
| 3291 | + representations of XML values, such as in the above examples. |
| 3292 | + This would ordinarily mean that encoding declarations contained in |
| 3293 | + XML data might become invalid as the character data is converted |
| 3294 | + to other encodings while travelling between client and server, |
| 3295 | + while the embedded encoding declaration is not changed. To cope |
| 3296 | + with this behavior, an encoding declaration contained in a |
| 3297 | + character string presented for input to the <type>xml</type> type |
| 3298 | + is <emphasis>ignored</emphasis>, and the content is always assumed |
| 3299 | + to be in the current server encoding. Consequently, for correct |
| 3300 | + processing, such character strings of XML data must be sent off |
| 3301 | + from the client in the current client encoding. It is the |
| 3302 | + responsibility of the client to either convert the document to the |
| 3303 | + current client encoding before sending it off to the server or to |
| 3304 | + adjust the client encoding appropriately. On output, values of |
| 3305 | + type <type>xml</type> will not have an encoding declaration, and |
| 3306 | + clients must assume that the data is in the current client |
| 3307 | + encoding. |
| 3308 | + </para> |
| 3309 | + |
| 3310 | + <para> |
| 3311 | + When using the binary mode to pass query parameters to the server |
| 3312 | + and query results back the the client, no character set conversion |
| 3313 | + is performed, so the situation is different. In this case, an |
| 3314 | + encoding declaration in the XML data will be observed, and if it |
| 3315 | + is absent, the data will be assumed to be in UTF-8 (as required by |
| 3316 | + the XML standard; note that PostgreSQL does not support UTF-16 at |
| 3317 | + all). On output, data will have an encoding declaration |
| 3318 | + specifying the client encoding, unless the client encoding is |
| 3319 | + UTF-8, in which case it will be omitted. |
| 3320 | + </para> |
| 3321 | + |
| 3322 | + <para> |
| 3323 | + Needless to say, processing XML data with PostgreSQL will be less |
| 3324 | + error-prone and more efficient if data encoding, client encoding, |
| 3325 | + and server encoding are the same. Since XML data is internally |
| 3326 | + processed in UTF-8, computations will be most efficient if the |
| 3327 | + server encoding is also UTF-8. |
| 3328 | + </para> |
| 3329 | + </sect1> |
| 3330 | + |
3193 | 3331 | &array; |
3194 | 3332 |
|
3195 | 3333 | &rowtypes; |
@@ -3579,138 +3717,4 @@ SELECT * FROM pg_attribute |
3579 | 3717 |
|
3580 | 3718 | </sect1> |
3581 | 3719 |
|
3582 | | - <sect1 id="datatype-xml"> |
3583 | | - <title><acronym>XML</> Type</title> |
3584 | | - |
3585 | | - <indexterm zone="datatype-xml"> |
3586 | | - <primary>XML</primary> |
3587 | | - </indexterm> |
3588 | | - |
3589 | | - <para> |
3590 | | - The data type <type>xml</type> can be used to store XML data. Its |
3591 | | - advantage over storing XML data in a <type>text</type> field is that it |
3592 | | - checks the input values for well-formedness, and there are support |
3593 | | - functions to perform type-safe operations on it; see <xref |
3594 | | - linkend="functions-xml">. |
3595 | | - </para> |
3596 | | - |
3597 | | - <para> |
3598 | | - In particular, the <type>xml</type> type can store well-formed |
3599 | | - <quote>documents</quote>, as defined by the XML standard, as well |
3600 | | - as <quote>content</quote> fragments, which are defined by the |
3601 | | - production <literal>XMLDecl? content</literal> in the XML |
3602 | | - standard. Roughly, this means that content fragments can have |
3603 | | - more than one top-level element or character node. The expression |
3604 | | - <literal><replaceable>xmlvalue</replaceable> IS DOCUMENT</literal> |
3605 | | - can be used to evaluate whether a particular <type>xml</type> |
3606 | | - value is a full document or only a content fragment. |
3607 | | - </para> |
3608 | | - |
3609 | | - <para> |
3610 | | - To produce a value of type <type>xml</type> from character data, |
3611 | | - use the function <function>xmlparse</function>: |
3612 | | -<synopsis> |
3613 | | -XMLPARSE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>) |
3614 | | -</synopsis> |
3615 | | - Examples: |
3616 | | -<programlisting><![CDATA[ |
3617 | | -XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter><book>') |
3618 | | -XMLPARSE (CONTENT 'abc<foo>bar</bar><bar>foo</foo>') |
3619 | | -]]></programlisting> |
3620 | | - While this is the only way to convert character strings into XML |
3621 | | - values according to the SQL standard, the PostgreSQL-specific |
3622 | | - syntaxes: |
3623 | | -<programlisting><![CDATA[ |
3624 | | -xml '<foo>bar</foo>' |
3625 | | -'<foo>bar</foo>'::xml |
3626 | | -]]></programlisting> |
3627 | | - can also be used. |
3628 | | - </para> |
3629 | | - |
3630 | | - <para> |
3631 | | - The <type>xml</type> type does not validate its input values |
3632 | | - against a possibly included document type declaration (DTD). |
3633 | | - </para> |
3634 | | - |
3635 | | - <para> |
3636 | | - The inverse operation, producing character string type values from |
3637 | | - <type>xml</type>, uses the function |
3638 | | - <function>xmlserialize</function>: |
3639 | | -<synopsis> |
3640 | | -XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> ) |
3641 | | -</synopsis> |
3642 | | - <replaceable>type</replaceable> can be one of |
3643 | | - <type>character</type>, <type>character varying</type>, or |
3644 | | - <type>text</type> (or an alias name for those). Again, according |
3645 | | - to the SQL standard, this is the only way to convert between type |
3646 | | - <type>xml</type> and character types, but PostgreSQL also allows |
3647 | | - you to simply cast the value. |
3648 | | - </para> |
3649 | | - |
3650 | | - <para> |
3651 | | - When character string values are cast to or from type |
3652 | | - <type>xml</type> without going through <type>XMLPARSE</type> or |
3653 | | - <type>XMLSERIALIZE</type>, respectively, the choice of |
3654 | | - <literal>DOCUMENT</literal> versus <literal>CONTENT</literal> is |
3655 | | - determined by the <quote>XML option</quote> session configuration |
3656 | | - parameter, which can be set using the standard command |
3657 | | -<synopsis> |
3658 | | -SET XML OPTION { DOCUMENT | CONTENT }; |
3659 | | -</synopsis> |
3660 | | - or the more PostgreSQL-like syntax |
3661 | | -<synopsis> |
3662 | | -SET xmloption TO { DOCUMENT | CONTENT }; |
3663 | | -</synopsis> |
3664 | | - The default is <literal>CONTENT</literal>, so all forms of XML |
3665 | | - data are allowed. |
3666 | | - </para> |
3667 | | - |
3668 | | - <para> |
3669 | | - Care must be taken when dealing with multiple character encodings |
3670 | | - on the client, server, and in the XML data passed through them. |
3671 | | - When using the text mode to pass queries to the server and query |
3672 | | - results to the client (which is the normal mode), PostgreSQL |
3673 | | - converts all character data passed between the client and the |
3674 | | - server and vice versa to the character encoding of the respective |
3675 | | - end; see <xref linkend="multibyte">. This includes string |
3676 | | - representations of XML values, such as in the above examples. |
3677 | | - This would ordinarily mean that encoding declarations contained in |
3678 | | - XML data might become invalid as the character data is converted |
3679 | | - to other encodings while travelling between client and server, |
3680 | | - while the embedded encoding declaration is not changed. To cope |
3681 | | - with this behavior, an encoding declaration contained in a |
3682 | | - character string presented for input to the <type>xml</type> type |
3683 | | - is <emphasis>ignored</emphasis>, and the content is always assumed |
3684 | | - to be in the current server encoding. Consequently, for correct |
3685 | | - processing, such character strings of XML data must be sent off |
3686 | | - from the client in the current client encoding. It is the |
3687 | | - responsibility of the client to either convert the document to the |
3688 | | - current client encoding before sending it off to the server or to |
3689 | | - adjust the client encoding appropriately. On output, values of |
3690 | | - type <type>xml</type> will not have an encoding declaration, and |
3691 | | - clients must assume that the data is in the current client |
3692 | | - encoding. |
3693 | | - </para> |
3694 | | - |
3695 | | - <para> |
3696 | | - When using the binary mode to pass query parameters to the server |
3697 | | - and query results back the the client, no character set conversion |
3698 | | - is performed, so the situation is different. In this case, an |
3699 | | - encoding declaration in the XML data will be observed, and if it |
3700 | | - is absent, the data will be assumed to be in UTF-8 (as required by |
3701 | | - the XML standard; note that PostgreSQL does not support UTF-16 at |
3702 | | - all). On output, data will have an encoding declaration |
3703 | | - specifying the client encoding, unless the client encoding is |
3704 | | - UTF-8, in which case it will be omitted. |
3705 | | - </para> |
3706 | | - |
3707 | | - <para> |
3708 | | - Needless to say, processing XML data with PostgreSQL will be less |
3709 | | - error-prone and more efficient if data encoding, client encoding, |
3710 | | - and server encoding are the same. Since XML data is internally |
3711 | | - processed in UTF-8, computations will be most efficient if the |
3712 | | - server encoding is also UTF-8. |
3713 | | - </para> |
3714 | | - </sect1> |
3715 | | - |
3716 | 3720 | </chapter> |