Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit626b441

Browse files
committed
Put documentation on XML data type and functions in better positions. Add
some index terms.
1 parentb7d3a84 commit626b441

File tree

3 files changed

+3795
-3785
lines changed

3 files changed

+3795
-3785
lines changed

‎doc/src/sgml/config.sgml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.118 2007/03/26 01:41:57 tgl Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.119 2007/04/02 15:27:02 petere Exp $ -->
22

33
<chapter Id="runtime-config">
44
<title>Server Configuration</title>
@@ -3591,7 +3591,7 @@ SELECT * FROM parent WHERE key = 2400;
35913591
<primary><varname>SET XML OPTION</></primary>
35923592
</indexterm>
35933593
<indexterm>
3594-
<primary><varname>XML option</></primary>
3594+
<primary>XML option</primary>
35953595
</indexterm>
35963596
<listitem>
35973597
<para>

‎doc/src/sgml/datatype.sgml

Lines changed: 139 additions & 135 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.192 2007/04/0203:49:36 tgl Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.193 2007/04/0215:27:02 petere Exp $ -->
22

33
<chapter id="datatype">
44
<title id="datatype-title">Data Types</title>
@@ -3190,6 +3190,144 @@ SELECT * FROM test;
31903190

31913191
</sect1>
31923192

3193+
<sect1 id="datatype-xml">
3194+
<title><acronym>XML</> Type</title>
3195+
3196+
<indexterm zone="datatype-xml">
3197+
<primary>XML</primary>
3198+
</indexterm>
3199+
3200+
<para>
3201+
The data type <type>xml</type> can be used to store XML data. Its
3202+
advantage over storing XML data in a <type>text</type> field is that it
3203+
checks the input values for well-formedness, and there are support
3204+
functions to perform type-safe operations on it; see <xref
3205+
linkend="functions-xml">.
3206+
</para>
3207+
3208+
<para>
3209+
In particular, the <type>xml</type> type can store well-formed
3210+
<quote>documents</quote>, as defined by the XML standard, as well
3211+
as <quote>content</quote> fragments, which are defined by the
3212+
production <literal>XMLDecl? content</literal> in the XML
3213+
standard. Roughly, this means that content fragments can have
3214+
more than one top-level element or character node. The expression
3215+
<literal><replaceable>xmlvalue</replaceable> IS DOCUMENT</literal>
3216+
can be used to evaluate whether a particular <type>xml</type>
3217+
value is a full document or only a content fragment.
3218+
</para>
3219+
3220+
<para>
3221+
To produce a value of type <type>xml</type> from character data,
3222+
use the function
3223+
<function>xmlparse</function>:<indexterm><primary>xmlparse</primary></indexterm>
3224+
<synopsis>
3225+
XMLPARSE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>)
3226+
</synopsis>
3227+
Examples:
3228+
<programlisting><![CDATA[
3229+
XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter><book>')
3230+
XMLPARSE (CONTENT 'abc<foo>bar</bar><bar>foo</foo>')
3231+
]]></programlisting>
3232+
While this is the only way to convert character strings into XML
3233+
values according to the SQL standard, the PostgreSQL-specific
3234+
syntaxes:
3235+
<programlisting><![CDATA[
3236+
xml '<foo>bar</foo>'
3237+
'<foo>bar</foo>'::xml
3238+
]]></programlisting>
3239+
can also be used.
3240+
</para>
3241+
3242+
<para>
3243+
The <type>xml</type> type does not validate its input values
3244+
against a possibly included document type declaration
3245+
(DTD).<indexterm><primary>DTD</primary></indexterm>
3246+
</para>
3247+
3248+
<para>
3249+
The inverse operation, producing character string type values from
3250+
<type>xml</type>, uses the function
3251+
<function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
3252+
<synopsis>
3253+
XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
3254+
</synopsis>
3255+
<replaceable>type</replaceable> can be one of
3256+
<type>character</type>, <type>character varying</type>, or
3257+
<type>text</type> (or an alias name for those). Again, according
3258+
to the SQL standard, this is the only way to convert between type
3259+
<type>xml</type> and character types, but PostgreSQL also allows
3260+
you to simply cast the value.
3261+
</para>
3262+
3263+
<para>
3264+
When character string values are cast to or from type
3265+
<type>xml</type> without going through <type>XMLPARSE</type> or
3266+
<type>XMLSERIALIZE</type>, respectively, the choice of
3267+
<literal>DOCUMENT</literal> versus <literal>CONTENT</literal> is
3268+
determined by the <quote>XML option</quote>
3269+
<indexterm><primary>XML option</primary></indexterm>
3270+
session configuration parameter, which can be set using the
3271+
standard command
3272+
<synopsis>
3273+
SET XML OPTION { DOCUMENT | CONTENT };
3274+
</synopsis>
3275+
or the more PostgreSQL-like syntax
3276+
<synopsis>
3277+
SET xmloption TO { DOCUMENT | CONTENT };
3278+
</synopsis>
3279+
The default is <literal>CONTENT</literal>, so all forms of XML
3280+
data are allowed.
3281+
</para>
3282+
3283+
<para>
3284+
Care must be taken when dealing with multiple character encodings
3285+
on the client, server, and in the XML data passed through them.
3286+
When using the text mode to pass queries to the server and query
3287+
results to the client (which is the normal mode), PostgreSQL
3288+
converts all character data passed between the client and the
3289+
server and vice versa to the character encoding of the respective
3290+
end; see <xref linkend="multibyte">. This includes string
3291+
representations of XML values, such as in the above examples.
3292+
This would ordinarily mean that encoding declarations contained in
3293+
XML data might become invalid as the character data is converted
3294+
to other encodings while travelling between client and server,
3295+
while the embedded encoding declaration is not changed. To cope
3296+
with this behavior, an encoding declaration contained in a
3297+
character string presented for input to the <type>xml</type> type
3298+
is <emphasis>ignored</emphasis>, and the content is always assumed
3299+
to be in the current server encoding. Consequently, for correct
3300+
processing, such character strings of XML data must be sent off
3301+
from the client in the current client encoding. It is the
3302+
responsibility of the client to either convert the document to the
3303+
current client encoding before sending it off to the server or to
3304+
adjust the client encoding appropriately. On output, values of
3305+
type <type>xml</type> will not have an encoding declaration, and
3306+
clients must assume that the data is in the current client
3307+
encoding.
3308+
</para>
3309+
3310+
<para>
3311+
When using the binary mode to pass query parameters to the server
3312+
and query results back the the client, no character set conversion
3313+
is performed, so the situation is different. In this case, an
3314+
encoding declaration in the XML data will be observed, and if it
3315+
is absent, the data will be assumed to be in UTF-8 (as required by
3316+
the XML standard; note that PostgreSQL does not support UTF-16 at
3317+
all). On output, data will have an encoding declaration
3318+
specifying the client encoding, unless the client encoding is
3319+
UTF-8, in which case it will be omitted.
3320+
</para>
3321+
3322+
<para>
3323+
Needless to say, processing XML data with PostgreSQL will be less
3324+
error-prone and more efficient if data encoding, client encoding,
3325+
and server encoding are the same. Since XML data is internally
3326+
processed in UTF-8, computations will be most efficient if the
3327+
server encoding is also UTF-8.
3328+
</para>
3329+
</sect1>
3330+
31933331
&array;
31943332

31953333
&rowtypes;
@@ -3579,138 +3717,4 @@ SELECT * FROM pg_attribute
35793717

35803718
</sect1>
35813719

3582-
<sect1 id="datatype-xml">
3583-
<title><acronym>XML</> Type</title>
3584-
3585-
<indexterm zone="datatype-xml">
3586-
<primary>XML</primary>
3587-
</indexterm>
3588-
3589-
<para>
3590-
The data type <type>xml</type> can be used to store XML data. Its
3591-
advantage over storing XML data in a <type>text</type> field is that it
3592-
checks the input values for well-formedness, and there are support
3593-
functions to perform type-safe operations on it; see <xref
3594-
linkend="functions-xml">.
3595-
</para>
3596-
3597-
<para>
3598-
In particular, the <type>xml</type> type can store well-formed
3599-
<quote>documents</quote>, as defined by the XML standard, as well
3600-
as <quote>content</quote> fragments, which are defined by the
3601-
production <literal>XMLDecl? content</literal> in the XML
3602-
standard. Roughly, this means that content fragments can have
3603-
more than one top-level element or character node. The expression
3604-
<literal><replaceable>xmlvalue</replaceable> IS DOCUMENT</literal>
3605-
can be used to evaluate whether a particular <type>xml</type>
3606-
value is a full document or only a content fragment.
3607-
</para>
3608-
3609-
<para>
3610-
To produce a value of type <type>xml</type> from character data,
3611-
use the function <function>xmlparse</function>:
3612-
<synopsis>
3613-
XMLPARSE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>)
3614-
</synopsis>
3615-
Examples:
3616-
<programlisting><![CDATA[
3617-
XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter><book>')
3618-
XMLPARSE (CONTENT 'abc<foo>bar</bar><bar>foo</foo>')
3619-
]]></programlisting>
3620-
While this is the only way to convert character strings into XML
3621-
values according to the SQL standard, the PostgreSQL-specific
3622-
syntaxes:
3623-
<programlisting><![CDATA[
3624-
xml '<foo>bar</foo>'
3625-
'<foo>bar</foo>'::xml
3626-
]]></programlisting>
3627-
can also be used.
3628-
</para>
3629-
3630-
<para>
3631-
The <type>xml</type> type does not validate its input values
3632-
against a possibly included document type declaration (DTD).
3633-
</para>
3634-
3635-
<para>
3636-
The inverse operation, producing character string type values from
3637-
<type>xml</type>, uses the function
3638-
<function>xmlserialize</function>:
3639-
<synopsis>
3640-
XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
3641-
</synopsis>
3642-
<replaceable>type</replaceable> can be one of
3643-
<type>character</type>, <type>character varying</type>, or
3644-
<type>text</type> (or an alias name for those). Again, according
3645-
to the SQL standard, this is the only way to convert between type
3646-
<type>xml</type> and character types, but PostgreSQL also allows
3647-
you to simply cast the value.
3648-
</para>
3649-
3650-
<para>
3651-
When character string values are cast to or from type
3652-
<type>xml</type> without going through <type>XMLPARSE</type> or
3653-
<type>XMLSERIALIZE</type>, respectively, the choice of
3654-
<literal>DOCUMENT</literal> versus <literal>CONTENT</literal> is
3655-
determined by the <quote>XML option</quote> session configuration
3656-
parameter, which can be set using the standard command
3657-
<synopsis>
3658-
SET XML OPTION { DOCUMENT | CONTENT };
3659-
</synopsis>
3660-
or the more PostgreSQL-like syntax
3661-
<synopsis>
3662-
SET xmloption TO { DOCUMENT | CONTENT };
3663-
</synopsis>
3664-
The default is <literal>CONTENT</literal>, so all forms of XML
3665-
data are allowed.
3666-
</para>
3667-
3668-
<para>
3669-
Care must be taken when dealing with multiple character encodings
3670-
on the client, server, and in the XML data passed through them.
3671-
When using the text mode to pass queries to the server and query
3672-
results to the client (which is the normal mode), PostgreSQL
3673-
converts all character data passed between the client and the
3674-
server and vice versa to the character encoding of the respective
3675-
end; see <xref linkend="multibyte">. This includes string
3676-
representations of XML values, such as in the above examples.
3677-
This would ordinarily mean that encoding declarations contained in
3678-
XML data might become invalid as the character data is converted
3679-
to other encodings while travelling between client and server,
3680-
while the embedded encoding declaration is not changed. To cope
3681-
with this behavior, an encoding declaration contained in a
3682-
character string presented for input to the <type>xml</type> type
3683-
is <emphasis>ignored</emphasis>, and the content is always assumed
3684-
to be in the current server encoding. Consequently, for correct
3685-
processing, such character strings of XML data must be sent off
3686-
from the client in the current client encoding. It is the
3687-
responsibility of the client to either convert the document to the
3688-
current client encoding before sending it off to the server or to
3689-
adjust the client encoding appropriately. On output, values of
3690-
type <type>xml</type> will not have an encoding declaration, and
3691-
clients must assume that the data is in the current client
3692-
encoding.
3693-
</para>
3694-
3695-
<para>
3696-
When using the binary mode to pass query parameters to the server
3697-
and query results back the the client, no character set conversion
3698-
is performed, so the situation is different. In this case, an
3699-
encoding declaration in the XML data will be observed, and if it
3700-
is absent, the data will be assumed to be in UTF-8 (as required by
3701-
the XML standard; note that PostgreSQL does not support UTF-16 at
3702-
all). On output, data will have an encoding declaration
3703-
specifying the client encoding, unless the client encoding is
3704-
UTF-8, in which case it will be omitted.
3705-
</para>
3706-
3707-
<para>
3708-
Needless to say, processing XML data with PostgreSQL will be less
3709-
error-prone and more efficient if data encoding, client encoding,
3710-
and server encoding are the same. Since XML data is internally
3711-
processed in UTF-8, computations will be most efficient if the
3712-
server encoding is also UTF-8.
3713-
</para>
3714-
</sect1>
3715-
37163720
</chapter>

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp