Commit 0208410

Clean up encoding issues in the xml type: In text mode, encoding
declarations are ignored and removed, in binary mode they are honored as
specified by the XML standard.
1 parent c81bfc2 commit 0208410

3 files changed: +300 -59 lines changed


‎doc/src/sgml/datatype.sgml

Lines changed: 102 additions & 3 deletions
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.184 2007/01/14 22:37:59 neilc Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.185 2007/01/18 13:59:11 petere Exp $ -->
 
 <chapter id="datatype">
 <title id="datatype-title">Data Types</title>
@@ -3418,8 +3418,107 @@ SELECT * FROM pg_attribute
 advantage over storing XML data in a <type>text</type> field is that it
 checks the input values for well-formedness, and there are support
 functions to perform type-safe operations on it; see <xref
-linkend="functions-xml">. Currently, there is no support for
-validation against a specific <acronym>XML</> schema.
+linkend="functions-xml">.
+</para>
+
+<para>
+In particular, the <type>xml</type> type can store well-formed
+<quote>documents</quote>, as defined by the XML standard, as well
+as <quote>content</quote> fragments, which are defined by the
+production <literal>XMLDecl? content</literal> in the XML
+standard. Roughly, this means that content fragments can have
+more than one top-level element or character node. The expression
+<literal><replaceable>xmlvalue</replaceable> IS DOCUMENT</literal>
+can be used to evaluate whether a particular <type>xml</type>
+value is a full document or only a content fragment.
+</para>
+
+<para>
+To produce a value of type <type>xml</type> from character data,
+use the function <function>xmlparse</function>:
+<synopsis>
+XMLPARSE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>)
+</synopsis>
+Examples:
+<programlisting><![CDATA[
+XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>')
+XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')
+]]></programlisting>
+While this is the only way to convert character strings into XML
+values according to the SQL standard, the PostgreSQL-specific
+syntaxes
+<programlisting><![CDATA[
+xml '<foo>bar</foo>'
+'<foo>bar</foo>'::xml
+]]></programlisting>
+can also be used.
+</para>
+
+<para>
+The <type>xml</type> type does not validate its input values
+against a possibly included document type declaration (DTD).
+</para>
+
+<para>
+The inverse operation, producing character string type values from
+<type>xml</type>, uses the function
+<function>xmlserialize</function>:
+<synopsis>
+XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
+</synopsis>
+<replaceable>type</replaceable> can be one of
+<type>character</type>, <type>character varying</type>, or
+<type>text</type> (or an alias name for those). Again, according
+to the SQL standard, this is the only way to convert between type
+<type>xml</type> and character types, but PostgreSQL also allows
+you to simply cast the value.
+</para>
+
+<para>
+Care must be taken when dealing with multiple character encodings
+on the client, server, and in the XML data passed through them.
+When using the text mode to pass queries to the server and query
+results to the client (which is the normal mode), PostgreSQL
+converts all character data passed between the client and the
+server and vice versa to the character encoding of the respective
+end; see <xref linkend="multibyte">. This includes string
+representations of XML values, such as in the above examples.
+This would ordinarily mean that encoding declarations contained in
+XML data might become invalid as the character data is converted
+to other encodings while travelling between client and server,
+while the embedded encoding declaration is not changed. To cope
+with this behavior, an encoding declaration contained in a
+character string presented for input to the <type>xml</type> type
+is <emphasis>ignored</emphasis>, and the content is always assumed
+to be in the current server encoding. Consequently, for correct
+processing, such character strings of XML data must be sent off
+from the client in the current client encoding. It is the
+responsibility of the client to either convert the document to the
+current client encoding before sending it off to the server or to
+adjust the client encoding appropriately. On output, values of
+type <type>xml</type> will not have an encoding declaration, and
+clients must assume that the data is in the current client
+encoding.
+</para>
+
+<para>
+When using the binary mode to pass query parameters to the server
+and query results back to the client, no character set conversion
+is performed, so the situation is different. In this case, an
+encoding declaration in the XML data will be observed, and if it
+is absent, the data will be assumed to be in UTF-8 (as required by
+the XML standard; note that PostgreSQL does not support UTF-16 at
+all). On output, data will have an encoding declaration
+specifying the client encoding, unless the client encoding is
+UTF-8, in which case it will be omitted.
+</para>
+
+<para>
+Needless to say, processing XML data with PostgreSQL will be less
+error-prone and more efficient if data encoding, client encoding,
+and server encoding are the same. Since XML data is internally
+processed in UTF-8, computations will be most efficient if the
+server encoding is also UTF-8.
 </para>
 
 <para>
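
The two encoding rules documented in the diff above can be sketched from the client side in Python. This is only an illustration under stated assumptions, not PostgreSQL's implementation: the helper names `strip_encoding_declaration` and `assumed_encoding` and both regular expressions are hypothetical, and they handle only a well-formed XML declaration at the very start of the data in an ASCII-compatible encoding. The first helper mirrors the text-mode rule (the encoding declaration is ignored and removed). The second mirrors the binary-mode rule (the declaration is honored, and UTF-8 is assumed when it is absent).

```python
import re

# Hypothetical pattern: an XML declaration at the start of a string, split
# into the part before and the part after the optional encoding attribute.
_XML_DECL = re.compile(
    r"""^(<\?xml\s+version\s*=\s*(?:"[^"]*"|'[^']*'))"""          # <?xml version="..."
    r"""(?:\s+encoding\s*=\s*(?:"[^"]*"|'[^']*'))?"""              # optional encoding="..."
    r"""(\s*(?:standalone\s*=\s*(?:"[^"]*"|'[^']*'))?\s*\?>)"""    # rest of the declaration
)

# Hypothetical pattern: the encoding name inside a leading XML declaration,
# matched on raw bytes (only works for ASCII-compatible encodings).
_ENCODING = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def strip_encoding_declaration(xml_text: str) -> str:
    """Text-mode sketch: drop the encoding attribute, keep everything else."""
    return _XML_DECL.sub(r"\1\2", xml_text, count=1)


def assumed_encoding(xml_bytes: bytes) -> str:
    """Binary-mode sketch: use the declared encoding, else assume UTF-8."""
    match = _ENCODING.match(xml_bytes)
    return match.group(1).decode("ascii") if match else "UTF-8"


if __name__ == "__main__":
    print(strip_encoding_declaration(
        '<?xml version="1.0" encoding="ISO-8859-1"?><foo>bar</foo>'))
    print(assumed_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><foo/>'))
    print(assumed_encoding(b"<foo>bar</foo>"))
```

A client that sends XML in text mode could apply something like the first helper after transcoding the document to the client encoding, so that the declaration no longer contradicts the actual bytes. Whether the server removes only the encoding attribute or the whole declaration is not stated in the diff, so the sketch takes the narrower reading.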
