Commit 06735e3

Unicode escapes in strings and identifiers
1 parent 05bba3d commit 06735e3

18 files changed: +638 -59 lines changed

doc/src/sgml/syntax.sgml

Lines changed: 134 additions & 8 deletions
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.123 2008/06/26 22:24:42 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.124 2008/10/29 08:04:52 petere Exp $ -->
 
 <chapter id="sql-syntax">
 <title>SQL Syntax</title>
@@ -189,6 +189,57 @@ UPDATE "my_table" SET "a" = 5;
 ampersands. The length limitation still applies.
 </para>
 
+<para>
+<indexterm><primary>Unicode escape</primary><secondary>in
+identifiers</secondary></indexterm> A variant of quoted
+identifiers allows including escaped Unicode characters identified
+by their code points. This variant starts
+with <literal>U&</literal> (upper or lower case U followed by
+ampersand) immediately before the opening double quote, without
+any spaces in between, for example <literal>U&"foo"</literal>.
+(Note that this creates an ambiguity with the
+operator <literal>&</literal>. Use spaces around the operator to
+avoid this problem.) Inside the quotes, Unicode characters can be
+specified in escaped form by writing a backslash followed by the
+four-digit hexadecimal code point number or alternatively a
+backslash followed by a plus sign followed by a six-digit
+hexadecimal code point number. For example, the
+identifier <literal>"data"</literal> could be written as
+<programlisting>
+U&"d\0061t\+000061"
+</programlisting>
+The following less trivial example writes the Russian
+word <quote>slon</quote> (elephant) in Cyrillic letters:
+<programlisting>
+U&"\0441\043B\043E\043D"
+</programlisting>
+</para>
+
+<para>
+If a different escape character than backslash is desired, it can
+be specified using
+the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
+clause after the string, for example:
+<programlisting>
+U&"d!0061t!+000061" UESCAPE '!'
+</programlisting>
+The escape character can be any single character other than a
+hexadecimal digit, the plus sign, a single quote, a double quote,
+or a whitespace character. Note that the escape character is
+written in single quotes, not double quotes.
+</para>
+
+<para>
+To include the escape character in the identifier literally, write
+it twice.
+</para>
+
+<para>
+The Unicode escape syntax works only when the server encoding is
+UTF8. When other server encodings are used, only code points in
+the ASCII range (up to <literal>\007F</literal>) can be specified.
+</para>
+
 <para>
 Quoting an identifier also makes it case-sensitive, whereas
 unquoted names are always folded to lower case. For example, the
@@ -245,7 +296,7 @@ UPDATE "my_table" SET "a" = 5;
 write two adjacent single quotes, e.g.
 <literal>'Dianne''s horse'</literal>.
 Note that this is <emphasis>not</> the same as a double-quote
-character (<literal>"</>).
+character (<literal>"</>). <!-- font-lock sanity: " -->
 </para>
 
 <para>
@@ -269,14 +320,19 @@ SELECT 'foo' 'bar';
 by <acronym>SQL</acronym>; <productname>PostgreSQL</productname> is
 following the standard.)
 </para>
+</sect3>
 
-<para>
-<indexterm>
+<sect3 id="sql-syntax-strings-escape">
+<title>String Constants with C-Style Escapes</title>
+
+<indexterm zone="sql-syntax-strings-escape">
 <primary>escape string syntax</primary>
 </indexterm>
-<indexterm>
+<indexterm zone="sql-syntax-strings-escape">
 <primary>backslash escapes</primary>
 </indexterm>
+
+<para>
 <productname>PostgreSQL</productname> also accepts <quote>escape</>
 string constants, which are an extension to the SQL standard.
 An escape string constant is specified by writing the letter
@@ -287,7 +343,8 @@ SELECT 'foo' 'bar';
 Within an escape string, a backslash character (<literal>\</>) begins a
 C-like <firstterm>backslash escape</> sequence, in which the combination
 of backslash and following character(s) represent a special byte
-value:
+value, as shown in <xref linkend="sql-backslash-table">.
+</para>
 
 <table id="sql-backslash-table">
 <title>Backslash Escape Sequences</title>
@@ -341,14 +398,24 @@ SELECT 'foo' 'bar';
 </tgroup>
 </table>
 
-It is your responsibility that the byte sequences you create are
-valid characters in the server character set encoding.Any other
+<para>
+Any other
 character following a backslash is taken literally. Thus, to
 include a backslash character, write two backslashes (<literal>\\</>).
 Also, a single quote can be included in an escape string by writing
 <literal>\'</literal>, in addition to the normal way of <literal>''</>.
 </para>
 
+<para>
+It is your responsibility that the byte sequences you create are
+valid characters in the server character set encoding. When the
+server encoding is UTF-8, then the alternative Unicode escape
+syntax, explained in <xref linkend="sql-syntax-strings-uescape">,
+should be used instead. (The alternative would be doing the
+UTF-8 encoding by hand and writing out the bytes, which would be
+very cumbersome.)
+</para>
+
 <caution>
 <para>
 If the configuration parameter
@@ -379,6 +446,65 @@ SELECT 'foo' 'bar';
 </para>
 </sect3>
 
+<sect3 id="sql-syntax-strings-uescape">
+<title>String Constants with Unicode Escapes</title>
+
+<indexterm zone="sql-syntax-strings-uescape">
+ <primary>Unicode escape</primary>
+ <secondary>in string constants</secondary>
+</indexterm>
+
+<para>
+<productname>PostgreSQL</productname> also supports another type
+of escape syntax for strings that allows specifying arbitrary
+Unicode characters by code point. A Unicode escape string
+constant starts with <literal>U&</literal> (upper or lower case
+letter U followed by ampersand) immediately before the opening
+quote, without any spaces in between, for
+example <literal>U&'foo'</literal>. (Note that this creates an
+ambiguity with the operator <literal>&</literal>. Use spaces
+around the operator to avoid this problem.) Inside the quotes,
+Unicode characters can be specified in escaped form by writing a
+backslash followed by the four-digit hexadecimal code point
+number or alternatively a backslash followed by a plus sign
+followed by a six-digit hexadecimal code point number. For
+example, the string <literal>'data'</literal> could be written as
+<programlisting>
+U&'d\0061t\+000061'
+</programlisting>
+The following less trivial example writes the Russian
+word <quote>slon</quote> (elephant) in Cyrillic letters:
+<programlisting>
+U&'\0441\043B\043E\043D'
+</programlisting>
+</para>
+
+<para>
+If a different escape character than backslash is desired, it can
+be specified using
+the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
+clause after the string, for example:
+<programlisting>
+U&'d!0061t!+000061' UESCAPE '!'
+</programlisting>
+The escape character can be any single character other than a
+hexadecimal digit, the plus sign, a single quote, a double quote,
+or a whitespace character.
+</para>
+
+<para>
+The Unicode escape syntax works only when the server encoding is
+UTF8. When other server encodings are used, only code points in
+the ASCII range (up to <literal>\007F</literal>) can be
+specified.
+</para>
+
+<para>
+To include the escape character in the string literally, write it
+twice.
+</para>
+</sect3>
+
 <sect3 id="sql-syntax-dollar-quoting">
 <title>Dollar-Quoted String Constants</title>

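The syntax.sgml hunks above state the same escape rules twice, once for `U&"..."` identifiers and once for `U&'...'` string constants: a four-digit form, a plus-prefixed six-digit form, an optional `UESCAPE` character with restrictions, a doubled escape character standing for itself, and ASCII-only code points when the server encoding is not UTF8. The Python sketch below decodes the text between the quotes according to those rules. It is an illustration, not PostgreSQL's scanner: the function names `decode_unicode_escapes` and `check_uescape` and the `utf8_server` flag are hypothetical, and the upper bound check at U+10FFFF is an added assumption rather than something the diff states.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def check_uescape(escape: str) -> None:
    """Reject escape characters forbidden by the UESCAPE rules in the docs."""
    if (len(escape) != 1 or escape in HEX_DIGITS or escape in "+'\""
            or escape.isspace()):
        raise ValueError(f"invalid UESCAPE character: {escape!r}")


def decode_unicode_escapes(body: str, escape: str = "\\",
                           utf8_server: bool = True) -> str:
    """Decode the text between the quotes of U&'...' or U&"...".

    Hypothetical helper: a sketch of the documented rules only.
    """
    check_uescape(escape)
    out = []
    i = 0
    while i < len(body):
        if body[i] != escape:
            out.append(body[i])
            i += 1
            continue
        # The escape character written twice stands for itself.
        if body.startswith(escape, i + 1):
            out.append(escape)
            i += 2
            continue
        # Escape + "+" + six hex digits, or escape + four hex digits.
        if body.startswith("+", i + 1):
            start, ndigits = i + 2, 6
        else:
            start, ndigits = i + 1, 4
        digits = body[start:start + ndigits]
        if len(digits) != ndigits or any(d not in HEX_DIGITS for d in digits):
            raise ValueError(f"invalid Unicode escape at offset {i}")
        code = int(digits, 16)
        if code > 0x10FFFF:  # assumption: reject values outside Unicode
            raise ValueError(f"code point out of range: {digits}")
        # Without a UTF8 server encoding only the ASCII range is accepted.
        if code > 0x7F and not utf8_server:
            raise ValueError("non-ASCII escapes need a UTF8 server encoding")
        out.append(chr(code))
        i = start + ndigits
    return "".join(out)


if __name__ == "__main__":
    print(decode_unicode_escapes(r"d\0061t\+000061"))             # data
    print(decode_unicode_escapes(r"\0441\043B\043E\043D"))        # слон
    print(decode_unicode_escapes("d!0061t!+000061", escape="!"))  # data
```

The same function serves both the identifier and the string-constant variants because, per the diff, they differ only in the surrounding quote character, not in the escape rules.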
src/backend/catalog/sql_features.txt

Lines changed: 2 additions & 2 deletions
@@ -238,8 +238,8 @@ F381	Extended schema manipulation	02	ALTER TABLE statement: ADD CONSTRAINT claus
 F381	Extended schema manipulation	03	ALTER TABLE statement: DROP CONSTRAINT clause	YES
 F382	Alter column data type			YES
 F391	Long identifiers			YES
-F392	Unicode escapes in identifiers			NO
-F393	Unicode escapes in literals			NO
+F392	Unicode escapes in identifiers			YES
+F393	Unicode escapes in literals			YES
 F394	Optional normal form specification			NO
 F401	Extended joined table			YES
 F401	Extended joined table	01	NATURAL JOIN	YES
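The paragraph this commit adds to the C-style escapes section of syntax.sgml argues that, on a UTF-8 server, writing non-ASCII text as `E'...'` byte escapes means doing the UTF-8 encoding by hand. The Python sketch below spells the same text both ways to make that contrast concrete, reproducing the documentation's Cyrillic example. The helper names `as_escape_string` and `as_unicode_escape_string` are hypothetical, and the `E'...'` output only denotes the intended text when the server encoding is UTF-8.

```python
def as_escape_string(text: str) -> str:
    """Spell text as an E'...' constant, writing non-ASCII bytes as \\xNN."""
    parts = []
    for ch in text:
        if ch.isascii() and ch.isprintable() and ch not in "\\'":
            parts.append(ch)
        else:
            # Hand-encode the character to UTF-8 and escape every byte.
            parts.extend(f"\\x{b:02X}" for b in ch.encode("utf-8"))
    return "E'" + "".join(parts) + "'"


def as_unicode_escape_string(text: str) -> str:
    """Spell text as a U&'...' constant, writing characters by code point."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80 and ch.isprintable() and ch not in "\\'":
            parts.append(ch)
        elif cp <= 0xFFFF:
            parts.append(f"\\{cp:04X}")   # four-digit form
        else:
            parts.append(f"\\+{cp:06X}")  # plus sign, six-digit form
    return "U&'" + "".join(parts) + "'"


if __name__ == "__main__":
    word = "\u0441\u043b\u043e\u043d"  # "slon" in Cyrillic letters
    print(as_escape_string(word))          # E'\xD1\x81\xD0\xBB\xD0\xBE\xD0\xBD'
    print(as_unicode_escape_string(word))  # U&'\0441\043B\043E\043D'
```

The byte-escape form needs two escapes per Cyrillic letter and requires knowing the UTF-8 bit layout, while the Unicode escape form maps one code point to one escape, which is the convenience the added paragraph points to.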