Commit d0f18cd

Fix documentation of regular expression character-entry escapes.
The docs claimed that \uhhhh would be interpreted as a Unicode value regardless of the database encoding, but it's never been implemented that way: \uhhhh and \xhhhh actually mean exactly the same thing, namely the character that pg_mb2wchar translates to 0xhhhh. Moreover we were falsely dismissive of the usefulness of Unicode code points above FFFF. Fix that.

It's been like this for ages, so back-patch to all supported branches.
1 parent 4d0fc1d, commit d0f18cd

‎doc/src/sgml/func.sgml

Lines changed: 17 additions & 4 deletions
@@ -4669,7 +4669,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo;
 <entry> <literal>\e</> </entry>
 <entry> the character whose collating-sequence name
 is <literal>ESC</>,
-or failing that, the character with octal value 033 </entry>
+or failing that, the character with octal value <literal>033</> </entry>
 </row>
 
 <row>
@@ -4695,15 +4695,17 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo;
 <row>
 <entry> <literal>\u</><replaceable>wxyz</> </entry>
 <entry> (where <replaceable>wxyz</> is exactly four hexadecimal digits)
-the UTF16 (Unicode, 16-bit) character <literal>U+</><replaceable>wxyz</>
-in the local byte ordering </entry>
+the character whose hexadecimal value is
+<literal>0x</><replaceable>wxyz</>
+</entry>
 </row>
 
 <row>
 <entry> <literal>\U</><replaceable>stuvwxyz</> </entry>
 <entry> (where <replaceable>stuvwxyz</> is exactly eight hexadecimal
 digits)
-reserved for a hypothetical Unicode extension to 32 bits
+the character whose hexadecimal value is
+<literal>0x</><replaceable>stuvwxyz</>
 </entry>
 </row>
 
@@ -4752,6 +4754,17 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo;
 Octal digits are <literal>0</>-<literal>7</>.
 </para>
 
+<para>
+Numeric character-entry escapes specifying values outside the ASCII range
+(0-127) have meanings dependent on the database encoding. When the
+encoding is UTF-8, escape values are equivalent to Unicode code points,
+for example <literal>\u1234</> means the character <literal>U+1234</>.
+For other multibyte encodings, character-entry escapes usually just
+specify the concatenation of the byte values for the character. If the
+escape value does not correspond to any legal character in the database
+encoding, no error will be raised, but it will never match any data.
+</para>
+
 <para>
 The character-entry escapes are always taken as ordinary characters.
 For example, <literal>\135</> is <literal>]</> in ASCII, but