Commit 7894ac5

Make sure chr(int) can't create invalid UTF8 sequences.
Several years ago we changed chr(int) so that if the database encoding is UTF8, it would interpret its argument as a Unicode code point and expand it into the appropriate multibyte sequence. However, we weren't sufficiently careful about checking validity of the input. According to RFC3629, UTF8 disallows code points above U+10FFFF (note that the predecessor standard RFC2279 was more liberal). Also, both versions of the UTF8 spec agree that Unicode surrogate-pair codes should never appear in UTF8. Because our encoding validity checks follow RFC3629, our failure to enforce these restrictions in chr() means it could be used to produce text strings that will be rejected when the database is dumped and reloaded. To ensure consistency with the input functions, let's actually apply pg_utf8_islegal() to the proposed output of chr().

Per discussion, this seems like too much of a behavioral change to back-patch, but it's not too late to squeeze it into 9.4.
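
To make the two restrictions in the message concrete, here is a minimal standalone sketch in C. It is not the PostgreSQL code: the name encode_utf8_codepoint and its return convention (0 for a rejected code point) are assumptions made for illustration. It tests the code point itself against the U+10FFFF limit and the surrogate range U+D800..U+DFFF before encoding, whereas the commit validates the already-encoded bytes with pg_utf8_islegal().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not PostgreSQL code: encode one Unicode code point
 * as UTF-8 into out[] (which must have room for 4 bytes).  Returns the
 * number of bytes written, or 0 if RFC 3629 does not allow the code point.
 */
int
encode_utf8_codepoint(uint32_t cp, unsigned char *out)
{
	assert(out != NULL);

	/* RFC 3629 caps UTF-8 at U+10FFFF. */
	if (cp > 0x10FFFF)
		return 0;

	/* Surrogate code points U+D800..U+DFFF must never appear in UTF-8. */
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;

	if (cp <= 0x7F)
	{
		out[0] = (unsigned char) cp;
		return 1;
	}
	if (cp <= 0x7FF)
	{
		out[0] = (unsigned char) (0xC0 | (cp >> 6));
		out[1] = (unsigned char) (0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp <= 0xFFFF)
	{
		out[0] = (unsigned char) (0xE0 | (cp >> 12));
		out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char) (0x80 | (cp & 0x3F));
		return 3;
	}

	out[0] = (unsigned char) (0xF0 | (cp >> 18));
	out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
	out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
	out[3] = (unsigned char) (0x80 | (cp & 0x3F));
	return 4;
}
```

With this sketch, encode_utf8_codepoint(0x20AC, buf) writes the three bytes E2 82 AC, while 0xD800 and 0x110000 both return 0.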
1 parent af215d8 commit 7894ac5

1 file changed: +18 -7 lines changed

src/backend/utils/adt/oracle_compat.c

Lines changed: 18 additions & 7 deletions
@@ -932,10 +932,14 @@ chr(PG_FUNCTION_ARGS)
 	{
 		/* for Unicode we treat the argument as a code point */
 		int			bytes;
-		char	   *wch;
+		unsigned char *wch;
 
-		/* We only allow valid Unicode code points */
-		if (cvalue > 0x001fffff)
+		/*
+		 * We only allow valid Unicode code points; per RFC3629 that stops at
+		 * U+10FFFF, even though 4-byte UTF8 sequences can hold values up to
+		 * U+1FFFFF.
+		 */
+		if (cvalue > 0x0010ffff)
 			ereport(ERROR,
 					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
 					 errmsg("requested character too large for encoding: %d",
@@ -950,7 +954,7 @@ chr(PG_FUNCTION_ARGS)
 
 		result = (text *) palloc(VARHDRSZ + bytes);
 		SET_VARSIZE(result, VARHDRSZ + bytes);
-		wch = VARDATA(result);
+		wch = (unsigned char *) VARDATA(result);
 
 		if (bytes == 2)
 		{
@@ -971,8 +975,17 @@ chr(PG_FUNCTION_ARGS)
 			wch[3] = 0x80 | (cvalue & 0x3F);
 		}
 
+		/*
+		 * The preceding range check isn't sufficient, because UTF8 excludes
+		 * Unicode "surrogate pair" codes.  Make sure what we created is valid
+		 * UTF8.
+		 */
+		if (!pg_utf8_islegal(wch, bytes))
+			ereport(ERROR,
+					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+					 errmsg("requested character not valid for encoding: %d",
+							cvalue)));
 	}
-
 	else
 	{
 		bool		is_mb;
@@ -981,7 +994,6 @@ chr(PG_FUNCTION_ARGS)
 		 * Error out on arguments that make no sense or that we can't validly
 		 * represent in the encoding.
 		 */
-
 		if (cvalue == 0)
 			ereport(ERROR,
 					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
@@ -995,7 +1007,6 @@ chr(PG_FUNCTION_ARGS)
 					 errmsg("requested character too large for encoding: %d",
 							cvalue)));
 
-
 		result = (text *) palloc(VARHDRSZ + 1);
 		SET_VARSIZE(result, VARHDRSZ + 1);
 		*VARDATA(result) = (char) cvalue;
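
The surrogate range U+D800..U+DFFF corresponds exactly to the 3-byte sequences whose first byte is 0xED and whose second byte is 0xA0 or higher, which is why a byte-level check on the encoded output can catch what the simple upper-bound test misses. The following is a simplified sketch of such a check for a single encoded character. The name sketch_utf8_islegal is a hypothetical stand-in written for this illustration; it should not be read as the source of pg_utf8_islegal().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in, not the PostgreSQL source: report whether the
 * len bytes at s form one legal UTF-8 character per RFC 3629.
 */
bool
sketch_utf8_islegal(const unsigned char *s, int len)
{
	int			i;

	assert(s != NULL);

	/* The lead byte must agree with the claimed sequence length. */
	switch (len)
	{
		case 1:
			if (s[0] > 0x7F)
				return false;
			break;
		case 2:
			/* 0xC0 and 0xC1 would only produce overlong encodings */
			if (s[0] < 0xC2 || s[0] > 0xDF)
				return false;
			break;
		case 3:
			if (s[0] < 0xE0 || s[0] > 0xEF)
				return false;
			break;
		case 4:
			if (s[0] < 0xF0 || s[0] > 0xF4)
				return false;
			break;
		default:
			return false;		/* RFC 3629 has no 5- or 6-byte forms */
	}

	/* Every trailing byte must be a continuation byte (10xxxxxx). */
	for (i = 1; i < len; i++)
		if ((s[i] & 0xC0) != 0x80)
			return false;

	/* Extra limits on the second byte for four specific lead bytes. */
	if (s[0] == 0xE0 && s[1] < 0xA0)
		return false;			/* overlong 3-byte form */
	if (s[0] == 0xED && s[1] > 0x9F)
		return false;			/* surrogates U+D800..U+DFFF */
	if (s[0] == 0xF0 && s[1] < 0x90)
		return false;			/* overlong 4-byte form */
	if (s[0] == 0xF4 && s[1] > 0x8F)
		return false;			/* above U+10FFFF */

	return true;
}
```

Under these assumptions, the bytes ED A0 80 (what the bit-shifting in chr() would produce for 0xD800) are rejected, while ED 9F BF (U+D7FF) and F4 8F BF BF (U+10FFFF) are accepted.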
