
Commit 7894ac5


Make sure chr(int) can't create invalid UTF8 sequences.

Several years ago we changed chr(int) so that if the database encoding is UTF8, it would interpret its argument as a Unicode code point and expand it into the appropriate multibyte sequence. However, we weren't sufficiently careful about checking validity of the input. According to RFC3629, UTF8 disallows code points above U+10FFFF (note that the predecessor standard RFC2279 was more liberal). Also, both versions of the UTF8 spec agree that Unicode surrogate-pair codes should never appear in UTF8. Because our encoding validity checks follow RFC3629, our failure to enforce these restrictions in chr() means it could be used to produce text strings that will be rejected when the database is dumped and reloaded. To ensure consistency with the input functions, let's actually apply pg_utf8_islegal() to the proposed output of chr().

Per discussion, this seems like too much of a behavioral change to back-patch, but it's not too late to squeeze it into 9.4.
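The two restrictions cited in the message can be expressed directly on the code point. The following is a minimal sketch, not the patch itself: the helper name `is_valid_utf8_code_point` is hypothetical and not part of PostgreSQL, and the actual commit validates the encoded bytes with pg_utf8_islegal() rather than testing the integer.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a PostgreSQL function): report whether a
 * code point may be encoded as UTF8 under the RFC3629 rules cited in
 * the commit message.
 */
static bool
is_valid_utf8_code_point(uint32_t cp)
{
	/* RFC3629 stops at U+10FFFF */
	if (cp > 0x10FFFF)
		return false;

	/* Surrogate codes U+D800..U+DFFF must never appear in UTF8 */
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return false;

	return true;
}
```

Under these assumptions, U+D7FF and U+E000 (the neighbors of the surrogate range) are accepted, while U+D800, U+DFFF, and U+110000 are rejected.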

1 parent af215d8, commit 7894ac5

1 file changed, +18 -7 lines changed

`src/backend/utils/adt/oracle_compat.c`

Lines changed: 18 additions & 7 deletions

Original file line number	Diff line number	Diff line change
`@@ -932,10 +932,14 @@ chr(PG_FUNCTION_ARGS)`
`932`	`932`	`{`
`933`	`933`	`/* for Unicode we treat the argument as a code point */`
`934`	`934`	`int bytes;`
`935`		`-char *wch;`
	`935`	`+unsigned char *wch;`
`936`	`936`
`937`		`-/* We only allow valid Unicode code points */`
`938`		`-if (cvalue > 0x001fffff)`
	`937`	`+/*`
	`938`	`+ * We only allow valid Unicode code points; per RFC3629 that stops at`
	`939`	`+ * U+10FFFF, even though 4-byte UTF8 sequences can hold values up to`
	`940`	`+ * U+1FFFFF.`
	`941`	`+ */`
	`942`	`+if (cvalue > 0x0010ffff)`
`939`	`943`	`ereport(ERROR,`
`940`	`944`	`(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),`
`941`	`945`	`errmsg("requested character too large for encoding: %d",`
`@@ -950,7 +954,7 @@ chr(PG_FUNCTION_ARGS)`
`950`	`954`
`951`	`955`	`result = (text *) palloc(VARHDRSZ + bytes);`
`952`	`956`	`SET_VARSIZE(result, VARHDRSZ + bytes);`
`953`		`-wch = VARDATA(result);`
	`957`	`+wch = (unsigned char *) VARDATA(result);`
`954`	`958`
`955`	`959`	`if (bytes == 2)`
`956`	`960`	`{`
`@@ -971,8 +975,17 @@ chr(PG_FUNCTION_ARGS)`
`971`	`975`	`wch[3] = 0x80 | (cvalue & 0x3F);`
`972`	`976`	`}`
`973`	`977`
	`978`	`+/*`
	`979`	`+ * The preceding range check isn't sufficient, because UTF8 excludes`
	`980`	`+ * Unicode "surrogate pair" codes. Make sure what we created is valid`
	`981`	`+ * UTF8.`
	`982`	`+ */`
	`983`	`+if (!pg_utf8_islegal(wch, bytes))`
	`984`	`+ereport(ERROR,`
	`985`	`+(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),`
	`986`	`+errmsg("requested character not valid for encoding: %d",`
	`987`	`+cvalue)));`
`974`	`988`	`}`
`975`		`-`
`976`	`989`	`else`
`977`	`990`	`{`
`978`	`991`	`bool is_mb;`
`@@ -981,7 +994,6 @@ chr(PG_FUNCTION_ARGS)`
`981`	`994`	`* Error out on arguments that make no sense or that we can't validly`
`982`	`995`	`* represent in the encoding.`
`983`	`996`	`*/`
`984`		`-`
`985`	`997`	`if (cvalue == 0)`
`986`	`998`	`ereport(ERROR,`
`987`	`999`	`(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),`
`@@ -995,7 +1007,6 @@ chr(PG_FUNCTION_ARGS)`
`995`	`1007`	`errmsg("requested character too large for encoding: %d",`
`996`	`1008`	`cvalue)));`
`997`	`1009`
`998`		`-`
`999`	`1010`	`result = (text *) palloc(VARHDRSZ + 1);`
`1000`	`1011`	`SET_VARSIZE(result, VARHDRSZ + 1);`
`1001`	`1012`	`*VARDATA(result) = (char) cvalue;`
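The shape of the patched UTF8 branch (range check, encode, then validate the bytes that were produced) can be sketched outside the backend. This is a simplified illustration under stated assumptions: `encode_utf8`, `is_surrogate_sequence`, and `chr_utf8_sketch` are hypothetical names, and the byte-level check covers only the surrogate case, whereas pg_utf8_islegal() performs a fuller validity test. The buffer is `unsigned char` so that comparisons against values such as 0xED behave the same whether or not plain `char` is signed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical encoder: write the UTF8 form of cp into buf (which must
 * hold at least 4 bytes). Returns the number of bytes written, or 0 if
 * cp is above U+10FFFF.
 */
static size_t
encode_utf8(uint32_t cp, unsigned char *buf)
{
	if (cp <= 0x7F)
	{
		buf[0] = (unsigned char) cp;
		return 1;
	}
	if (cp <= 0x7FF)
	{
		buf[0] = (unsigned char) (0xC0 | (cp >> 6));
		buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp <= 0xFFFF)
	{
		buf[0] = (unsigned char) (0xE0 | (cp >> 12));
		buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
		buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF)
	{
		buf[0] = (unsigned char) (0xF0 | (cp >> 18));
		buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
		buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
		buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;					/* too large for UTF8 per RFC3629 */
}

/*
 * Hypothetical byte-level check: U+D800..U+DFFF encode as the lead
 * byte 0xED followed by a second byte in 0xA0..0xBF.
 */
static bool
is_surrogate_sequence(const unsigned char *buf, size_t len)
{
	return len == 3 && buf[0] == 0xED && buf[1] >= 0xA0;
}

/*
 * Encode, then validate what was created. Returns the byte count, or 0
 * when the code point is not representable as valid UTF8.
 */
static size_t
chr_utf8_sketch(uint32_t cp, unsigned char *buf)
{
	size_t		len = encode_utf8(cp, buf);

	if (len == 0 || is_surrogate_sequence(buf, len))
		return 0;
	return len;
}
```

In this sketch a caller passes a 4-byte buffer and treats a return value of 0 as the error case, which corresponds to the two ereport() calls in the patched function.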
