Commit 098c134

Fix buffer overrun in unicode string normalization with empty input

PostgreSQL 13 and newer versions are directly impacted by that through the SQL function normalize(), which would cause a call of this function to write one byte past its allocation if using in input an empty string after recomposing the string with NFC and NFKC. Older versions (v10~v12) are not directly affected by this problem as the only code path using normalization is SASLprep in SCRAM authentication that forbids the case of an empty string, but let's make the code more robust anyway there so as any out-of-core callers of this function are covered.

The solution chosen to fix this issue is simple, with the addition of a fast-exit path if the decomposed string is found as empty. This would only happen for an empty string as at its lowest level a codepoint would be decomposed as itself if it has no entry in the decomposition table or if it has a decomposition size of 0.

Some tests are added to cover this issue in v13~. Note that an empty string has always been considered as normalized (grammar "IS NF[K]{C,D} NORMALIZED", through the SQL function is_normalized()) for all the operations allowed (NFC, NFD, NFKC and NFKD) since this feature has been introduced as of 2991ac5. This behavior is unchanged but some tests are added in v13~ to check after that.

I have also checked "make normalization-check" in src/common/unicode/, while on it (works in 13~, and breaks in older stable branches independently of this commit).

The release notes should just mention this commit for v13~.

Reported-by: Matthijs van der Vleuten
Discussion: https://postgr.es/m/17277-0c527a373794e802@postgresql.org
Backpatch-through: 10
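The commit message argues that the decomposed size can only be zero for an empty input, because every code point decomposes to at least itself. A minimal sketch of that argument in C follows. It is not PostgreSQL's code: `toy_entry`, `toy_table`, `toy_decomp_size_one` and `toy_decomp_size` are hypothetical names, and the one-entry table is only a stand-in for a real decomposition table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a decomposition table: U+00E4 -> U+0061 U+0308. */
typedef struct
{
    uint32_t codepoint;
    uint32_t decomp[2];
    size_t   decomp_len;
} toy_entry;

static const toy_entry toy_table[] = {
    {0x00E4, {0x0061, 0x0308}, 2},
};

/* Number of code points that a single code point decomposes into. */
size_t
toy_decomp_size_one(uint32_t codepoint)
{
    size_t n = sizeof(toy_table) / sizeof(toy_table[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (toy_table[i].codepoint == codepoint && toy_table[i].decomp_len > 0)
            return toy_table[i].decomp_len;
    }
    /* No entry, or an entry of size 0: the code point decomposes as itself. */
    return 1;
}

/* Decomposed size of a zero-terminated array of code points. */
size_t
toy_decomp_size(const uint32_t *input)
{
    size_t total = 0;

    for (const uint32_t *p = input; *p != 0; p++)
        total += toy_decomp_size_one(*p);
    return total;
}
```

Since each code point contributes at least 1 to the total, `toy_decomp_size` can return 0 only when the input contains no code points at all.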

1 parent 9ff47ea, commit 098c134

File tree

3 files changed: +17 -3 lines changed

src
- common
  - unicode_norm.c
- test/regress
  - expected
    - unicode.out
  - sql
    - unicode.sql

`‎src/common/unicode_norm.c`

Lines changed: 4 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -439,6 +439,10 @@ unicode_normalize(UnicodeNormalizationForm form, const pg_wchar *input)`
`439`	`439`	`decomp_chars[decomp_size] = '\0';`
`440`	`440`	`Assert(decomp_size == current_size);`
`441`	`441`
	`442`	`+/* Leave if there is nothing to decompose */`
	`443`	`+if (decomp_size == 0)`
	`444`	`+    return decomp_chars;`
	`445`	`+`
`442`	`446`	`/*`
`443`	`447`	`* Now apply canonical ordering.`
`444`	`448`	`*/`
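The effect of the added guard can be sketched with a self-contained toy in C. This is not PostgreSQL's `unicode_normalize()`: `toy_normalize` is a hypothetical name, decomposition and recomposition are reduced to plain copies, and the shape of the second phase (seed from element 0, terminator written at position 1 or later) is an assumption chosen to show how an empty input could write one element past a one-element buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Toy model only, NOT PostgreSQL's unicode_normalize().
 * Returns a malloc'd, zero-terminated array, or NULL on allocation failure.
 */
uint32_t *
toy_normalize(const uint32_t *input)
{
    size_t    decomp_size = 0;
    size_t    target_pos;
    uint32_t *decomp;
    uint32_t *recomp;

    /* "Decomposition" phase: here simply a copy of the input. */
    while (input[decomp_size] != 0)
        decomp_size++;
    decomp = malloc((decomp_size + 1) * sizeof(uint32_t));
    if (decomp == NULL)
        return NULL;
    for (size_t i = 0; i < decomp_size; i++)
        decomp[i] = input[i];
    decomp[decomp_size] = 0;

    /* Fast exit: leave if there is nothing to decompose. */
    if (decomp_size == 0)
        return decomp;

    /*
     * Assumed "recomposition" phase: it takes the first element as a seed
     * and writes the terminator at target_pos >= 1. Without the guard above,
     * an empty input would reach this point with a one-element buffer and
     * the terminator would land at index 1, one element past the end.
     */
    recomp = malloc((decomp_size + 1) * sizeof(uint32_t));
    if (recomp == NULL)
    {
        free(decomp);
        return NULL;
    }
    recomp[0] = decomp[0];
    target_pos = 1;
    for (size_t count = 1; count < decomp_size; count++)
        recomp[target_pos++] = decomp[count];
    assert(target_pos <= decomp_size);
    recomp[target_pos] = 0;

    free(decomp);
    return recomp;
}
```

In this sketch the caller owns the returned buffer and releases it with `free()`. For an empty input the function returns the already terminated decomposition buffer and never reaches the second phase.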

`‎src/test/regress/expected/unicode.out`

Lines changed: 10 additions & 2 deletions

Original file line number	Diff line number	Diff line change
`@@ -8,6 +8,12 @@ SELECT U&'\0061\0308bc' <> U&'\00E4bc' COLLATE "C" AS sanity_check;`
`8`	`8`	`t`
`9`	`9`	`(1 row)`
`10`	`10`
	`11`	`+SELECT normalize('');`
	`12`	`+ normalize`
	`13`	`+-----------`
	`14`	`+`
	`15`	`+(1 row)`
	`16`	`+`
`11`	`17`	`SELECT normalize(U&'\0061\0308\24D1c') = U&'\00E4\24D1c' COLLATE "C" AS test_default;`
`12`	`18`	`test_default`
`13`	`19`	`--------------`
`@@ -67,15 +73,17 @@ FROM`
`67`	`73`	`(VALUES (1, U&'\00E4bc'),`
`68`	`74`	`(2, U&'\0061\0308bc'),`
`69`	`75`	`(3, U&'\00E4\24D1c'),`
`70`		`- (4, U&'\0061\0308\24D1c')) vals (num, val)`
	`76`	`+ (4, U&'\0061\0308\24D1c'),`
	`77`	`+ (5, '')) vals (num, val)`
`71`	`78`	`ORDER BY num;`
`72`	`79`	`num | val | nfc | nfd | nfkc | nfkd`
`73`	`80`	`-----+-----+-----+-----+------+------`
`74`	`81`	`1 | äbc | t | f | t | f`
`75`	`82`	`2 | äbc | f | t | f | t`
`76`	`83`	`3 | äⓑc | t | f | f | f`
`77`	`84`	`4 | äⓑc | f | t | f | f`
`78`		`-(4 rows)`
	`85`	`+ 5 | | t | t | t | t`
	`86`	`+(5 rows)`
`79`	`87`
`80`	`88`	`SELECT is_normalized('abc', 'def'); -- run-time error`
`81`	`89`	`ERROR: invalid normalization form: def`

`‎src/test/regress/sql/unicode.sql`

Lines changed: 3 additions & 1 deletion

Original file line number	Diff line number	Diff line change
`@@ -5,6 +5,7 @@ SELECT getdatabaseencoding() <> 'UTF8' AS skip_test \gset`
`5`	`5`
`6`	`6`	`SELECT U&'\0061\0308bc' <> U&'\00E4bc' COLLATE "C" AS sanity_check;`
`7`	`7`
	`8`	`+SELECT normalize('');`
`8`	`9`	`SELECT normalize(U&'\0061\0308\24D1c') = U&'\00E4\24D1c' COLLATE "C" AS test_default;`
`9`	`10`	`SELECT normalize(U&'\0061\0308\24D1c', NFC) = U&'\00E4\24D1c' COLLATE "C" AS test_nfc;`
`10`	`11`	`SELECT normalize(U&'\00E4bc', NFC) = U&'\00E4bc' COLLATE "C" AS test_nfc_idem;`
`@@ -26,7 +27,8 @@ FROM`
`26`	`27`	`(VALUES (1, U&'\00E4bc'),`
`27`	`28`	`(2, U&'\0061\0308bc'),`
`28`	`29`	`(3, U&'\00E4\24D1c'),`
`29`		`- (4, U&'\0061\0308\24D1c')) vals (num, val)`
	`30`	`+ (4, U&'\0061\0308\24D1c'),`
	`31`	`+ (5, '')) vals (num, val)`
`30`	`32`	`ORDER BY num;`
`31`	`33`
`32`	`34`	`SELECT is_normalized('abc', 'def'); -- run-time error`
