Commit 981dc2b

Make ts_locale.c's character-type functions cope with UTF-16.
On Windows, in UTF8 database encoding, what char2wchar() produces is
UTF16 not UTF32, ie, characters above U+FFFF will be represented by
surrogate pairs. t_isdigit() and siblings did not account for this
and failed to provide a large enough result buffer. That in turn
led to bogus "invalid multibyte character for locale" errors, because
contrary to what you might think from char2wchar()'s documentation,
its Windows code path doesn't cope sanely with buffer overflow.

The solution for t_isdigit() and siblings is pretty clear: provide
a 3-wchar_t result buffer not 2.

char2wchar() also needs some work to provide more consistent, and more
accurately documented, buffer overrun behavior. But that's a bigger job
and it doesn't actually have any immediate payoff, so leave it for later.

Per bug #15476 from Kenji Uno, who deserves credit for identifying the
cause of the problem. Back-patch to all active branches.

Discussion: https://postgr.es/m/15476-4314f480acf0f114@postgresql.org
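
To make the buffer arithmetic in the message concrete, here is a minimal sketch of UTF-16 encoding for a single code point. It is not PostgreSQL code: `utf16_encode` and `UTF16_BUF_LEN` are hypothetical names chosen for illustration, and `uint16_t` stands in for a 16-bit `wchar_t`. A character above U+FFFF yields two code units, and the terminating zero makes three slots in total.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: mirrors the WC_BUF_LEN value from the patch. */
#define UTF16_BUF_LEN 3

/*
 * Encode one Unicode code point as UTF-16 and append a terminating zero.
 * Returns the number of code units written, not counting the terminator.
 * (Hypothetical helper, not part of PostgreSQL.)
 */
static size_t
utf16_encode(uint32_t cp, uint16_t out[UTF16_BUF_LEN])
{
    size_t n = 0;

    /* Reject values outside Unicode and lone surrogate code points. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x10000)
        out[n++] = (uint16_t) cp;       /* BMP: one code unit */
    else
    {
        uint32_t v = cp - 0x10000;      /* 20 significant bits remain */

        out[n++] = (uint16_t) (0xD800 + (v >> 10));    /* high surrogate */
        out[n++] = (uint16_t) (0xDC00 + (v & 0x3FF));  /* low surrogate */
    }
    out[n] = 0;                         /* trailing null needs its own slot */
    return n;
}
```

With a 2-element buffer, the second code unit of a surrogate pair would leave no room for the terminator, which is the overflow case the message describes.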
1 parent dfa6081, commit 981dc2b

1 file changed, +19 -8 lines changed
src/backend/tsearch/ts_locale.c

Lines changed: 19 additions & 8 deletions
@@ -21,18 +21,29 @@
 static void tsearch_readline_callback(void *arg);
 
 
+/*
+ * The reason these functions use a 3-wchar_t output buffer, not 2 as you
+ * might expect, is that on Windows "wchar_t" is 16 bits and what we'll be
+ * getting from char2wchar() is UTF16 not UTF32. A single input character
+ * may therefore produce a surrogate pair rather than just one wchar_t;
+ * we also need room for a trailing null. When we do get a surrogate pair,
+ * we pass just the first code to iswdigit() etc, so that these functions will
+ * always return false for characters outside the Basic Multilingual Plane.
+ */
+#define WC_BUF_LEN 3
+
 int
 t_isdigit(const char *ptr)
 {
     int clen = pg_mblen(ptr);
-    wchar_t character[2];
+    wchar_t character[WC_BUF_LEN];
     Oid collation = DEFAULT_COLLATION_OID; /* TODO */
     pg_locale_t mylocale = 0; /* TODO */
 
     if (clen == 1 || lc_ctype_is_c(collation))
         return isdigit(TOUCHAR(ptr));
 
-    char2wchar(character, 2, ptr, clen, mylocale);
+    char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
 
     return iswdigit((wint_t) character[0]);
 }
@@ -41,14 +52,14 @@ int
 t_isspace(const char *ptr)
 {
     int clen = pg_mblen(ptr);
-    wchar_t character[2];
+    wchar_t character[WC_BUF_LEN];
     Oid collation = DEFAULT_COLLATION_OID; /* TODO */
     pg_locale_t mylocale = 0; /* TODO */
 
     if (clen == 1 || lc_ctype_is_c(collation))
         return isspace(TOUCHAR(ptr));
 
-    char2wchar(character, 2, ptr, clen, mylocale);
+    char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
 
     return iswspace((wint_t) character[0]);
 }
@@ -57,14 +68,14 @@ int
 t_isalpha(const char *ptr)
 {
     int clen = pg_mblen(ptr);
-    wchar_t character[2];
+    wchar_t character[WC_BUF_LEN];
     Oid collation = DEFAULT_COLLATION_OID; /* TODO */
     pg_locale_t mylocale = 0; /* TODO */
 
     if (clen == 1 || lc_ctype_is_c(collation))
         return isalpha(TOUCHAR(ptr));
 
-    char2wchar(character, 2, ptr, clen, mylocale);
+    char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
 
     return iswalpha((wint_t) character[0]);
 }
@@ -73,14 +84,14 @@ int
 t_isprint(const char *ptr)
 {
     int clen = pg_mblen(ptr);
-    wchar_t character[2];
+    wchar_t character[WC_BUF_LEN];
     Oid collation = DEFAULT_COLLATION_OID; /* TODO */
     pg_locale_t mylocale = 0; /* TODO */
 
     if (clen == 1 || lc_ctype_is_c(collation))
         return isprint(TOUCHAR(ptr));
 
-    char2wchar(character, 2, ptr, clen, mylocale);
+    char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
 
     return iswprint((wint_t) character[0]);
 }
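
The new comment notes that only the first code unit of a surrogate pair reaches iswdigit() and friends, so characters outside the Basic Multilingual Plane always classify as false. The sketch below illustrates that for the digit case only. The helper names `is_high_surrogate` and `first_unit_is_digit` are hypothetical and not part of the patch, `uint16_t` again stands in for a 16-bit `wchar_t`, and the result for a lone surrogate is what I would expect from common C libraries rather than something the C standard spells out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wctype.h>

/* True if the code unit is the first half of a UTF-16 surrogate pair. */
static int
is_high_surrogate(uint16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDBFF;
}

/*
 * Classify a zero-terminated UTF-16 buffer the way the patched functions
 * do: look at element 0 only and ignore any following low surrogate.
 * (Hypothetical helper, not part of PostgreSQL.)
 */
static int
first_unit_is_digit(const uint16_t *units)
{
    assert(units != NULL);
    return iswdigit((wint_t) units[0]) != 0;
}
```

For example, U+1D7CE (MATHEMATICAL BOLD DIGIT ZERO) encodes as 0xD835 0xDFCE; the first unit is a high surrogate, which is not itself a digit, so the character is reported as a non-digit even though Unicode classifies it as one.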
