Commit 3f2ab73

With GB18030, prevent SIGSEGV from reading past end of allocation.

With GB18030 as source encoding, applications could crash the server via
SQL functions convert() or convert_from(). Applications themselves
could crash after passing unterminated GB18030 input to libpq functions
PQescapeLiteral(), PQescapeIdentifier(), PQescapeStringConn(), or
PQescapeString(). Extension code could crash by passing unterminated
GB18030 input to jsonapi.h functions. All those functions have been
intended to handle untrusted, unterminated input safely.

A crash required allocating the input such that the last byte of the
allocation was the last byte of a virtual memory page. Some malloc()
implementations take measures against that, making the SIGSEGV hard to
reach. Back-patch to v13 (all supported versions).

Author: Noah Misch <noah@leadboat.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 13
Security: CVE-2025-4207

1 parent 258cde8, commit 3f2ab73
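
The crash condition described above (input ending exactly at the last byte of a virtual memory page) can be reproduced deterministically rather than by relying on allocator behavior. The following is a minimal sketch and not part of the commit: it assumes POSIX mmap() and mprotect(), and the function name alloc_ending_at_page_boundary() is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE			/* assumption: needed for MAP_ANONYMOUS on glibc */
#endif

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map two pages, make the second one inaccessible, and
 * return a pointer to a len-byte buffer whose last byte is the last byte of
 * the first page.  Returns NULL on failure.  Requires 0 < len <= page size.
 * The mapping is intentionally never released; this is test scaffolding.
 */
unsigned char *
alloc_ending_at_page_boundary(size_t len)
{
	long		pagesize = sysconf(_SC_PAGESIZE);
	unsigned char *base;

	if (pagesize <= 0 || len == 0 || len > (size_t) pagesize)
		return NULL;

	base = mmap(NULL, 2 * (size_t) pagesize, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* mmap() results are page-aligned, so the guard page starts here */
	assert(((uintptr_t) (base + pagesize)) % (uintptr_t) pagesize == 0);

	/* Any access to the second page now raises SIGSEGV */
	if (mprotect(base + pagesize, (size_t) pagesize, PROT_NONE) != 0)
	{
		munmap(base, 2 * (size_t) pagesize);
		return NULL;
	}

	return base + (size_t) pagesize - len;
}
```

Storing a lone 0xFE lead byte in buf[len - 1] and passing len to the function under test makes any read of buf[len] fault immediately, whatever malloc() implementation is in use.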

9 files changed, +188 -30 lines changed

‎src/backend/utils/mb/mbutils.c

Lines changed: 13 additions & 5 deletions
@@ -1030,7 +1030,7 @@ pg_mbcliplen(const char *mbstr, int len, int limit)
 }

 /*
- * pg_mbcliplen with specified encoding
+ * pg_mbcliplen with specified encoding; string must be valid in encoding
  */
 int
 pg_encoding_mbcliplen(int encoding, const char *mbstr,
@@ -1641,12 +1641,12 @@ check_encoding_conversion_args(int src_encoding,
  * report_invalid_encoding: complain about invalid multibyte character
  *
  * note: len is remaining length of string, not length of character;
- * len must be greater than zero, as we always examine the first byte.
+ * len must be greater than zero (or we'd neglect initializing "buf").
  */
 void
 report_invalid_encoding(int encoding, const char *mbstr, int len)
 {
-	int l = pg_encoding_mblen(encoding, mbstr);
+	int l = pg_encoding_mblen_or_incomplete(encoding, mbstr, len);
 	char buf[8 * 5 + 1];
 	char *p = buf;
 	int j,
@@ -1673,18 +1673,26 @@ report_invalid_encoding(int encoding, const char *mbstr, int len)
  * report_untranslatable_char: complain about untranslatable character
  *
  * note: len is remaining length of string, not length of character;
- * len must be greater than zero, as we always examine the first byte.
+ * len must be greater than zero (or we'd neglect initializing "buf").
  */
 void
 report_untranslatable_char(int src_encoding, int dest_encoding,
 						   const char *mbstr, int len)
 {
-	int l = pg_encoding_mblen(src_encoding, mbstr);
+	int l;
 	char buf[8 * 5 + 1];
 	char *p = buf;
 	int j,
 		jlimit;

+	/*
+	 * We probably could use plain pg_encoding_mblen(), because
+	 * gb18030_to_utf8() verifies before it converts. All conversions should.
+	 * For src_encoding!=GB18030, len>0 meets pg_encoding_mblen() needs. Even
+	 * so, be defensive, since a buggy conversion might pass invalid data.
+	 * This is not a performance-critical path.
+	 */
+	l = pg_encoding_mblen_or_incomplete(src_encoding, mbstr, len);
 	jlimit = Min(l, len);
 	jlimit = Min(jlimit, 8);	/* prevent buffer overrun */

‎src/common/jsonapi.c

Lines changed: 5 additions & 2 deletions
@@ -700,8 +700,11 @@ json_lex_string(JsonLexContext *lex)
 	} while (0)
 #define FAIL_AT_CHAR_END(code) \
 	do { \
-		char *term = s + pg_encoding_mblen(lex->input_encoding, s); \
-		lex->token_terminator = (term <= end) ? term : end; \
+		ptrdiff_t remaining = end - s; \
+		int charlen; \
+		charlen = pg_encoding_mblen_or_incomplete(lex->input_encoding, \
+												  s, remaining); \
+		lex->token_terminator = (charlen <= remaining) ? s + charlen : end; \
 		return code; \
 	} while (0)

‎src/common/wchar.c

Lines changed: 45 additions & 6 deletions
@@ -12,6 +12,8 @@
  */
 #include "c.h"

+#include <limits.h>
+
 #include "mb/pg_wchar.h"


@@ -1959,10 +1961,27 @@ const pg_wchar_tbl pg_wchar_table[] = {
 /*
  * Returns the byte length of a multibyte character.
  *
- * Caution: when dealing with text that is not certainly valid in the
- * specified encoding, the result may exceed the actual remaining
- * string length. Callers that are not prepared to deal with that
- * should use pg_encoding_mblen_bounded() instead.
+ * Choose "mblen" functions based on the input string characteristics.
+ * pg_encoding_mblen() can be used when ANY of these conditions are met:
+ *
+ * - The input string is zero-terminated
+ *
+ * - The input string is known to be valid in the encoding (e.g., string
+ *   converted from database encoding)
+ *
+ * - The encoding is not GB18030 (e.g., when only database encodings are
+ *   passed to 'encoding' parameter)
+ *
+ * encoding==GB18030 requires examining up to two bytes to determine character
+ * length. Therefore, callers satisfying none of those conditions must use
+ * pg_encoding_mblen_or_incomplete() instead, as access to mbstr[1] cannot be
+ * guaranteed to be within allocation bounds.
+ *
+ * When dealing with text that is not certainly valid in the specified
+ * encoding, the result may exceed the actual remaining string length.
+ * Callers that are not prepared to deal with that should use Min(remaining,
+ * pg_encoding_mblen_or_incomplete()). For zero-terminated strings, that and
+ * pg_encoding_mblen_bounded() are interchangeable.
  */
 int
 pg_encoding_mblen(int encoding, const char *mbstr)
@@ -1973,8 +1992,28 @@ pg_encoding_mblen(int encoding, const char *mbstr)
 }

 /*
- * Returns the byte length of a multibyte character; but not more than
- * the distance to end of string.
+ * Returns the byte length of a multibyte character (possibly not
+ * zero-terminated), or INT_MAX if too few bytes remain to determine a length.
+ */
+int
+pg_encoding_mblen_or_incomplete(int encoding, const char *mbstr,
+								size_t remaining)
+{
+	/*
+	 * Define zero remaining as too few, even for single-byte encodings.
+	 * pg_gb18030_mblen() reads one or two bytes; single-byte encodings read
+	 * zero; others read one.
+	 */
+	if (remaining < 1 ||
+		(encoding == PG_GB18030 && IS_HIGHBIT_SET(*mbstr) && remaining < 2))
+		return INT_MAX;
+	return pg_encoding_mblen(encoding, mbstr);
+}
+
+/*
+ * Returns the byte length of a multibyte character; but not more than the
+ * distance to the terminating zero byte. For input that might lack a
+ * terminating zero, use Min(remaining, pg_encoding_mblen_or_incomplete()).
  */
 int
 pg_encoding_mblen_bounded(int encoding, const char *mbstr)
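
The length rule and the new bounds check can be tried outside the PostgreSQL tree. The following is a minimal sketch, not the committed code: sketch_gb18030_mblen() is a simplified stand-in for pg_gb18030_mblen(), sketch_mblen_or_incomplete() mirrors only the GB18030 branch of pg_encoding_mblen_or_incomplete(), and HIGHBIT_SET is a local analogue of IS_HIGHBIT_SET. All three names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* True when the byte has its high bit set, i.e. it is not ASCII. */
#define HIGHBIT_SET(ch) (((unsigned char) (ch)) & 0x80)

/*
 * Hypothetical stand-in for pg_gb18030_mblen(): ASCII is one byte; otherwise
 * the second byte selects a four-byte (0x30..0x39) or a two-byte character.
 * Reads s[1] whenever s[0] has its high bit set.
 */
int
sketch_gb18030_mblen(const unsigned char *s)
{
	if (!HIGHBIT_SET(s[0]))
		return 1;
	if (s[1] >= 0x30 && s[1] <= 0x39)
		return 4;
	return 2;
}

/*
 * Hypothetical GB18030-only analogue of pg_encoding_mblen_or_incomplete():
 * returns INT_MAX when too few bytes remain to determine a length, so the
 * length rule never reads beyond s[remaining - 1].
 */
int
sketch_mblen_or_incomplete(const unsigned char *s, size_t remaining)
{
	assert(s != NULL);
	if (remaining < 1 || (HIGHBIT_SET(s[0]) && remaining < 2))
		return INT_MAX;
	return sketch_gb18030_mblen(s);
}
```

With a lone lead byte such as 0x84 and one byte remaining, the sketch returns INT_MAX without touching s[1]. With 0x84 0x30 and two bytes remaining it returns 4, which still exceeds the remaining length, so callers must compare the result against what is left.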

‎src/include/mb/pg_wchar.h

Lines changed: 2 additions & 0 deletions
@@ -575,6 +575,8 @@ extern int pg_valid_server_encoding_id(int encoding);
  */
 extern void pg_encoding_set_invalid(int encoding, char *dst);
 extern int pg_encoding_mblen(int encoding, const char *mbstr);
+extern int pg_encoding_mblen_or_incomplete(int encoding, const char *mbstr,
+										   size_t remaining);
 extern int pg_encoding_mblen_bounded(int encoding, const char *mbstr);
 extern int pg_encoding_dsplen(int encoding, const char *mbstr);
 extern int pg_encoding_verifymbchar(int encoding, const char *mbstr, int len);

‎src/interfaces/libpq/fe-exec.c

Lines changed: 4 additions & 2 deletions
@@ -3980,7 +3980,8 @@ PQescapeStringInternal(PGconn *conn,
 		}

 		/* Slow path for possible multibyte characters */
-		charlen = pg_encoding_mblen(encoding, source);
+		charlen = pg_encoding_mblen_or_incomplete(encoding,
+												  source, remaining);

 		if (remaining < charlen ||
 			pg_encoding_verifymbchar(encoding, source, charlen) == -1)
@@ -4124,7 +4125,8 @@ PQescapeInternal(PGconn *conn, const char *str, size_t len, bool as_ident)
 		int charlen;

 		/* Slow path for possible multibyte characters */
-		charlen = pg_encoding_mblen_or_incomplete(conn->client_encoding, s);
+		charlen = pg_encoding_mblen_or_incomplete(conn->client_encoding,
+												  s, remaining);

 		if (charlen > remaining)
 		{
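
Both hunks keep their existing comparison of charlen against remaining; the INT_MAX sentinel makes that comparison fire for a truncated lead byte as well. A minimal sketch of the caller pattern follows, restricted to GB18030. It is not the libpq code, and both function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: GB18030 character length at s, or INT_MAX when fewer
 * bytes remain than are needed to decide.  Not the libpq implementation.
 */
int
charlen_or_incomplete(const unsigned char *s, size_t remaining)
{
	if (remaining < 1 || ((s[0] & 0x80) && remaining < 2))
		return INT_MAX;
	if (!(s[0] & 0x80))
		return 1;
	return (s[1] >= 0x30 && s[1] <= 0x39) ? 4 : 2;
}

/*
 * Count the characters that fit entirely within buf[0..len).  Returns -1 if
 * the buffer ends in the middle of a character, which is the case an escaping
 * routine would report as invalid input.  Byte validity beyond the length
 * rule is deliberately not checked in this sketch.
 */
long
count_chars_unterminated(const unsigned char *buf, size_t len)
{
	size_t		remaining = len;
	long		nchars = 0;

	assert(buf != NULL || len == 0);

	while (remaining > 0)
	{
		int			charlen = charlen_or_incomplete(buf, remaining);

		/* the INT_MAX "incomplete" result also lands here */
		if ((size_t) charlen > remaining)
			return -1;

		buf += charlen;
		remaining -= (size_t) charlen;
		nchars++;
	}
	return nchars;
}
```

Applied to the regression inputs in this commit, the bytes 66 6f 6f 84 30 9c 38 count as four characters, while 66 6f 6f 84 ("incomplete char at end") is rejected without reading past the fourth byte.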

‎src/interfaces/libpq/fe-misc.c

Lines changed: 6 additions & 9 deletions
@@ -1180,13 +1180,9 @@ pqSocketPoll(int sock, int forRead, int forWrite, time_t end_time)
  */

 /*
- * Returns the byte length of the character beginning at s, using the
- * specified encoding.
- *
- * Caution: when dealing with text that is not certainly valid in the
- * specified encoding, the result may exceed the actual remaining
- * string length. Callers that are not prepared to deal with that
- * should use PQmblenBounded() instead.
+ * Like pg_encoding_mblen(). Use this in callers that want the
+ * dynamically-linked libpq's stance on encodings, even if that means
+ * different behavior in different startups of the executable.
  */
 int
 PQmblen(const char *s, int encoding)
@@ -1195,8 +1191,9 @@ PQmblen(const char *s, int encoding)
 }

 /*
- * Returns the byte length of the character beginning at s, using the
- * specified encoding; but not more than the distance to end of string.
+ * Like pg_encoding_mblen_bounded(). Use this in callers that want the
+ * dynamically-linked libpq's stance on encodings, even if that means
+ * different behavior in different startups of the executable.
  */
 int
 PQmblenBounded(const char *s, int encoding)

‎src/test/modules/test_escape/test_escape.c

Lines changed: 99 additions & 0 deletions
@@ -12,6 +12,7 @@
 #include <string.h>
 #include <stdio.h>

+#include "common/jsonapi.h"
 #include "fe_utils/psqlscan.h"
 #include "fe_utils/string_utils.h"
 #include "getopt_long.h"
@@ -164,6 +165,91 @@ encoding_conflicts_ascii(int encoding)
 }


+/*
+ * Confirm escaping doesn't read past the end of an allocation. Consider the
+ * result of malloc(4096), in the absence of freelist entries satisfying the
+ * allocation. On OpenBSD, reading one byte past the end of that object
+ * yields SIGSEGV.
+ *
+ * Run this test before the program's other tests, so freelists are minimal.
+ * len=4096 didn't SIGSEGV, likely due to free() calls in libpq. len=8192
+ * did. Use 128 KiB, to somewhat insulate the outcome from distant new free()
+ * calls and libc changes.
+ */
+static void
+test_gb18030_page_multiple(pe_test_config *tc)
+{
+	PQExpBuffer testname;
+	size_t input_len = 0x20000;
+	char *input;
+
+	/* prepare input */
+	input = pg_malloc(input_len);
+	memset(input, '-', input_len - 1);
+	input[input_len - 1] = 0xfe;
+
+	/* name to describe the test */
+	testname = createPQExpBuffer();
+	appendPQExpBuffer(testname, ">repeat(%c, %zu)", input[0], input_len - 1);
+	escapify(testname, input + input_len - 1, 1);
+	appendPQExpBuffer(testname, "< - GB18030 - PQescapeLiteral");
+
+	/* test itself */
+	PQsetClientEncoding(tc->conn, "GB18030");
+	report_result(tc, PQescapeLiteral(tc->conn, input, input_len) == NULL,
+				  testname->data, "",
+				  "input validity vs escape success", "ok");
+
+	destroyPQExpBuffer(testname);
+	pg_free(input);
+}
+
+/*
+ * Confirm json parsing doesn't read past the end of an allocation. This
+ * exercises wchar.c infrastructure like the true "escape" tests do, but this
+ * isn't an "escape" test.
+ */
+static void
+test_gb18030_json(pe_test_config *tc)
+{
+	PQExpBuffer raw_buf;
+	PQExpBuffer testname;
+	const char input[] = "{\"\\u\xFE";
+	size_t input_len = sizeof(input) - 1;
+	JsonLexContext *lex;
+	JsonSemAction sem = {0};	/* no callbacks */
+	JsonParseErrorType json_error;
+	char *error_str;
+
+	/* prepare input like test_one_vector_escape() does */
+	raw_buf = createPQExpBuffer();
+	appendBinaryPQExpBuffer(raw_buf, input, input_len);
+	appendPQExpBufferStr(raw_buf, NEVER_ACCESS_STR);
+	VALGRIND_MAKE_MEM_NOACCESS(&raw_buf->data[input_len],
+							   raw_buf->len - input_len);
+
+	/* name to describe the test */
+	testname = createPQExpBuffer();
+	appendPQExpBuffer(testname, ">");
+	escapify(testname, input, input_len);
+	appendPQExpBuffer(testname, "< - GB18030 - pg_parse_json");
+
+	/* test itself */
+	lex = makeJsonLexContextCstringLen(raw_buf->data, input_len,
+									   PG_GB18030, false);
+	json_error = pg_parse_json(lex, &sem);
+	error_str = psprintf("JsonParseErrorType %d", json_error);
+	report_result(tc, json_error == JSON_UNICODE_ESCAPE_FORMAT,
+				  testname->data, "",
+				  "diagnosed", error_str);
+
+	pfree(error_str);
+	pfree(lex);
+	destroyPQExpBuffer(testname);
+	destroyPQExpBuffer(raw_buf);
+}
+
+
 static bool
 escape_literal(PGconn *conn, PQExpBuffer target,
 			   const char *unescaped, size_t unescaped_len,
@@ -454,8 +540,18 @@ static pe_test_vector pe_test_vectors[] =
 	 * Testcases that are not null terminated for the specified input length.
 	 * That's interesting to verify that escape functions don't read beyond
 	 * the intended input length.
+	 *
+	 * One interesting special case is GB18030, which has the odd behaviour
+	 * needing to read beyond the first byte to determine the length of a
+	 * multi-byte character.
 	 */
 	TV_LEN("gbk", "\x80", 1),
+	TV_LEN("GB18030", "\x80", 1),
+	TV_LEN("GB18030", "\x80\0", 2),
+	TV_LEN("GB18030", "\x80\x30", 2),
+	TV_LEN("GB18030", "\x80\x30\0", 3),
+	TV_LEN("GB18030", "\x80\x30\x30", 3),
+	TV_LEN("GB18030", "\x80\x30\x30\0", 4),
 	TV_LEN("UTF-8", "\xC3\xb6 ", 1),
 	TV_LEN("UTF-8", "\xC3\xb6 ", 2),
 };
@@ -864,6 +960,9 @@ main(int argc, char *argv[])
 		exit(1);
 	}

+	test_gb18030_page_multiple(&tc);
+	test_gb18030_json(&tc);
+
 	for (int i = 0; i < lengthof(pe_test_vectors); i++)
 	{
 		test_one_vector(&tc, &pe_test_vectors[i]);

‎src/test/regress/expected/conversion.out

Lines changed: 9 additions & 4 deletions
@@ -329,10 +329,13 @@ insert into gb18030_inputs values
   ('\x666f6f84309c38', 'valid, translates to UTF-8 by mapping function'),
   ('\x666f6f84309c', 'incomplete char '),
   ('\x666f6f84309c0a', 'incomplete char, followed by newline '),
+  ('\x666f6f84', 'incomplete char at end'),
   ('\x666f6f84309c3800', 'invalid, NUL byte'),
   ('\x666f6f84309c0038', 'invalid, NUL byte');
--- Test GB18030 verification
-select description, inbytes, (test_conv(inbytes, 'gb18030', 'gb18030')).* from gb18030_inputs;
+-- Test GB18030 verification. Round-trip through text so the backing of the
+-- bytea values is palloc, not shared_buffers. This lets Valgrind detect
+-- reads past the end.
+select description, inbytes, (test_conv(inbytes::text::bytea, 'gb18030', 'gb18030')).* from gb18030_inputs;
  description | inbytes | result | errorat | error
 ------------------------------------------------+--------------------+------------------+--------------+-------------------------------------------------------------------
  valid, pure ASCII | \x666f6f | \x666f6f | |
@@ -341,9 +344,10 @@ select description, inbytes, (test_conv(inbytes, 'gb18030', 'gb18030')).* from g
  valid, translates to UTF-8 by mapping function | \x666f6f84309c38 | \x666f6f84309c38 | |
  incomplete char | \x666f6f84309c | \x666f6f | \x84309c | invalid byte sequence for encoding "GB18030": 0x84 0x30 0x9c
  incomplete char, followed by newline | \x666f6f84309c0a | \x666f6f | \x84309c0a | invalid byte sequence for encoding "GB18030": 0x84 0x30 0x9c 0x0a
+ incomplete char at end | \x666f6f84 | \x666f6f | \x84 | invalid byte sequence for encoding "GB18030": 0x84
  invalid, NUL byte | \x666f6f84309c3800 | \x666f6f84309c38 | \x00 | invalid byte sequence for encoding "GB18030": 0x00
  invalid, NUL byte | \x666f6f84309c0038 | \x666f6f | \x84309c0038 | invalid byte sequence for encoding "GB18030": 0x84 0x30 0x9c 0x00
-(8 rows)
+(9 rows)

 -- Test conversions from GB18030
 select description, inbytes, (test_conv(inbytes, 'gb18030', 'utf8')).* from gb18030_inputs;
@@ -355,9 +359,10 @@ select description, inbytes, (test_conv(inbytes, 'gb18030', 'utf8')).* from gb18
  valid, translates to UTF-8 by mapping function | \x666f6f84309c38 | \x666f6fefa8aa | |
  incomplete char | \x666f6f84309c | \x666f6f | \x84309c | invalid byte sequence for encoding "GB18030": 0x84 0x30 0x9c
  incomplete char, followed by newline | \x666f6f84309c0a | \x666f6f | \x84309c0a | invalid byte sequence for encoding "GB18030": 0x84 0x30 0x9c 0x0a
+ incomplete char at end | \x666f6f84 | \x666f6f | \x84 | invalid byte sequence for encoding "GB18030": 0x84
  invalid, NUL byte | \x666f6f84309c3800 | \x666f6fefa8aa | \x00 | invalid byte sequence for encoding "GB18030": 0x00
  invalid, NUL byte | \x666f6f84309c0038 | \x666f6f | \x84309c0038 | invalid byte sequence for encoding "GB18030": 0x84 0x30 0x9c 0x00
-(8 rows)
+(9 rows)

 --
 -- ISO-8859-5

‎src/test/regress/sql/conversion.sql

Lines changed: 5 additions & 2 deletions
@@ -154,11 +154,14 @@ insert into gb18030_inputs values
   ('\x666f6f84309c38', 'valid, translates to UTF-8 by mapping function'),
   ('\x666f6f84309c', 'incomplete char'),
   ('\x666f6f84309c0a', 'incomplete char, followed by newline'),
+  ('\x666f6f84', 'incomplete char at end'),
   ('\x666f6f84309c3800', 'invalid, NUL byte'),
   ('\x666f6f84309c0038', 'invalid, NUL byte');

--- Test GB18030 verification
-select description, inbytes, (test_conv(inbytes, 'gb18030', 'gb18030')).* from gb18030_inputs;
+-- Test GB18030 verification. Round-trip through text so the backing of the
+-- bytea values is palloc, not shared_buffers. This lets Valgrind detect
+-- reads past the end.
+select description, inbytes, (test_conv(inbytes::text::bytea, 'gb18030', 'gb18030')).* from gb18030_inputs;
 -- Test conversions from GB18030
 select description, inbytes, (test_conv(inbytes, 'gb18030', 'utf8')).* from gb18030_inputs;
