Commit c083095

Make escaping functions retain trailing bytes of an invalid character.
Instead of dropping the trailing byte(s) of an invalid or incomplete multibyte character, replace only the first byte with a known-invalid sequence, and process the rest normally. This seems less likely to confuse incautious callers than the behavior adopted in 5dc1e42.

While we're at it, adjust PQescapeStringInternal to produce at most one bleat about invalid multibyte characters per string. This matches the behavior of PQescapeInternal, and avoids the risk of producing tons of repetitive junk if a long string is simply given in the wrong encoding.

This is a followup to the fixes for CVE-2025-1094, and should be included if cherry-picking those fixes.

Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/20250215012712.45@rfd.leadboat.com
Backpatch-through: 13
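To make the behavioral change concrete, here is a minimal, self-contained sketch that models the old and new handling of a bad byte inside an escaping loop. It is not libpq code: `utf8_mblen`, `utf8_verify`, and `escape_utf8` are hypothetical, simplified UTF-8-only stand-ins for `pg_encoding_mblen`, `pg_encoding_verifymbchar`, and the escaping loops in this commit, and the replacement pair 0xC0 0x20 is an assumed stand-in for whatever `pg_encoding_set_invalid` writes (0xC0 can never begin valid UTF-8).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for pg_encoding_mblen(): the length a
 * UTF-8 lead byte announces.  Stray continuation bytes and junk count as 1. */
static int utf8_mblen(unsigned char b)
{
    if ((b & 0xE0) == 0xC0)
        return 2;
    if ((b & 0xF0) == 0xE0)
        return 3;
    if ((b & 0xF8) == 0xF0)
        return 4;
    return 1;
}

/* Hypothetical stand-in for pg_encoding_verifymbchar(): only checks that the
 * trailing bytes are continuation bytes (real validation is stricter). */
static int utf8_verify(const unsigned char *s, int len)
{
    if (len == 1)
        return (s[0] < 0x80) ? 1 : -1;
    for (int i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return -1;
    return len;
}

/* Copies `in` to `out` (which must hold 2 * strlen(in) + 1 bytes), replacing
 * bad input with the stand-in pair 0xC0 0x20.  old_behavior != 0 mimics the
 * loop before this commit; 0 mimics the new one.  Returns the output length. */
static size_t escape_utf8(const char *in, char *out, int old_behavior)
{
    const unsigned char *src = (const unsigned char *) in;
    size_t      length = strlen(in);
    size_t      remaining = length;
    char       *dst = out;

    while (remaining > 0)
    {
        if (*src < 0x80)            /* fast path: plain ASCII */
        {
            *dst++ = (char) *src++;
            remaining--;
            continue;
        }

        int         charlen = utf8_mblen(*src);

        /* short-circuit: never read past the end of an incomplete char */
        if (remaining < (size_t) charlen || utf8_verify(src, charlen) == -1)
        {
            *dst++ = (char) 0xC0;
            *dst++ = ' ';
            if (!old_behavior)
            {
                src++;              /* new: act as if this byte didn't exist */
                remaining--;
            }
            else if (remaining < (size_t) charlen)
                break;              /* old: drop the incomplete tail */
            else
            {
                src += charlen;     /* old: swallow charlen bytes, even ASCII */
                remaining -= (size_t) charlen;
            }
        }
        else
        {
            memcpy(dst, src, (size_t) charlen);     /* valid character */
            dst += charlen;
            src += charlen;
            remaining -= (size_t) charlen;
        }
    }
    *dst = '\0';
    assert((size_t) (dst - out) <= 2 * length);    /* <= 2 bytes per input byte */
    return (size_t) (dst - out);
}
```

With the input `"\xE2>x"` (a lead byte announcing a three-byte character, followed by ASCII `>` and `x`), the old mode returns 2 and emits only the replacement pair, silently swallowing the `>`; the new mode returns 4 and keeps `>x` after the replacement. That is the kind of caller-significant character the new code comments refer to.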
1 parent 985908d commit c083095


2 files changed, +67 -97 lines changed


src/fe_utils/string_utils.c

Lines changed: 34 additions & 57 deletions

    @@ -180,40 +180,25 @@ fmtIdEnc(const char *rawid, int encoding)
                 /* Slow path for possible multibyte characters */
                 charlen = pg_encoding_mblen(encoding, cp);

    -            if (remaining < charlen)
    -            {
    -                /*
    -                 * If the character is longer than the available input,
    -                 * replace the string with an invalid sequence. The invalid
    -                 * sequence ensures that the escaped string will trigger an
    -                 * error on the server-side, even if we can't directly report
    -                 * an error here.
    -                 */
    -                enlargePQExpBuffer(id_return, 2);
    -                pg_encoding_set_invalid(encoding,
    -                                        id_return->data + id_return->len);
    -                id_return->len += 2;
    -                id_return->data[id_return->len] = '\0';
    -
    -                /* there's no more input data, so we can stop */
    -                break;
    -            }
    -            else if (pg_encoding_verifymbchar(encoding, cp, charlen) == -1)
    +            if (remaining < charlen ||
    +                pg_encoding_verifymbchar(encoding, cp, charlen) == -1)
                 {
                     /*
                      * Multibyte character is invalid. It's important to verify
    -                 * that as invalid multi-byte characters could e.g. be used to
    +                 * that as invalid multibyte characters could e.g. be used to
                      * "skip" over quote characters, e.g. when parsing
                      * character-by-character.
                      *
    -                 * Replace the bytes corresponding to the invalid character
    -                 * with an invalid sequence, for the same reason as above.
    +                 * Replace the character's first byte with an invalid
    +                 * sequence. The invalid sequence ensures that the escaped
    +                 * string will trigger an error on the server-side, even if we
    +                 * can't directly report an error here.
                      *
                      * It would be a bit faster to verify the whole string the
                      * first time we encounter a set highbit, but this way we can
    -                 * replace just the invalid characters, which probably makes
    -                 * it easier for users to find the invalidly encoded portion
    -                 * of a larger string.
    +                 * replace just the invalid data, which probably makes it
    +                 * easier for users to find the invalidly encoded portion of a
    +                 * larger string.
                      */
                     enlargePQExpBuffer(id_return, 2);
                     pg_encoding_set_invalid(encoding,

    @@ -222,11 +207,13 @@ fmtIdEnc(const char *rawid, int encoding)
                     id_return->data[id_return->len] = '\0';

                     /*
    -                 * Copy the rest of the string after the invalid multi-byte
    -                 * character.
    +                 * Handle the following bytes as if this byte didn't exist.
    +                 * That's safer in case the subsequent bytes contain
    +                 * characters that are significant for the caller (e.g. '>' in
    +                 * html).
                      */
    -                remaining -= charlen;
    -                cp += charlen;
    +                remaining--;
    +                cp++;
                 }
                 else
                 {

    @@ -395,49 +382,39 @@ appendStringLiteral(PQExpBuffer buf, const char *str,
             /* Slow path for possible multibyte characters */
             charlen = PQmblen(source, encoding);

    -        if (remaining < charlen)
    -        {
    -            /*
    -             * If the character is longer than the available input, replace
    -             * the string with an invalid sequence. The invalid sequence
    -             * ensures that the escaped string will trigger an error on the
    -             * server-side, even if we can't directly report an error here.
    -             *
    -             * We know there's enough space for the invalid sequence because
    -             * the "target" buffer is 2 * length + 2 long, and at worst we're
    -             * replacing a single input byte with two invalid bytes.
    -             */
    -            pg_encoding_set_invalid(encoding, target);
    -            target += 2;
    -
    -            /* there's no more valid input data, so we can stop */
    -            break;
    -        }
    -        else if (pg_encoding_verifymbchar(encoding, source, charlen) == -1)
    +        if (remaining < charlen ||
    +            pg_encoding_verifymbchar(encoding, source, charlen) == -1)
             {
                 /*
                  * Multibyte character is invalid. It's important to verify that
    -             * as invalid multi-byte characters could e.g. be used to "skip"
    +             * as invalid multibyte characters could e.g. be used to "skip"
                  * over quote characters, e.g. when parsing
                  * character-by-character.
                  *
    -             * Replace the bytes corresponding to the invalid character with
    -             * an invalid sequence, for the same reason as above.
    +             * Replace the character's first byte with an invalid sequence.
    +             * The invalid sequence ensures that the escaped string will
    +             * trigger an error on the server-side, even if we can't directly
    +             * report an error here.
    +             *
    +             * We know there's enough space for the invalid sequence because
    +             * the "target" buffer is 2 * length + 2 long, and at worst we're
    +             * replacing a single input byte with two invalid bytes.
                  *
                  * It would be a bit faster to verify the whole string the first
                  * time we encounter a set highbit, but this way we can replace
    -             * just the invalid characters, which probably makes it easier for
    -             * users to find the invalidly encoded portion of a larger string.
    +             * just the invalid data, which probably makes it easier for users
    +             * to find the invalidly encoded portion of a larger string.
                  */
                 pg_encoding_set_invalid(encoding, target);
                 target += 2;
    -            remaining -= charlen;

                 /*
    -             * Copy the rest of the string after the invalid multi-byte
    -             * character.
    +             * Handle the following bytes as if this byte didn't exist. That's
    +             * safer in case the subsequent bytes contain important characters
    +             * for the caller (e.g. '>' in html).
                  */
    -            source += charlen;
    +            source++;
    +            remaining--;
             }
             else
             {
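The buffer-size argument in the `appendStringLiteral` comment (a `2 * length + 2` target, with at worst two output bytes per input byte) can be checked with a sketch like the following. It assumes UTF-8 and standard_conforming_strings on, so only single quotes are doubled; `quote_literal_utf8` and its helpers are hypothetical, simplified models, not the PostgreSQL functions, and 0xC0 0x20 is an assumed stand-in for the invalid sequence.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified UTF-8 helpers (real validation is stricter). */
static int utf8_mblen(unsigned char b)
{
    if ((b & 0xE0) == 0xC0)
        return 2;
    if ((b & 0xF0) == 0xE0)
        return 3;
    if ((b & 0xF8) == 0xF0)
        return 4;
    return 1;
}

static int utf8_verify(const unsigned char *s, int len)
{
    if (len == 1)
        return (s[0] < 0x80) ? 1 : -1;
    for (int i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return -1;
    return len;
}

/* Hypothetical model of appendStringLiteral(): wraps `str` in single quotes,
 * doubles embedded quotes, and replaces the first byte of each invalid or
 * incomplete character with a two-byte invalid stand-in.  Returns a malloc'd
 * string (caller frees) and its length in *outlen; NULL if allocation fails. */
static char *quote_literal_utf8(const char *str, size_t *outlen)
{
    size_t      length = strlen(str);
    size_t      remaining = length;
    const unsigned char *src = (const unsigned char *) str;
    /* Worst case: 2 bytes per input byte, 2 quotes, plus the terminator. */
    char       *buf = malloc(2 * length + 2 + 1);
    char       *dst = buf;

    if (buf == NULL)
        return NULL;

    *dst++ = '\'';
    while (remaining > 0)
    {
        if (*src < 0x80)
        {
            if (*src == '\'')
                *dst++ = '\'';      /* '' is an escaped quote */
            *dst++ = (char) *src++;
            remaining--;
            continue;
        }

        int         charlen = utf8_mblen(*src);

        if (remaining < (size_t) charlen || utf8_verify(src, charlen) == -1)
        {
            *dst++ = (char) 0xC0;   /* one input byte -> two output bytes */
            *dst++ = ' ';
            src++;
            remaining--;
        }
        else
        {
            memcpy(dst, src, (size_t) charlen);
            dst += charlen;
            src += charlen;
            remaining -= (size_t) charlen;
        }
    }
    *dst++ = '\'';
    *dst = '\0';

    *outlen = (size_t) (dst - buf);
    assert(*outlen <= 2 * length + 2);  /* the bound the comment relies on */
    return buf;
}
```

A string made entirely of quotes and one made entirely of invalid bytes both hit the bound exactly: two input bytes produce six output bytes, counting the surrounding quotes.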

src/interfaces/libpq/fe-exec.c

Lines changed: 33 additions & 40 deletions

    @@ -3955,6 +3955,7 @@ PQescapeStringInternal(PGconn *conn,
         const char *source = from;
         char       *target = to;
         size_t      remaining = strnlen(from, length);
    +    bool        already_complained = false;

         if (error)
             *error = 0;

    @@ -3981,67 +3982,59 @@ PQescapeStringInternal(PGconn *conn,
             /* Slow path for possible multibyte characters */
             charlen = pg_encoding_mblen(encoding, source);

    -        if (remaining < charlen)
    +        if (remaining < charlen ||
    +            pg_encoding_verifymbchar(encoding, source, charlen) == -1)
             {
                 /*
    -             * If the character is longer than the available input, report an
    -             * error if possible, and replace the string with an invalid
    -             * sequence. The invalid sequence ensures that the escaped string
    -             * will trigger an error on the server-side, even if we can't
    -             * directly report an error here.
    +             * Multibyte character is invalid. It's important to verify that
    +             * as invalid multibyte characters could e.g. be used to "skip"
    +             * over quote characters, e.g. when parsing
    +             * character-by-character.
    +             *
    +             * Report an error if possible, and replace the character's first
    +             * byte with an invalid sequence. The invalid sequence ensures
    +             * that the escaped string will trigger an error on the
    +             * server-side, even if we can't directly report an error here.
                  *
                  * This isn't *that* crucial when we can report an error to the
    -             * caller, but if we can't, the caller will use this string
    -             * unmodified and it needs to be safe for parsing.
    +             * caller; but if we can't or the caller ignores it, the caller
    +             * will use this string unmodified and it needs to be safe for
    +             * parsing.
                  *
                  * We know there's enough space for the invalid sequence because
                  * the "to" buffer needs to be at least 2 * length + 1 long, and
                  * at worst we're replacing a single input byte with two invalid
                  * bytes.
    -             */
    -            if (error)
    -                *error = 1;
    -            if (conn)
    -                appendPQExpBufferStr(&conn->errorMessage,
    -                                     libpq_gettext("incomplete multibyte character\n"));
    -
    -            pg_encoding_set_invalid(encoding, target);
    -            target += 2;
    -
    -            /* there's no more input data, so we can stop */
    -            break;
    -        }
    -        else if (pg_encoding_verifymbchar(encoding, source, charlen) == -1)
    -        {
    -            /*
    -             * Multibyte character is invalid. It's important to verify that
    -             * as invalid multi-byte characters could e.g. be used to "skip"
    -             * over quote characters, e.g. when parsing
    -             * character-by-character.
    -             *
    -             * Replace the bytes corresponding to the invalid character with
    -             * an invalid sequence, for the same reason as above.
                  *
                  * It would be a bit faster to verify the whole string the first
                  * time we encounter a set highbit, but this way we can replace
    -             * just the invalid characters, which probably makes it easier for
    -             * users to find the invalidly encoded portion of a larger string.
    +             * just the invalid data, which probably makes it easier for users
    +             * to find the invalidly encoded portion of a larger string.
                  */
                 if (error)
                     *error = 1;
    -            if (conn)
    -                appendPQExpBufferStr(&conn->errorMessage,
    -                                     libpq_gettext("invalid multibyte character\n"));
    +            if (conn && !already_complained)
    +            {
    +                if (remaining < charlen)
    +                    appendPQExpBufferStr(&conn->errorMessage,
    +                                         libpq_gettext("incomplete multibyte character"));
    +                else
    +                    appendPQExpBufferStr(&conn->errorMessage,
    +                                         libpq_gettext("invalid multibyte character"));
    +                /* Issue a complaint only once per string */
    +                already_complained = true;
    +            }

                 pg_encoding_set_invalid(encoding, target);
                 target += 2;
    -            remaining -= charlen;

                 /*
    -             * Copy the rest of the string after the invalid multi-byte
    -             * character.
    +             * Handle the following bytes as if this byte didn't exist. That's
    +             * safer in case the subsequent bytes contain important characters
    +             * for the caller (e.g. '>' in html).
                  */
    -            source += charlen;
    +            source++;
    +            remaining--;
             }
             else
             {
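The once-per-string reporting added to `PQescapeStringInternal` can be modeled roughly as below. This is a hedged sketch, not libpq: `ErrorLog` is a hypothetical stand-in for `conn->errorMessage`, the UTF-8 helpers are simplified assumptions rather than libpq's validators, and 0xC0 0x20 is an assumed stand-in for the invalid sequence. As in the diff, `*error` is set for every bad byte, but at most one message is recorded per call, and "incomplete" is chosen when the announced length runs past the end of the input (the `remaining < charlen` test).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified UTF-8 helpers (real validation is stricter). */
static int utf8_mblen(unsigned char b)
{
    if ((b & 0xE0) == 0xC0)
        return 2;
    if ((b & 0xF0) == 0xE0)
        return 3;
    if ((b & 0xF8) == 0xF0)
        return 4;
    return 1;
}

static int utf8_verify(const unsigned char *s, int len)
{
    if (len == 1)
        return (s[0] < 0x80) ? 1 : -1;
    for (int i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return -1;
    return len;
}

/* Hypothetical stand-in for conn->errorMessage. */
typedef struct
{
    int         count;          /* complaints recorded so far */
    const char *first;          /* text of the first complaint */
} ErrorLog;

/* Escapes `from` into `to` (which must hold 2 * strlen(from) + 1 bytes),
 * doubling single quotes.  Returns the number of bytes written. */
static size_t escape_string_sketch(const char *from, char *to,
                                   int *error, ErrorLog *sink)
{
    const unsigned char *src = (const unsigned char *) from;
    size_t      length = strlen(from);
    size_t      remaining = length;
    char       *dst = to;
    bool        already_complained = false;

    if (error)
        *error = 0;
    while (remaining > 0)
    {
        if (*src < 0x80)
        {
            if (*src == '\'')
                *dst++ = '\'';
            *dst++ = (char) *src++;
            remaining--;
            continue;
        }

        int         charlen = utf8_mblen(*src);

        if (remaining < (size_t) charlen || utf8_verify(src, charlen) == -1)
        {
            if (error)
                *error = 1;         /* flagged for every bad byte */
            if (sink && !already_complained)
            {
                const char *msg = (remaining < (size_t) charlen)
                    ? "incomplete multibyte character"
                    : "invalid multibyte character";

                if (sink->count++ == 0)
                    sink->first = msg;
                already_complained = true;  /* only once per string */
            }
            *dst++ = (char) 0xC0;   /* stand-in invalid pair */
            *dst++ = ' ';
            src++;                  /* skip just this byte */
            remaining--;
        }
        else
        {
            memcpy(dst, src, (size_t) charlen);
            dst += charlen;
            src += charlen;
            remaining -= (size_t) charlen;
        }
    }
    *dst = '\0';
    assert((size_t) (dst - to) <= 2 * length);     /* fits the 2 * length + 1 buffer */
    return (size_t) (dst - to);
}
```

A string of four stray 0xFF bytes yields eight output bytes and a single "invalid multibyte character" complaint rather than four, while a truncated lead byte at the end of the input yields "incomplete multibyte character".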
