Commit e782a63
Make escaping functions retain trailing bytes of an invalid character.

Instead of dropping the trailing byte(s) of an invalid or incomplete
multibyte character, replace only the first byte with a known-invalid
sequence, and process the rest normally. This seems less likely to
confuse incautious callers than the behavior adopted in 5dc1e42.

While we're at it, adjust PQescapeStringInternal to produce at most
one bleat about invalid multibyte characters per string. This
matches the behavior of PQescapeInternal, and avoids the risk of
producing tons of repetitive junk if a long string is simply given
in the wrong encoding.

This is a followup to the fixes for CVE-2025-1094, and should be
included if cherry-picking those fixes.

Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/20250215012712.45@rfd.leadboat.com
Backpatch-through: 13

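The retained-bytes behavior described in the commit message can be illustrated with a small standalone sketch. This is not the libpq code: toy_utf8_mblen, toy_utf8_valid and toy_escape are hypothetical names, the validity check is a deliberately simplified UTF-8 test, and the two-byte marker 0xC0 0x20 is an assumption chosen for the sketch (0xC0 never occurs in valid UTF-8). The caller is assumed to supply a destination buffer of at least 2 * len + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length a UTF-8 lead byte claims for its character. */
static int
toy_utf8_mblen(unsigned char lead)
{
	if (lead < 0x80)
		return 1;
	if ((lead & 0xE0) == 0xC0)
		return 2;
	if ((lead & 0xF0) == 0xE0)
		return 3;
	if ((lead & 0xF8) == 0xF0)
		return 4;
	return 1;					/* stray continuation or bad lead byte */
}

/* Hypothetical helper: simplified check, continuation bytes only. */
static int
toy_utf8_valid(const unsigned char *s, int len)
{
	int			i;

	if (len == 1)
		return s[0] < 0x80;
	for (i = 1; i < len; i++)
		if ((s[i] & 0xC0) != 0x80)
			return 0;
	return 1;
}

/*
 * Sketch of the new policy: double single quotes, copy valid characters,
 * and on an invalid or incomplete character replace only its FIRST byte
 * with the marker, then keep processing the following bytes normally.
 * Returns the number of bytes written, not counting the terminator.
 */
size_t
toy_escape(char *dst, const char *src, size_t len, int *error)
{
	const unsigned char *s = (const unsigned char *) src;
	char	   *t = dst;
	size_t		remaining = len;

	assert(dst != NULL && src != NULL);
	if (error)
		*error = 0;

	while (remaining > 0)
	{
		int			charlen;

		if (*s < 0x80)
		{
			/* Fast path for plain ASCII */
			if (*s == '\'')
				*t++ = '\'';
			*t++ = (char) *s++;
			remaining--;
			continue;
		}

		charlen = toy_utf8_mblen(*s);
		if (remaining < (size_t) charlen || !toy_utf8_valid(s, charlen))
		{
			if (error)
				*error = 1;
			*t++ = (char) 0xC0;	/* assumed invalid marker, byte 1 */
			*t++ = ' ';			/* assumed invalid marker, byte 2 */
			s++;				/* skip one byte only, not charlen */
			remaining--;
		}
		else
		{
			memcpy(t, s, (size_t) charlen);
			t += charlen;
			s += charlen;
			remaining -= (size_t) charlen;
		}
	}
	*t = '\0';
	return (size_t) (t - dst);
}
```

For the four input bytes a, 0xE3, ', b, the lead byte 0xE3 claims a three-byte character but is followed by a quote. The sketch writes a, the marker, a doubled quote, and b. Advancing by the claimed length instead would swallow both the quote and the b, which is the behavior the commit moves away from.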
1 parent 22ffbbf, commit e782a63

2 files changed: +67 -97 lines changed

‎src/fe_utils/string_utils.c

Lines changed: 34 additions & 57 deletions
@@ -180,40 +180,25 @@ fmtIdEnc(const char *rawid, int encoding)
 			/* Slow path for possible multibyte characters */
 			charlen = pg_encoding_mblen(encoding, cp);
 
-			if (remaining < charlen)
-			{
-				/*
-				 * If the character is longer than the available input,
-				 * replace the string with an invalid sequence. The invalid
-				 * sequence ensures that the escaped string will trigger an
-				 * error on the server-side, even if we can't directly report
-				 * an error here.
-				 */
-				enlargePQExpBuffer(id_return, 2);
-				pg_encoding_set_invalid(encoding,
-										id_return->data + id_return->len);
-				id_return->len += 2;
-				id_return->data[id_return->len] = '\0';
-
-				/* there's no more input data, so we can stop */
-				break;
-			}
-			else if (pg_encoding_verifymbchar(encoding, cp, charlen) == -1)
+			if (remaining < charlen ||
+				pg_encoding_verifymbchar(encoding, cp, charlen) == -1)
 			{
 				/*
 				 * Multibyte character is invalid. It's important to verify
-				 * that as invalid multi-byte characters could e.g. be used to
+				 * that as invalid multibyte characters could e.g. be used to
 				 * "skip" over quote characters, e.g. when parsing
 				 * character-by-character.
 				 *
-				 * Replace the bytes corresponding to the invalid character
-				 * with an invalid sequence, for the same reason as above.
+				 * Replace the character's first byte with an invalid
+				 * sequence. The invalid sequence ensures that the escaped
+				 * string will trigger an error on the server-side, even if we
+				 * can't directly report an error here.
 				 *
 				 * It would be a bit faster to verify the whole string the
 				 * first time we encounter a set highbit, but this way we can
-				 * replace just the invalid characters, which probably makes
-				 * it easier for users to find the invalidly encoded portion
-				 * of a larger string.
+				 * replace just the invalid data, which probably makes it
+				 * easier for users to find the invalidly encoded portion of a
+				 * larger string.
 				 */
 				enlargePQExpBuffer(id_return, 2);
 				pg_encoding_set_invalid(encoding,
@@ -222,11 +207,13 @@ fmtIdEnc(const char *rawid, int encoding)
 				id_return->data[id_return->len] = '\0';
 
 				/*
-				 * Copy the rest of the string after the invalid multi-byte
-				 * character.
+				 * Handle the following bytes as if this byte didn't exist.
+				 * That's safer in case the subsequent bytes contain
+				 * characters that are significant for the caller (e.g. '>' in
+				 * html).
 				 */
-				remaining -= charlen;
-				cp += charlen;
+				remaining--;
+				cp++;
 			}
 			else
 			{
@@ -395,49 +382,39 @@ appendStringLiteral(PQExpBuffer buf, const char *str,
 		/* Slow path for possible multibyte characters */
 		charlen = PQmblen(source, encoding);
 
-		if (remaining < charlen)
-		{
-			/*
-			 * If the character is longer than the available input, replace
-			 * the string with an invalid sequence. The invalid sequence
-			 * ensures that the escaped string will trigger an error on the
-			 * server-side, even if we can't directly report an error here.
-			 *
-			 * We know there's enough space for the invalid sequence because
-			 * the "target" buffer is 2 * length + 2 long, and at worst we're
-			 * replacing a single input byte with two invalid bytes.
-			 */
-			pg_encoding_set_invalid(encoding, target);
-			target += 2;
-
-			/* there's no more valid input data, so we can stop */
-			break;
-		}
-		else if (pg_encoding_verifymbchar(encoding, source, charlen) == -1)
+		if (remaining < charlen ||
+			pg_encoding_verifymbchar(encoding, source, charlen) == -1)
 		{
 			/*
 			 * Multibyte character is invalid. It's important to verify that
-			 * as invalid multi-byte characters could e.g. be used to "skip"
+			 * as invalid multibyte characters could e.g. be used to "skip"
 			 * over quote characters, e.g. when parsing
 			 * character-by-character.
 			 *
-			 * Replace the bytes corresponding to the invalid character with
-			 * an invalid sequence, for the same reason as above.
+			 * Replace the character's first byte with an invalid sequence.
+			 * The invalid sequence ensures that the escaped string will
+			 * trigger an error on the server-side, even if we can't directly
+			 * report an error here.
+			 *
+			 * We know there's enough space for the invalid sequence because
+			 * the "target" buffer is 2 * length + 2 long, and at worst we're
+			 * replacing a single input byte with two invalid bytes.
 			 *
 			 * It would be a bit faster to verify the whole string the first
 			 * time we encounter a set highbit, but this way we can replace
-			 * just the invalid characters, which probably makes it easier for
-			 * users to find the invalidly encoded portion of a larger string.
+			 * just the invalid data, which probably makes it easier for users
+			 * to find the invalidly encoded portion of a larger string.
 			 */
 			pg_encoding_set_invalid(encoding, target);
 			target += 2;
-			remaining -= charlen;
 
 			/*
-			 * Copy the rest of the string after the invalid multi-byte
-			 * character.
+			 * Handle the following bytes as if this byte didn't exist. That's
+			 * safer in case the subsequent bytes contain important characters
+			 * for the caller (e.g. '>' in html).
 			 */
-			source += charlen;
+			source++;
+			remaining--;
 		}
 		else
 		{

‎src/interfaces/libpq/fe-exec.c

Lines changed: 33 additions & 40 deletions
@@ -3978,6 +3978,7 @@ PQescapeStringInternal(PGconn *conn,
 	const char *source = from;
 	char *target = to;
 	size_t remaining = strnlen(from, length);
+	bool already_complained = false;
 
 	if (error)
 		*error = 0;
@@ -4004,67 +4005,59 @@ PQescapeStringInternal(PGconn *conn,
 		/* Slow path for possible multibyte characters */
 		charlen = pg_encoding_mblen(encoding, source);
 
-		if (remaining < charlen)
+		if (remaining < charlen ||
+			pg_encoding_verifymbchar(encoding, source, charlen) == -1)
 		{
 			/*
-			 * If the character is longer than the available input, report an
-			 * error if possible, and replace the string with an invalid
-			 * sequence. The invalid sequence ensures that the escaped string
-			 * will trigger an error on the server-side, even if we can't
-			 * directly report an error here.
+			 * Multibyte character is invalid. It's important to verify that
+			 * as invalid multibyte characters could e.g. be used to "skip"
+			 * over quote characters, e.g. when parsing
+			 * character-by-character.
+			 *
+			 * Report an error if possible, and replace the character's first
+			 * byte with an invalid sequence. The invalid sequence ensures
+			 * that the escaped string will trigger an error on the
+			 * server-side, even if we can't directly report an error here.
 			 *
 			 * This isn't *that* crucial when we can report an error to the
-			 * caller, but if we can't, the caller will use this string
-			 * unmodified and it needs to be safe for parsing.
+			 * caller; but if we can't or the caller ignores it, the caller
+			 * will use this string unmodified and it needs to be safe for
+			 * parsing.
 			 *
 			 * We know there's enough space for the invalid sequence because
 			 * the "to" buffer needs to be at least 2 * length + 1 long, and
 			 * at worst we're replacing a single input byte with two invalid
 			 * bytes.
-			 */
-			if (error)
-				*error = 1;
-			if (conn)
-				appendPQExpBufferStr(&conn->errorMessage,
-									 libpq_gettext("incomplete multibyte character\n"));
-
-			pg_encoding_set_invalid(encoding, target);
-			target += 2;
-
-			/* there's no more input data, so we can stop */
-			break;
-		}
-		else if (pg_encoding_verifymbchar(encoding, source, charlen) == -1)
-		{
-			/*
-			 * Multibyte character is invalid. It's important to verify that
-			 * as invalid multi-byte characters could e.g. be used to "skip"
-			 * over quote characters, e.g. when parsing
-			 * character-by-character.
-			 *
-			 * Replace the bytes corresponding to the invalid character with
-			 * an invalid sequence, for the same reason as above.
 			 *
 			 * It would be a bit faster to verify the whole string the first
 			 * time we encounter a set highbit, but this way we can replace
-			 * just the invalid characters, which probably makes it easier for
-			 * users to find the invalidly encoded portion of a larger string.
+			 * just the invalid data, which probably makes it easier for users
+			 * to find the invalidly encoded portion of a larger string.
 			 */
 			if (error)
 				*error = 1;
-			if (conn)
-				appendPQExpBufferStr(&conn->errorMessage,
-									 libpq_gettext("invalid multibyte character\n"));
+			if (conn && !already_complained)
+			{
+				if (remaining < charlen)
+					appendPQExpBufferStr(&conn->errorMessage,
+										 libpq_gettext("incomplete multibyte character"));
+				else
+					appendPQExpBufferStr(&conn->errorMessage,
+										 libpq_gettext("invalid multibyte character"));
+				/* Issue a complaint only once per string */
+				already_complained = true;
+			}
 
 			pg_encoding_set_invalid(encoding, target);
 			target += 2;
-			remaining -= charlen;
 
 			/*
-			 * Copy the rest of the string after the invalid multi-byte
-			 * character.
+			 * Handle the following bytes as if this byte didn't exist. That's
+			 * safer in case the subsequent bytes contain important characters
+			 * for the caller (e.g. '>' in html).
 			 */
-			source += charlen;
+			source++;
+			remaining--;
 		}
 		else
 		{
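
The once-per-string complaint added to PQescapeStringInternal can be sketched in isolation. The following is a simplified stand-in, not libpq code: struct toy_errbuf, toy_append and toy_count_invalid are hypothetical names, the fixed-size buffer merely plays the role of conn->errorMessage, and every high-bit byte is treated as invalid (as if the target encoding were plain ASCII) so that the flag logic stays in focus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for conn->errorMessage. */
struct toy_errbuf
{
	char		data[256];
};

/* Append msg to the buffer, truncating if it does not fit. */
static void
toy_append(struct toy_errbuf *eb, const char *msg)
{
	size_t		used = strlen(eb->data);

	snprintf(eb->data + used, sizeof(eb->data) - used, "%s", msg);
}

/*
 * Count bytes that this sketch treats as invalid (any byte >= 0x80),
 * but record at most one message per input string, mirroring the
 * already_complained flag in the diff above.
 */
int
toy_count_invalid(const char *src, size_t len, struct toy_errbuf *eb)
{
	bool		already_complained = false;
	int			ninvalid = 0;
	size_t		i;

	assert(src != NULL);

	for (i = 0; i < len; i++)
	{
		if ((unsigned char) src[i] < 0x80)
			continue;			/* plain ASCII, nothing to report */

		ninvalid++;
		if (eb && !already_complained)
		{
			toy_append(eb, "invalid multibyte character");
			/* Issue a complaint only once per string */
			already_complained = true;
		}
	}
	return ninvalid;
}
```

With a three-byte input in which every byte has the high bit set, the sketch counts three invalid bytes but leaves a single message in the buffer. Without the flag, a long string supplied in the wrong encoding would append one message per bad byte.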
