
Commit c083095


Make escaping functions retain trailing bytes of an invalid character.

Instead of dropping the trailing byte(s) of an invalid or incomplete multibyte character, replace only the first byte with a known-invalid sequence, and process the rest normally. This seems less likely to confuse incautious callers than the behavior adopted in 5dc1e42.

While we're at it, adjust PQescapeStringInternal to produce at most one bleat about invalid multibyte characters per string. This matches the behavior of PQescapeInternal, and avoids the risk of producing tons of repetitive junk if a long string is simply given in the wrong encoding.

This is a followup to the fixes for CVE-2025-1094, and should be included if cherry-picking those fixes.

Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/20250215012712.45@rfd.leadboat.com
Backpatch-through: 13

1 parent 985908d commit c083095

File tree

2 files changed, +67 -97 lines changed

src
- fe_utils
  - string_utils.c
- interfaces/libpq
  - fe-exec.c


`src/fe_utils/string_utils.c`

Lines changed: 34 additions & 57 deletions

```diff
@@ -180,40 +180,25 @@ fmtIdEnc(const char *rawid, int encoding)
             /* Slow path for possible multibyte characters */
             charlen = pg_encoding_mblen(encoding, cp);
 
-            if (remaining < charlen)
-            {
-                /*
-                 * If the character is longer than the available input,
-                 * replace the string with an invalid sequence. The invalid
-                 * sequence ensures that the escaped string will trigger an
-                 * error on the server-side, even if we can't directly report
-                 * an error here.
-                 */
-                enlargePQExpBuffer(id_return, 2);
-                pg_encoding_set_invalid(encoding,
-                                        id_return->data + id_return->len);
-                id_return->len += 2;
-                id_return->data[id_return->len] = '\0';
-
-                /* there's no more input data, so we can stop */
-                break;
-            }
-            else if (pg_encoding_verifymbchar(encoding, cp, charlen) == -1)
+            if (remaining < charlen ||
+                pg_encoding_verifymbchar(encoding, cp, charlen) == -1)
             {
                 /*
                  * Multibyte character is invalid. It's important to verify
-                 * that as invalid multi-byte characters could e.g. be used to
+                 * that as invalid multibyte characters could e.g. be used to
                  * "skip" over quote characters, e.g. when parsing
                  * character-by-character.
                  *
-                 * Replace the bytes corresponding to the invalid character
-                 * with an invalid sequence, for the same reason as above.
+                 * Replace the character's first byte with an invalid
+                 * sequence. The invalid sequence ensures that the escaped
+                 * string will trigger an error on the server-side, even if we
+                 * can't directly report an error here.
                  *
                  * It would be a bit faster to verify the whole string the
                  * first time we encounter a set highbit, but this way we can
-                 * replace just the invalid characters, which probably makes
-                 * it easier for users to find the invalidly encoded portion
-                 * of a larger string.
+                 * replace just the invalid data, which probably makes it
+                 * easier for users to find the invalidly encoded portion of a
+                 * larger string.
                  */
                 enlargePQExpBuffer(id_return, 2);
                 pg_encoding_set_invalid(encoding,
@@ -222,11 +207,13 @@ fmtIdEnc(const char *rawid, int encoding)
                 id_return->data[id_return->len] = '\0';
 
                 /*
-                 * Copy the rest of the string after the invalid multi-byte
-                 * character.
+                 * Handle the following bytes as if this byte didn't exist.
+                 * That's safer in case the subsequent bytes contain
+                 * characters that are significant for the caller (e.g. '>' in
+                 * html).
                  */
-                remaining -= charlen;
-                cp += charlen;
+                remaining--;
+                cp++;
             }
             else
             {
@@ -395,49 +382,39 @@ appendStringLiteral(PQExpBuffer buf, const char *str,
         /* Slow path for possible multibyte characters */
         charlen = PQmblen(source, encoding);
 
-        if (remaining < charlen)
-        {
-            /*
-             * If the character is longer than the available input, replace
-             * the string with an invalid sequence. The invalid sequence
-             * ensures that the escaped string will trigger an error on the
-             * server-side, even if we can't directly report an error here.
-             *
-             * We know there's enough space for the invalid sequence because
-             * the "target" buffer is 2 * length + 2 long, and at worst we're
-             * replacing a single input byte with two invalid bytes.
-             */
-            pg_encoding_set_invalid(encoding, target);
-            target += 2;
-
-            /* there's no more valid input data, so we can stop */
-            break;
-        }
-        else if (pg_encoding_verifymbchar(encoding, source, charlen) == -1)
+        if (remaining < charlen ||
+            pg_encoding_verifymbchar(encoding, source, charlen) == -1)
         {
             /*
              * Multibyte character is invalid. It's important to verify that
-             * as invalid multi-byte characters could e.g. be used to "skip"
+             * as invalid multibyte characters could e.g. be used to "skip"
              * over quote characters, e.g. when parsing
              * character-by-character.
              *
-             * Replace the bytes corresponding to the invalid character with
-             * an invalid sequence, for the same reason as above.
+             * Replace the character's first byte with an invalid sequence.
+             * The invalid sequence ensures that the escaped string will
+             * trigger an error on the server-side, even if we can't directly
+             * report an error here.
+             *
+             * We know there's enough space for the invalid sequence because
+             * the "target" buffer is 2 * length + 2 long, and at worst we're
+             * replacing a single input byte with two invalid bytes.
              *
              * It would be a bit faster to verify the whole string the first
              * time we encounter a set highbit, but this way we can replace
-             * just the invalid characters, which probably makes it easier for
-             * users to find the invalidly encoded portion of a larger string.
+             * just the invalid data, which probably makes it easier for users
+             * to find the invalidly encoded portion of a larger string.
              */
             pg_encoding_set_invalid(encoding, target);
             target += 2;
-            remaining -= charlen;
 
             /*
-             * Copy the rest of the string after the invalid multi-byte
-             * character.
+             * Handle the following bytes as if this byte didn't exist. That's
+             * safer in case the subsequent bytes contain important characters
+             * for the caller (e.g. '>' in html).
              */
-            source += charlen;
+            source++;
+            remaining--;
         }
         else
         {
```

`src/interfaces/libpq/fe-exec.c`

Lines changed: 33 additions & 40 deletions

```diff
@@ -3955,6 +3955,7 @@ PQescapeStringInternal(PGconn *conn,
     const char *source = from;
     char       *target = to;
     size_t      remaining = strnlen(from, length);
+    bool        already_complained = false;
 
     if (error)
         *error = 0;
@@ -3981,67 +3982,59 @@ PQescapeStringInternal(PGconn *conn,
         /* Slow path for possible multibyte characters */
         charlen = pg_encoding_mblen(encoding, source);
 
-        if (remaining < charlen)
+        if (remaining < charlen ||
+            pg_encoding_verifymbchar(encoding, source, charlen) == -1)
         {
             /*
-             * If the character is longer than the available input, report an
-             * error if possible, and replace the string with an invalid
-             * sequence. The invalid sequence ensures that the escaped string
-             * will trigger an error on the server-side, even if we can't
-             * directly report an error here.
+             * Multibyte character is invalid. It's important to verify that
+             * as invalid multibyte characters could e.g. be used to "skip"
+             * over quote characters, e.g. when parsing
+             * character-by-character.
+             *
+             * Report an error if possible, and replace the character's first
+             * byte with an invalid sequence. The invalid sequence ensures
+             * that the escaped string will trigger an error on the
+             * server-side, even if we can't directly report an error here.
              *
              * This isn't that crucial when we can report an error to the
-             * caller, but if we can't, the caller will use this string
-             * unmodified and it needs to be safe for parsing.
+             * caller; but if we can't or the caller ignores it, the caller
+             * will use this string unmodified and it needs to be safe for
+             * parsing.
              *
              * We know there's enough space for the invalid sequence because
              * the "to" buffer needs to be at least 2 * length + 1 long, and
              * at worst we're replacing a single input byte with two invalid
              * bytes.
-             */
-            if (error)
-                *error = 1;
-            if (conn)
-                appendPQExpBufferStr(&conn->errorMessage,
-                                     libpq_gettext("incomplete multibyte character\n"));
-
-            pg_encoding_set_invalid(encoding, target);
-            target += 2;
-
-            /* there's no more input data, so we can stop */
-            break;
-        }
-        else if (pg_encoding_verifymbchar(encoding, source, charlen) == -1)
-        {
-            /*
-             * Multibyte character is invalid. It's important to verify that
-             * as invalid multi-byte characters could e.g. be used to "skip"
-             * over quote characters, e.g. when parsing
-             * character-by-character.
-             *
-             * Replace the bytes corresponding to the invalid character with
-             * an invalid sequence, for the same reason as above.
              *
              * It would be a bit faster to verify the whole string the first
              * time we encounter a set highbit, but this way we can replace
-             * just the invalid characters, which probably makes it easier for
-             * users to find the invalidly encoded portion of a larger string.
+             * just the invalid data, which probably makes it easier for users
+             * to find the invalidly encoded portion of a larger string.
              */
             if (error)
                 *error = 1;
-            if (conn)
-                appendPQExpBufferStr(&conn->errorMessage,
-                                     libpq_gettext("invalid multibyte character\n"));
+            if (conn && !already_complained)
+            {
+                if (remaining < charlen)
+                    appendPQExpBufferStr(&conn->errorMessage,
+                                         libpq_gettext("incomplete multibyte character"));
+                else
+                    appendPQExpBufferStr(&conn->errorMessage,
+                                         libpq_gettext("invalid multibyte character"));
+                /* Issue a complaint only once per string */
+                already_complained = true;
+            }
 
             pg_encoding_set_invalid(encoding, target);
             target += 2;
-            remaining -= charlen;
 
             /*
-             * Copy the rest of the string after the invalid multi-byte
-             * character.
+             * Handle the following bytes as if this byte didn't exist. That's
+             * safer in case the subsequent bytes contain important characters
+             * for the caller (e.g. '>' in html).
              */
-            source += charlen;
+            source++;
+            remaining--;
         }
         else
         {
```
