NotificationsYou must be signed in to change notification settings
Fork6
Star31

Commitdfc7977

committed

Fix two issues in TOAST decompression.

pglz_maximum_compressed_size() potentially underestimated the amountof compressed data required to produce N bytes of decompressed data;this is a fault in commit11a078c.Separately from that, pglz_decompress() failed to protect itselfagainst corrupt compressed data, particularly off == 0 in a matchtag. Commitc60e520 turned such a situation into an infinite loop,where before it'd just have resulted in garbage output.The combination of these two bugs seems like it may explain bug #16694from Tom Vijlbrief, though it's impossible to be quite sure withoutdirect inspection of the failing session. (One needs to assume thatthe pglz_maximum_compressed_size() bug caused us to fail to fetch thesecond byte of a match tag, and what happened to be there instead wasa zero. The reported infinite loop is hard to explain without off == 0,though.)Aside from fixing the bugs, rewrite associated comments for moreclarity.Back-patch to v13 where both these commits landed.Discussion:https://postgr.es/m/16694-f107871e499ec114@postgresql.org

1 parent7f42350 commitdfc7977Copy full SHA for dfc7977

File tree

1 file changed

+66

-35

lines changed

src/common
- pg_lzcompress.c

1 file changed

+66

-35

lines changed

`‎src/common/pg_lzcompress.c`

Lines changed: 66 additions & 35 deletions

Original file line number	Diff line number	Diff line change
`@@ -710,7 +710,6 @@ pglz_decompress(const char source, int32 slen, char dest,`
`710`	`710`
`711`	`711`	`for (ctrlc=0;ctrlc<8&&sp<srcend&&dp<destend;ctrlc++)`
`712`	`712`	`{`
`713`		`-`
`714`	`713`	`if (ctrl&1)`
`715`	`714`	`{`
`716`	`715`	`/*`
`@@ -732,41 +731,62 @@ pglz_decompress(const char source, int32 slen, char dest,`
`732`	`731`	`len+=*sp++;`
`733`	`732`
`734`	`733`	`/*`
`735`		`- * Now we copy the bytes specified by the tag from OUTPUT to`
`736`		`- * OUTPUT (copy len bytes from dp - off to dp). The copied`
`737`		`- * areas could overlap, to preven possible uncertainty, we`
`738`		`- * copy only non-overlapping regions.`
	`734`	`+ * Check for corrupt data: if we fell off the end of the`
	`735`	`+ * source, or if we obtained off = 0, we have problems. (We`
	`736`	`+ * must check this, else we risk an infinite loop below in the`
	`737`	`+ * face of corrupt data.)`
	`738`	`+ */`
	`739`	`+if (sp>srcend\|\|off==0)`
	`740`	`+break;`
	`741`	`+`
	`742`	`+/*`
	`743`	`+ * Don't emit more data than requested.`
`739`	`744`	`*/`
`740`	`745`	`len=Min(len,destend-dp);`
	`746`	`+`
	`747`	`+/*`
	`748`	`+ * Now we copy the bytes specified by the tag from OUTPUT to`
	`749`	`+ * OUTPUT (copy len bytes from dp - off to dp). The copied`
	`750`	`+ * areas could overlap, so to avoid undefined behavior in`
	`751`	`+ * memcpy(), be careful to copy only non-overlapping regions.`
	`752`	`+ *`
	`753`	`+ * Note that we cannot use memmove() instead, since while its`
	`754`	`+ * behavior is well-defined, it's also not what we want.`
	`755`	`+ */`
`741`	`756`	`while (off<len)`
`742`	`757`	`{`
`743`		`-/*---------`
`744`		`- * When offset is smaller than length - source and`
`745`		`- * destination regions overlap. memmove() is resolving`
`746`		`- * this overlap in an incompatible way with pglz. Thus we`
`747`		`- * resort to memcpy()-ing non-overlapping regions.`
`748`		`- *`
`749`		`- * Consider input: 112341234123412341234`
`750`		`- * At byte 5 here ^ we have match with length 16 and`
`751`		`- * offset 4. 11234M(len=16, off=4)`
`752`		`- * We are decoding first period of match and rewrite match`
`753`		`- * 112341234M(len=12, off=8)`
`754`		`- *`
`755`		`- * The same match is now at position 9, it points to the`
`756`		`- * same start byte of output, but from another position:`
`757`		`- * the offset is doubled.`
`758`		`- *`
`759`		`- * We iterate through this offset growth until we can`
`760`		`- * proceed to usual memcpy(). If we would try to decode`
`761`		`- * the match at byte 5 (len=16, off=4) by memmove() we`
`762`		`- * would issue memmove(5, 1, 16) which would produce`
`763`		`- * 112341234XXXXXXXXXXXX, where series of X is 12`
`764`		`- * undefined bytes, that were at bytes [5:17].`
`765`		`- *---------`
	`758`	`+/*`
	`759`	`+ * We can safely copy "off" bytes since that clearly`
	`760`	`+ * results in non-overlapping source and destination.`
`766`	`761`	`*/`
`767`	`762`	`memcpy(dp,dp-off,off);`
`768`	`763`	`len-=off;`
`769`	`764`	`dp+=off;`
	`765`	`+`
	`766`	`+/*----------`
	`767`	`+ * This bit is less obvious: we can double "off" after`
	`768`	`+ * each such step. Consider this raw input:`
	`769`	`+ *112341234123412341234`
	`770`	`+ * This will be encoded as 5 literal bytes "11234" and`
	`771`	`+ * then a match tag with length 16 and offset 4. After`
	`772`	`+ * memcpy'ing the first 4 bytes, we will have emitted`
	`773`	`+ *112341234`
	`774`	`+ * so we can double "off" to 8, then after the next step`
	`775`	`+ * we have emitted`
	`776`	`+ *11234123412341234`
	`777`	`+ * Then we can double "off" again, after which it is more`
	`778`	`+ * than the remaining "len" so we fall out of this loop`
	`779`	`+ * and finish with a non-overlapping copy of the`
	`780`	`+ * remainder. In general, a match tag with off < len`
	`781`	`+ * implies that the decoded data has a repeat length of`
	`782`	`+ * "off". We can handle 1, 2, 4, etc repetitions of the`
	`783`	`+ * repeated string per memcpy until we get to a situation`
	`784`	`+ * where the final copy step is non-overlapping.`
	`785`	`+ *`
	`786`	`+ * (Another way to understand this is that we are keeping`
	`787`	`+ * the copy source point dp - off the same throughout.)`
	`788`	`+ *----------`
	`789`	`+ */`
`770`	`790`	`off+=off;`
`771`	`791`	`}`
`772`	`792`	`memcpy(dp,dp-off,len);`
`@@ -820,21 +840,32 @@ pglz_decompress(const char source, int32 slen, char dest,`
`820`	`840`	`int32`
`821`	`841`	`pglz_maximum_compressed_size(int32rawsize,int32total_compressed_size)`
`822`	`842`	`{`
`823`		`-int32compressed_size;`
	`843`	`+int64compressed_size;`
`824`	`844`
`825`	`845`	`/*`
`826`		`- * pglz uses one control bit per byte, sowe need (rawsize * 9) bits. We`
`827`		`- *care about bytes though, so we add 7 to make sure we include the last`
`828`		`- *incomplete byte (integer division roundsdown).`
	`846`	`+ * pglz uses one control bit per byte, soif the entire desired prefix is`
	`847`	`+ represented as literal bytes, we'll need (rawsize 9) bits. We care`
	`848`	`+ *about bytes though, so be sure to round up notdown.`
`829`	`849`	`*`
`830`		`- * XXX Use int64 to prevent overflow during calculation.`
	`850`	`+ * Use int64 here to prevent overflow during calculation.`
	`851`	`+ */`
	`852`	`+compressed_size= ((int64)rawsize*9+7) /8;`
	`853`	`+`
	`854`	`+/*`
	`855`	`+ * The above fails to account for a corner case: we could have compressed`
	`856`	`+ * data that starts with N-1 or N-2 literal bytes and then has a match tag`
	`857`	`+ * of 2 or 3 bytes. It's therefore possible that we need to fetch 1 or 2`
	`858`	`+ * more bytes in order to have the whole match tag. (Match tags earlier`
	`859`	`+ * in the compressed data don't cause a problem, since they should`
	`860`	`+ * represent more decompressed bytes than they occupy themselves.)`
`831`	`861`	`*/`
`832`		`-compressed_size= (int32) ((int64)rawsize*9+7) /8;`
	`862`	`+compressed_size+=2;`
`833`	`863`
`834`	`864`	`/*`
`835`	`865`	`* Maximum compressed size can't be larger than total compressed size.`
	`866`	`+ * (This also ensures that our result fits in int32.)`
`836`	`867`	`*/`
`837`	`868`	`compressed_size=Min(compressed_size,total_compressed_size);`
`838`	`869`
`839`		`-returncompressed_size;`
	`870`	`+return(int32)compressed_size;`
`840`	`871`	`}`

0 commit comments

Comments

(0)

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commitdfc7977

File tree

1 file changed

1 file changed

`‎src/common/pg_lzcompress.c`

0 commit comments