Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitdfc7977

Browse files
committed
Fix two issues in TOAST decompression.
pglz_maximum_compressed_size() potentially underestimated the amountof compressed data required to produce N bytes of decompressed data;this is a fault in commit11a078c.Separately from that, pglz_decompress() failed to protect itselfagainst corrupt compressed data, particularly off == 0 in a matchtag. Commitc60e520 turned such a situation into an infinite loop,where before it'd just have resulted in garbage output.The combination of these two bugs seems like it may explain bug #16694from Tom Vijlbrief, though it's impossible to be quite sure withoutdirect inspection of the failing session. (One needs to assume thatthe pglz_maximum_compressed_size() bug caused us to fail to fetch thesecond byte of a match tag, and what happened to be there instead wasa zero. The reported infinite loop is hard to explain without off == 0,though.)Aside from fixing the bugs, rewrite associated comments for moreclarity.Back-patch to v13 where both these commits landed.Discussion:https://postgr.es/m/16694-f107871e499ec114@postgresql.org
1 parent7f42350 commitdfc7977

File tree

1 file changed

+66
-35
lines changed

1 file changed

+66
-35
lines changed

‎src/common/pg_lzcompress.c

Lines changed: 66 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -710,7 +710,6 @@ pglz_decompress(const char *source, int32 slen, char *dest,
710710

711711
for (ctrlc=0;ctrlc<8&&sp<srcend&&dp<destend;ctrlc++)
712712
{
713-
714713
if (ctrl&1)
715714
{
716715
/*
@@ -732,41 +731,62 @@ pglz_decompress(const char *source, int32 slen, char *dest,
732731
len+=*sp++;
733732

734733
/*
735-
* Now we copy the bytes specified by the tag from OUTPUT to
736-
* OUTPUT (copy len bytes from dp - off to dp). The copied
737-
* areas could overlap, to preven possible uncertainty, we
738-
* copy only non-overlapping regions.
734+
* Check for corrupt data: if we fell off the end of the
735+
* source, or if we obtained off = 0, we have problems. (We
736+
* must check this, else we risk an infinite loop below in the
737+
* face of corrupt data.)
738+
*/
739+
if (sp>srcend||off==0)
740+
break;
741+
742+
/*
743+
* Don't emit more data than requested.
739744
*/
740745
len=Min(len,destend-dp);
746+
747+
/*
748+
* Now we copy the bytes specified by the tag from OUTPUT to
749+
* OUTPUT (copy len bytes from dp - off to dp). The copied
750+
* areas could overlap, so to avoid undefined behavior in
751+
* memcpy(), be careful to copy only non-overlapping regions.
752+
*
753+
* Note that we cannot use memmove() instead, since while its
754+
* behavior is well-defined, it's also not what we want.
755+
*/
741756
while (off<len)
742757
{
743-
/*---------
744-
* When offset is smaller than length - source and
745-
* destination regions overlap. memmove() is resolving
746-
* this overlap in an incompatible way with pglz. Thus we
747-
* resort to memcpy()-ing non-overlapping regions.
748-
*
749-
* Consider input: 112341234123412341234
750-
* At byte 5 here ^ we have match with length 16 and
751-
* offset 4. 11234M(len=16, off=4)
752-
* We are decoding first period of match and rewrite match
753-
* 112341234M(len=12, off=8)
754-
*
755-
* The same match is now at position 9, it points to the
756-
* same start byte of output, but from another position:
757-
* the offset is doubled.
758-
*
759-
* We iterate through this offset growth until we can
760-
* proceed to usual memcpy(). If we would try to decode
761-
* the match at byte 5 (len=16, off=4) by memmove() we
762-
* would issue memmove(5, 1, 16) which would produce
763-
* 112341234XXXXXXXXXXXX, where series of X is 12
764-
* undefined bytes, that were at bytes [5:17].
765-
*---------
758+
/*
759+
* We can safely copy "off" bytes since that clearly
760+
* results in non-overlapping source and destination.
766761
*/
767762
memcpy(dp,dp-off,off);
768763
len-=off;
769764
dp+=off;
765+
766+
/*----------
767+
* This bit is less obvious: we can double "off" after
768+
* each such step. Consider this raw input:
769+
*112341234123412341234
770+
* This will be encoded as 5 literal bytes "11234" and
771+
* then a match tag with length 16 and offset 4. After
772+
* memcpy'ing the first 4 bytes, we will have emitted
773+
*112341234
774+
* so we can double "off" to 8, then after the next step
775+
* we have emitted
776+
*11234123412341234
777+
* Then we can double "off" again, after which it is more
778+
* than the remaining "len" so we fall out of this loop
779+
* and finish with a non-overlapping copy of the
780+
* remainder. In general, a match tag with off < len
781+
* implies that the decoded data has a repeat length of
782+
* "off". We can handle 1, 2, 4, etc repetitions of the
783+
* repeated string per memcpy until we get to a situation
784+
* where the final copy step is non-overlapping.
785+
*
786+
* (Another way to understand this is that we are keeping
787+
* the copy source point dp - off the same throughout.)
788+
*----------
789+
*/
770790
off+=off;
771791
}
772792
memcpy(dp,dp-off,len);
@@ -820,21 +840,32 @@ pglz_decompress(const char *source, int32 slen, char *dest,
820840
int32
821841
pglz_maximum_compressed_size(int32rawsize,int32total_compressed_size)
822842
{
823-
int32compressed_size;
843+
int64compressed_size;
824844

825845
/*
826-
* pglz uses one control bit per byte, sowe need (rawsize * 9) bits. We
827-
*care about bytes though, so we add 7 to make sure we include the last
828-
*incomplete byte (integer division roundsdown).
846+
* pglz uses one control bit per byte, soif the entire desired prefix is
847+
*represented as literal bytes, we'll need (rawsize * 9) bits. We care
848+
*about bytes though, so be sure to round up notdown.
829849
*
830-
* XXX Use int64 to prevent overflow during calculation.
850+
* Use int64 here to prevent overflow during calculation.
851+
*/
852+
compressed_size= ((int64)rawsize*9+7) /8;
853+
854+
/*
855+
* The above fails to account for a corner case: we could have compressed
856+
* data that starts with N-1 or N-2 literal bytes and then has a match tag
857+
* of 2 or 3 bytes. It's therefore possible that we need to fetch 1 or 2
858+
* more bytes in order to have the whole match tag. (Match tags earlier
859+
* in the compressed data don't cause a problem, since they should
860+
* represent more decompressed bytes than they occupy themselves.)
831861
*/
832-
compressed_size= (int32) ((int64)rawsize*9+7) /8;
862+
compressed_size+=2;
833863

834864
/*
835865
* Maximum compressed size can't be larger than total compressed size.
866+
* (This also ensures that our result fits in int32.)
836867
*/
837868
compressed_size=Min(compressed_size,total_compressed_size);
838869

839-
returncompressed_size;
870+
return(int32)compressed_size;
840871
}

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp