Commit 57aa5b2

Add GUC to enable compression of full page images stored in WAL.
When newly-added GUC parameter, wal_compression, is on, the PostgreSQL server compresses a full page image written to WAL when full_page_writes is on or during a base backup. A compressed page image will be decompressed during WAL replay. Turning this parameter on can reduce the WAL volume without increasing the risk of unrecoverable data corruption, but at the cost of some extra CPU spent on the compression during WAL logging and on the decompression during WAL replay.

This commit changes the WAL format (so bumping WAL version number) so that the one-byte flag indicating whether a full page image is compressed or not is included in its header information. This means that the commit increases the WAL volume one-byte per a full page image even if WAL compression is not used at all. We can save that one-byte by borrowing one-bit from the existing field like hole_offset in the header and using it as the flag, for example. But which would reduce the code readability and the extensibility of the feature. Per discussion, it's not worth paying those prices to save only one-byte, so we decided to add the one-byte flag to the header.

This commit doesn't introduce any new compression algorithm like lz4. Currently a full page image is compressed using the existing PGLZ algorithm. Per discussion, we decided to use it at least in the first version of the feature because there were no performance reports showing that its compression ratio is unacceptably lower than that of other algorithm. Of course, in the future, it's worth considering the support of other compression algorithm for the better compression.

Rahila Syed and Michael Paquier, reviewed in various versions by myself, Andres Freund, Robert Haas, Abhijit Menon-Sen and many others.
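The header trade-off described in the commit message can be illustrated with a small sketch. It contrasts a dedicated one-byte flag field (the option the commit chose) with borrowing the top bit of `hole_offset` (the option it rejected). The struct name, the helper names, the masks, and the numeric flag values are assumptions made for illustration; the diff only shows the flag names `BKPIMAGE_HAS_HOLE` and `BKPIMAGE_IS_COMPRESSED` and the fields `length`, `hole_offset` and `bimg_info`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values; the diff names the flags but does not show them. */
#define BKPIMAGE_HAS_HOLE       0x01
#define BKPIMAGE_IS_COMPRESSED  0x02

/* Option A (chosen by the commit): a dedicated one-byte flag field. */
typedef struct
{
    uint16_t length;        /* number of image bytes stored in WAL */
    uint16_t hole_offset;   /* where the page hole starts */
    uint8_t  bimg_info;     /* flag bits, room left for future flags */
} ImageHeaderWithFlagByte;  /* hypothetical name */

static bool
flag_byte_is_compressed(const ImageHeaderWithFlagByte *h)
{
    return (h->bimg_info & BKPIMAGE_IS_COMPRESSED) != 0;
}

/* Option B (rejected): steal the top bit of hole_offset for the flag. */
#define HOLE_OFFSET_MASK     0x7FFF   /* hypothetical */
#define HOLE_COMPRESSED_BIT  0x8000   /* hypothetical */

static uint16_t
packed_make(uint16_t hole_offset, bool compressed)
{
    /* Every reader and writer must now remember to mask this field. */
    assert(hole_offset <= HOLE_OFFSET_MASK);
    return (uint16_t) (hole_offset | (compressed ? HOLE_COMPRESSED_BIT : 0));
}

static uint16_t
packed_hole_offset(uint16_t packed)
{
    return (uint16_t) (packed & HOLE_OFFSET_MASK);
}

static bool
packed_is_compressed(uint16_t packed)
{
    return (packed & HOLE_COMPRESSED_BIT) != 0;
}
```

Option B saves one byte per image but forces masking on every access and leaves no spare bits, which matches the readability and extensibility cost the message mentions.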
1 parent 2fbb286, commit 57aa5b2

11 files changed: +320 -39 lines changed


‎contrib/pg_xlogdump/pg_xlogdump.c

Lines changed: 20 additions & 8 deletions
@@ -359,18 +359,17 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
     rec_len = XLogRecGetDataLen(record) + SizeOfXLogRecord;

     /*
-     * Calculate the amount of FPI data in the record. Each backup block
-     * takes up BLCKSZ bytes, minus the "hole" length.
+     * Calculate the amount of FPI data in the record.
      *
      * XXX: We peek into xlogreader's private decoded backup blocks for the
-     * hole_length. It doesn't seem worth it to add an accessor macro for
-     * this.
+     * bimg_len indicating the length of FPI data. It doesn't seem worth it to
+     * add an accessor macro for this.
      */
     fpi_len = 0;
     for (block_id = 0; block_id <= record->max_block_id; block_id++)
     {
         if (XLogRecHasBlockImage(record, block_id))
-            fpi_len += BLCKSZ - record->blocks[block_id].hole_length;
+            fpi_len += record->blocks[block_id].bimg_len;
     }

     /* Update per-rmgr statistics */
@@ -465,9 +464,22 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
                        blk);
             if (XLogRecHasBlockImage(record, block_id))
             {
-                printf(" (FPW); hole: offset: %u, length: %u\n",
-                       record->blocks[block_id].hole_offset,
-                       record->blocks[block_id].hole_length);
+                if (record->blocks[block_id].bimg_info &
+                    BKPIMAGE_IS_COMPRESSED)
+                {
+                    printf(" (FPW); hole: offset: %u, length: %u, compression saved: %u\n",
+                           record->blocks[block_id].hole_offset,
+                           record->blocks[block_id].hole_length,
+                           BLCKSZ -
+                           record->blocks[block_id].hole_length -
+                           record->blocks[block_id].bimg_len);
+                }
+                else
+                {
+                    printf(" (FPW); hole: offset: %u, length: %u\n",
+                           record->blocks[block_id].hole_offset,
+                           record->blocks[block_id].hole_length);
+                }
             }
             putchar('\n');
         }
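The accounting in the two pg_xlogdump hunks above can be sketched in isolation: the FPI total now sums the stored image length (`bimg_len`), and the "compression saved" figure is the page size minus the hole minus the stored length. `BlockImageInfo` is a hypothetical stand-in for the decoded block fields, and the `BLCKSZ` of 8192 and the flag value are assumptions, not taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ                  8192   /* assumed default block size */
#define BKPIMAGE_IS_COMPRESSED  0x02   /* assumed flag value */

/* Hypothetical stand-in for the per-block fields the diff reads. */
typedef struct
{
    uint16_t hole_offset;
    uint16_t hole_length;
    uint16_t bimg_len;      /* bytes of image data actually stored in WAL */
    uint8_t  bimg_info;
} BlockImageInfo;

/* Bytes saved by compression for one block image, 0 if not compressed. */
static uint32_t
compression_saved(const BlockImageInfo *b)
{
    if (!(b->bimg_info & BKPIMAGE_IS_COMPRESSED))
        return 0;
    return (uint32_t) BLCKSZ - b->hole_length - b->bimg_len;
}

/* Total FPI bytes in a record: the sum of the stored image lengths. */
static uint32_t
total_fpi_len(const BlockImageInfo *blocks, int nblocks)
{
    uint32_t total = 0;

    for (int i = 0; i < nblocks; i++)
        total += blocks[i].bimg_len;
    return total;
}
```

Summing `bimg_len` instead of `BLCKSZ - hole_length` keeps the statistic correct whether or not an image was compressed.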

‎doc/src/sgml/config.sgml

Lines changed: 24 additions & 0 deletions
@@ -2282,6 +2282,30 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>

+     <varlistentry id="guc-wal-compression" xreflabel="wal_compression">
+      <term><varname>wal_compression</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>wal_compression</> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        When this parameter is <literal>on</>, the <productname>PostgreSQL</>
+        server compresses a full page image written to WAL when
+        <xref linkend="guc-full-page-writes"> is on or during a base backup.
+        A compressed page image will be decompressed during WAL replay.
+        The default value is <literal>off</>.
+       </para>
+
+       <para>
+        Turning this parameter on can reduce the WAL volume without
+        increasing the risk of unrecoverable data corruption,
+        but at the cost of some extra CPU spent on the compression during
+        WAL logging and on the decompression during WAL replay.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-wal-buffers" xreflabel="wal_buffers">
       <term><varname>wal_buffers</varname> (<type>integer</type>)
       <indexterm>

‎src/backend/access/transam/xlog.c

Lines changed: 1 addition & 0 deletions
@@ -89,6 +89,7 @@ char *XLogArchiveCommand = NULL;
 bool        EnableHotStandby = false;
 bool        fullPageWrites = true;
 bool        wal_log_hints = false;
+bool        wal_compression = false;
 bool        log_checkpoints = false;
 int         sync_method = DEFAULT_SYNC_METHOD;
 int         wal_level = WAL_LEVEL_MINIMAL;

‎src/backend/access/transam/xloginsert.c

Lines changed: 122 additions & 19 deletions
@@ -24,12 +24,16 @@
 #include "access/xlog_internal.h"
 #include "access/xloginsert.h"
 #include "catalog/pg_control.h"
+#include "common/pg_lzcompress.h"
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/proc.h"
 #include "utils/memutils.h"
 #include "pg_trace.h"

+/* Buffer size required to store a compressed version of backup block image */
+#define PGLZ_MAX_BLCKSZ PGLZ_MAX_OUTPUT(BLCKSZ)
+
 /*
  * For each block reference registered with XLogRegisterBuffer, we fill in
  * a registered_buffer struct.
@@ -50,6 +54,9 @@ typedef struct

     XLogRecData bkp_rdatas[2];  /* temporary rdatas used to hold references to
                                  * backup block data in XLogRecordAssemble() */
+
+    /* buffer to store a compressed version of backup block image */
+    char        compressed_page[PGLZ_MAX_BLCKSZ];
 } registered_buffer;

 static registered_buffer *registered_buffers;
@@ -96,6 +103,8 @@ static MemoryContext xloginsert_cxt;
 static XLogRecData *XLogRecordAssemble(RmgrId rmid, uint8 info,
                    XLogRecPtr RedoRecPtr, bool doPageWrites,
                    XLogRecPtr *fpw_lsn);
+static bool XLogCompressBackupBlock(char *page, uint16 hole_offset,
+                    uint16 hole_length, char *dest, uint16 *dlen);

 /*
  * Begin constructing a WAL record. This must be called before the
@@ -482,7 +491,11 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
         bool        needs_data;
         XLogRecordBlockHeader bkpb;
         XLogRecordBlockImageHeader bimg;
+        XLogRecordBlockCompressHeader cbimg;
         bool        samerel;
+        bool        is_compressed = false;
+        uint16      hole_length;
+        uint16      hole_offset;

         if (!regbuf->in_use)
             continue;
@@ -529,9 +542,11 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
         if (needs_backup)
         {
             Page        page = regbuf->page;
+            uint16      compressed_len;

             /*
-             * The page needs to be backed up, so set up *bimg
+             * The page needs to be backed up, so calculate its hole length
+             * and offset.
              */
             if (regbuf->flags & REGBUF_STANDARD)
             {
@@ -543,50 +558,81 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
                     upper > lower &&
                     upper <= BLCKSZ)
                 {
-                    bimg.hole_offset = lower;
-                    bimg.hole_length = upper - lower;
+                    hole_offset = lower;
+                    hole_length = upper - lower;
                 }
                 else
                 {
                     /* No "hole" to compress out */
-                    bimg.hole_offset = 0;
-                    bimg.hole_length = 0;
+                    hole_offset = 0;
+                    hole_length = 0;
                 }
             }
             else
             {
                 /* Not a standard page header, don't try to eliminate "hole" */
-                bimg.hole_offset = 0;
-                bimg.hole_length = 0;
+                hole_offset = 0;
+                hole_length = 0;
+            }
+
+            /*
+             * Try to compress a block image if wal_compression is enabled
+             */
+            if (wal_compression)
+            {
+                is_compressed =
+                    XLogCompressBackupBlock(page, hole_offset, hole_length,
+                                            regbuf->compressed_page,
+                                            &compressed_len);
             }

             /* Fill in the remaining fields in the XLogRecordBlockHeader struct */
             bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;

-            total_len += BLCKSZ - bimg.hole_length;
-
             /*
              * Construct XLogRecData entries for the page content.
              */
             rdt_datas_last->next = &regbuf->bkp_rdatas[0];
             rdt_datas_last = rdt_datas_last->next;
-            if (bimg.hole_length == 0)
+
+            bimg.bimg_info = (hole_length == 0) ? 0 : BKPIMAGE_HAS_HOLE;
+
+            if (is_compressed)
             {
-                rdt_datas_last->data = page;
-                rdt_datas_last->len = BLCKSZ;
+                bimg.length = compressed_len;
+                bimg.hole_offset = hole_offset;
+                bimg.bimg_info |= BKPIMAGE_IS_COMPRESSED;
+                if (hole_length != 0)
+                    cbimg.hole_length = hole_length;
+
+                rdt_datas_last->data = regbuf->compressed_page;
+                rdt_datas_last->len = compressed_len;
             }
             else
             {
-                /* must skip the hole */
-                rdt_datas_last->data = page;
-                rdt_datas_last->len = bimg.hole_offset;
+                bimg.length = BLCKSZ - hole_length;
+                bimg.hole_offset = hole_offset;

-                rdt_datas_last->next = &regbuf->bkp_rdatas[1];
-                rdt_datas_last = rdt_datas_last->next;
+                if (hole_length == 0)
+                {
+                    rdt_datas_last->data = page;
+                    rdt_datas_last->len = BLCKSZ;
+                }
+                else
+                {
+                    /* must skip the hole */
+                    rdt_datas_last->data = page;
+                    rdt_datas_last->len = hole_offset;

-                rdt_datas_last->data = page + (bimg.hole_offset + bimg.hole_length);
-                rdt_datas_last->len = BLCKSZ - (bimg.hole_offset + bimg.hole_length);
+                    rdt_datas_last->next = &regbuf->bkp_rdatas[1];
+                    rdt_datas_last = rdt_datas_last->next;
+
+                    rdt_datas_last->data = page + (hole_offset + hole_length);
+                    rdt_datas_last->len = BLCKSZ - (hole_offset + hole_length);
+                }
             }
+
+            total_len += bimg.length;
         }

         if (needs_data)
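The hole calculation in the hunk above can be read as a small pure function: a standard page has unused space between its lower and upper pointers, and that gap is left out of the image. The sketch below is a simplified rendering under assumptions: `PageHole`, `compute_hole` and `uncompressed_image_len` are hypothetical names, `BLCKSZ` is taken as 8192, and the `lower >= SIZE_OF_PAGE_HEADER` check (with a 24-byte header) is assumed from context, since the hunk only shows the `upper > lower` and `upper <= BLCKSZ` conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ               8192   /* assumed default block size */
#define SIZE_OF_PAGE_HEADER  24     /* assumed page header size */

typedef struct
{
    uint16_t offset;
    uint16_t length;
} PageHole;                         /* hypothetical helper type */

/*
 * Return the hole to omit from a page image. Non-standard pages and pages
 * with implausible pointers get no hole, so the full page is stored.
 */
static PageHole
compute_hole(bool standard_page, uint16_t lower, uint16_t upper)
{
    PageHole hole = {0, 0};

    if (standard_page &&
        lower >= SIZE_OF_PAGE_HEADER &&
        upper > lower &&
        upper <= BLCKSZ)
    {
        hole.offset = lower;
        hole.length = (uint16_t) (upper - lower);
    }
    return hole;
}

/* Length of the image when it is stored without compression. */
static uint16_t
uncompressed_image_len(PageHole hole)
{
    return (uint16_t) (BLCKSZ - hole.length);
}
```

Keeping the hole in local variables rather than in `bimg`, as the diff does, lets the same values feed both the compressed and the uncompressed path.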
@@ -619,6 +665,12 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
         {
             memcpy(scratch, &bimg, SizeOfXLogRecordBlockImageHeader);
             scratch += SizeOfXLogRecordBlockImageHeader;
+            if (hole_length != 0 && is_compressed)
+            {
+                memcpy(scratch, &cbimg,
+                       SizeOfXLogRecordBlockCompressHeader);
+                scratch += SizeOfXLogRecordBlockCompressHeader;
+            }
         }
         if (!samerel)
         {
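The hunk above writes the extra compress header only when the image is both compressed and has a hole. A rough sketch of why that is sufficient: in every other case a reader can derive the hole length from the flags and the stored length. The byte layout (a 5-byte image header followed by an optional 2-byte hole length), the flag values, and the function and type names below are assumptions for illustration; the reader side is not part of this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ                  8192   /* assumed default block size */
#define BKPIMAGE_HAS_HOLE       0x01   /* assumed flag value */
#define BKPIMAGE_IS_COMPRESSED  0x02   /* assumed flag value */

typedef struct
{
    uint16_t length;
    uint16_t hole_offset;
    uint16_t hole_length;
    bool     is_compressed;
} DecodedImageHeader;                  /* hypothetical */

/* Serialize the image header; returns the number of bytes written. */
static size_t
write_image_header(uint8_t *buf, uint16_t length, uint16_t hole_offset,
                   uint16_t hole_length, bool compressed)
{
    size_t  n = 0;
    uint8_t info = 0;

    if (hole_length != 0)
        info |= BKPIMAGE_HAS_HOLE;
    if (compressed)
        info |= BKPIMAGE_IS_COMPRESSED;

    memcpy(buf + n, &length, sizeof(length));           n += sizeof(length);
    memcpy(buf + n, &hole_offset, sizeof(hole_offset)); n += sizeof(hole_offset);
    buf[n++] = info;

    /* The extra header is needed only for compressed images with a hole. */
    if (compressed && hole_length != 0)
    {
        memcpy(buf + n, &hole_length, sizeof(hole_length));
        n += sizeof(hole_length);
    }
    return n;
}

/* Parse the header back; returns the number of bytes consumed. */
static size_t
read_image_header(const uint8_t *buf, DecodedImageHeader *out)
{
    size_t  n = 0;
    uint8_t info;

    memcpy(&out->length, buf + n, sizeof(out->length));           n += sizeof(out->length);
    memcpy(&out->hole_offset, buf + n, sizeof(out->hole_offset)); n += sizeof(out->hole_offset);
    info = buf[n++];
    out->is_compressed = (info & BKPIMAGE_IS_COMPRESSED) != 0;

    if (!out->is_compressed)
        out->hole_length = (uint16_t) (BLCKSZ - out->length);   /* derivable */
    else if (info & BKPIMAGE_HAS_HOLE)
    {
        memcpy(&out->hole_length, buf + n, sizeof(out->hole_length));
        n += sizeof(out->hole_length);
    }
    else
        out->hole_length = 0;
    return n;
}
```

For an uncompressed image the stored length already equals `BLCKSZ - hole_length`, so storing the hole length again would be redundant.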
@@ -680,6 +732,57 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
     return &hdr_rdt;
 }

+/*
+ * Create a compressed version of a backup block image.
+ *
+ * Returns FALSE if compression fails (i.e., compressed result is actually
+ * bigger than original). Otherwise, returns TRUE and sets 'dlen' to
+ * the length of compressed block image.
+ */
+static bool
+XLogCompressBackupBlock(char *page, uint16 hole_offset, uint16 hole_length,
+                        char *dest, uint16 *dlen)
+{
+    int32       orig_len = BLCKSZ - hole_length;
+    int32       len;
+    int32       extra_bytes = 0;
+    char       *source;
+    char        tmp[BLCKSZ];
+
+    if (hole_length != 0)
+    {
+        /* must skip the hole */
+        source = tmp;
+        memcpy(source, page, hole_offset);
+        memcpy(source + hole_offset,
+               page + (hole_offset + hole_length),
+               BLCKSZ - (hole_length + hole_offset));
+
+        /*
+         * Extra data needs to be stored in WAL record for the compressed
+         * version of block image if the hole exists.
+         */
+        extra_bytes = SizeOfXLogRecordBlockCompressHeader;
+    }
+    else
+        source = page;
+
+    /*
+     * We recheck the actual size even if pglz_compress() reports success
+     * and see if the number of bytes saved by compression is larger than
+     * the length of extra data needed for the compressed version of block
+     * image.
+     */
+    len = pglz_compress(source, orig_len, dest, PGLZ_strategy_default);
+    if (len >= 0 &&
+        len + extra_bytes < orig_len)
+    {
+        *dlen = (uint16) len;   /* successful compression */
+        return true;
+    }
+    return false;
+}
+
 /*
  * Determine whether the buffer referenced has to be backed up.
  *
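The control flow of `XLogCompressBackupBlock` can be exercised outside the server with a stand-in compressor. The sketch below keeps the two steps from the diff (copy the page around the hole, then accept the result only if it beats the original length after counting the extra header) but swaps `pglz_compress()` for a toy run-length encoder. `toy_compress`, `compress_block_image`, the 2-byte `COMPRESS_HEADER_SIZE` and the `BLCKSZ` of 8192 are all assumptions for illustration, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ                8192  /* assumed default block size */
#define COMPRESS_HEADER_SIZE  2     /* assumed size of the extra hole_length header */

/*
 * Toy run-length encoder standing in for pglz_compress(). Emits
 * (count, byte) pairs and returns the output length, or -1 as soon as the
 * output would not be smaller than the input. dest needs slen bytes.
 */
static int32_t
toy_compress(const char *src, int32_t slen, char *dest)
{
    int32_t out = 0;
    int32_t i = 0;

    while (i < slen)
    {
        int32_t run = 1;

        while (i + run < slen && run < 255 && src[i + run] == src[i])
            run++;
        if (out + 2 >= slen)
            return -1;          /* not worth it */
        dest[out++] = (char) run;
        dest[out++] = src[i];
        i += run;
    }
    return out;
}

/* Same shape as the function in the diff, with the toy compressor. */
static bool
compress_block_image(const char *page, uint16_t hole_offset,
                     uint16_t hole_length, char *dest, uint16_t *dlen)
{
    int32_t     orig_len = BLCKSZ - hole_length;
    int32_t     extra_bytes = 0;
    int32_t     len;
    const char *source = page;
    char        tmp[BLCKSZ];

    if (hole_length != 0)
    {
        /* Copy the page into tmp, skipping the hole. */
        memcpy(tmp, page, hole_offset);
        memcpy(tmp + hole_offset,
               page + (hole_offset + hole_length),
               BLCKSZ - (hole_offset + hole_length));
        source = tmp;
        /* A compressed image with a hole costs an extra header. */
        extra_bytes = COMPRESS_HEADER_SIZE;
    }

    len = toy_compress(source, orig_len, dest);
    if (len >= 0 && len + extra_bytes < orig_len)
    {
        *dlen = (uint16_t) len;
        return true;
    }
    return false;               /* caller stores the uncompressed image */
}
```

When the function returns false the caller falls back to the uncompressed image, so a page that does not compress well costs CPU time but no extra WAL volume beyond the flag byte.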
