Commit 7975c5e

Allow the WAL writer to flush WAL at a reduced rate.

Commit 4de82f7 increased the WAL flush rate, mainly to increase the likelihood that hint bits can be set quickly. More quickly set hint bits can reduce contention around the clog et al. Unfortunately the increased flush rate can have a significant negative performance impact; I have measured up to a factor of ~4. The reason for this slowdown is that if there are independent writes to the underlying devices, for example because shared buffers is a lot smaller than the hot data set, or because a checkpoint is ongoing, the fdatasync() calls force cache flushes to be emitted to the storage.

The reduced flush rate is achieved by flushing WAL only if the last flush was longer than wal_writer_delay ago, or if more than wal_writer_flush_after (new GUC) unflushed blocks are pending. Based on some tests the default for wal_writer_flush_after is 1MB, which seems to work well both on SSD and rotational media.

To avoid negative performance impact due to 4de82f7, an earlier commit (db76b1e) made SetHintBits() more likely to succeed, preventing performance regressions in the pgbench tests I performed.

Discussion: 20160118163908.GW10941@awork2.anarazel.de
1 parent 5df44d1, commit 7975c5e

7 files changed: +141 -52 lines changed

doc/src/sgml/config.sgml

Lines changed: 32 additions & 9 deletions
@@ -2344,15 +2344,38 @@ include_dir 'conf.d'
       </indexterm>
       </term>
       <listitem>
-       <para>
-        Specifies the delay between activity rounds for the WAL writer.
-        In each round the writer will flush WAL to disk. It then sleeps for
-        <varname>wal_writer_delay</> milliseconds, and repeats. The default
-        value is 200 milliseconds (<literal>200ms</>). Note that on many
-        systems, the effective resolution of sleep delays is 10 milliseconds;
-        setting <varname>wal_writer_delay</> to a value that is not a multiple
-        of 10 might have the same results as setting it to the next higher
-        multiple of 10. This parameter can only be set in the
+       <para>
+        Specifies how often the WAL writer flushes WAL. After flushing WAL it
+        sleeps for <varname>wal_writer_delay</> milliseconds, unless woken up
+        by an asynchronously committing transaction. In case the last flush
+        happened less than <varname>wal_writer_delay</> milliseconds ago and
+        less than <varname>wal_writer_flush_after</> bytes of WAL have been
+        produced since, WAL is only written to the OS, not flushed to disk.
+        The default value is 200 milliseconds (<literal>200ms</>). Note that
+        on many systems, the effective resolution of sleep delays is 10
+        milliseconds; setting <varname>wal_writer_delay</> to a value that is
+        not a multiple of 10 might have the same results as setting it to the
+        next higher multiple of 10. This parameter can only be set in the
+        <filename>postgresql.conf</> file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-wal-writer-flush-after" xreflabel="wal_writer_flush_after">
+      <term><varname>wal_writer_flush_after</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>wal_writer_flush_after</> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies how often the WAL writer flushes WAL. In case the last flush
+        happened less than <varname>wal_writer_delay</> milliseconds ago and
+        less than <varname>wal_writer_flush_after</> bytes of WAL have been
+        produced since, WAL is only written to the OS, not flushed to disk.
+        If <varname>wal_writer_flush_after</> is set to <literal>0</> WAL is
+        flushed everytime the WAL writer has written WAL. The default is
+        <literal>1MB</literal>. This parameter can only be set in the
         <filename>postgresql.conf</> file or on the server command line.
        </para>
       </listitem>
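
The sleep-resolution note in the wal_writer_delay paragraph above can be illustrated with a small C sketch. The helper name effective_delay_ms and its resolution_ms parameter are hypothetical and not part of PostgreSQL. The documentation only says a value that is not a multiple of 10 "might" behave like the next higher multiple, so this models that pessimistic case rather than any guaranteed behaviour.

```c
/*
 * Hypothetical helper (not a PostgreSQL function): round a requested delay
 * up to the next multiple of the platform's sleep resolution, which the
 * documentation says is 10 ms on many systems.
 */
int
effective_delay_ms(int requested_ms, int resolution_ms)
{
    /* A resolution of 1 ms or less leaves the request unchanged. */
    if (resolution_ms <= 1)
        return requested_ms;

    /* Integer round-up to the next multiple of resolution_ms. */
    return ((requested_ms + resolution_ms - 1) / resolution_ms) * resolution_ms;
}
```

Under this assumption, a setting of 205ms would behave like 210ms on a system with 10 ms sleep resolution, while the default of 200ms is unaffected.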

src/backend/access/transam/README

Lines changed: 18 additions & 14 deletions
@@ -736,20 +736,24 @@ non-roll-backable side effects (such as filesystem changes) force sync
 commit to minimize the window in which the filesystem change has been made
 but the transaction isn't guaranteed committed.
 
-Every wal_writer_delay milliseconds, the walwriter process performs an
-XLogBackgroundFlush(). This checks the location of the last completely
-filled WAL page. If that has moved forwards, then we write all the changed
-buffers up to that point, so that under full load we write only whole
-buffers. If there has been a break in activity and the current WAL page is
-the same as before, then we find out the LSN of the most recent
-asynchronous commit, and flush up to that point, if required (i.e.,
-if it's in the current WAL page). This arrangement in itself would
-guarantee that an async commit record reaches disk during at worst the
-second walwriter cycle after the transaction completes. However, we also
-allow XLogFlush to flush full buffers "flexibly" (ie, not wrapping around
-at the end of the circular WAL buffer area), so as to minimize the number
-of writes issued under high load when multiple WAL pages are filled per
-walwriter cycle. This makes the worst-case delay three walwriter cycles.
+The walwriter regularly wakes up (via wal_writer_delay) or is woken up
+(via its latch, which is set by backends committing asynchronously) and
+performs an XLogBackgroundFlush(). This checks the location of the last
+completely filled WAL page. If that has moved forwards, then we write all
+the changed buffers up to that point, so that under full load we write
+only whole buffers. If there has been a break in activity and the current
+WAL page is the same as before, then we find out the LSN of the most
+recent asynchronous commit, and write up to that point, if required (i.e.
+if it's in the current WAL page). If more than wal_writer_delay has
+passed, or more than wal_writer_flush_after blocks have been written, since
+the last flush, WAL is also flushed up to the current location. This
+arrangement in itself would guarantee that an async commit record reaches
+disk after at most two times wal_writer_delay after the transaction
+completes. However, we also allow XLogFlush to write/flush full buffers
+"flexibly" (ie, not wrapping around at the end of the circular WAL buffer
+area), so as to minimize the number of writes issued under high load when
+multiple WAL pages are filled per walwriter cycle. This makes the worst-case
+delay three wal_writer_delay cycles.
 
 There are some other subtle points to consider with asynchronous commits.
 First, for each page of CLOG we must remember the LSN of the latest commit
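
The worst-case bound stated in the new README text is simple arithmetic, which the following C sketch spells out. The function name async_commit_worst_case_ms and the flexible_writes flag are hypothetical names introduced for illustration. The factors 2 and 3 come directly from the README wording above.

```c
/*
 * Hypothetical helper (not a PostgreSQL function): upper bound, in
 * milliseconds, on how long an asynchronous commit record may wait before
 * reaching disk, per the README text.
 *
 * Without "flexible" writes the bound is two wal_writer_delay periods;
 * allowing flexible writes adds one more period for the worst case.
 */
int
async_commit_worst_case_ms(int wal_writer_delay_ms, int flexible_writes)
{
    int cycles = flexible_writes ? 3 : 2;

    return cycles * wal_writer_delay_ms;
}
```

With the default wal_writer_delay of 200ms, the documented worst case therefore works out to 600 ms.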

src/backend/access/transam/xlog.c

Lines changed: 76 additions & 28 deletions
@@ -42,6 +42,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/walwriter.h"
 #include "postmaster/startup.h"
 #include "replication/basebackup.h"
 #include "replication/logical.h"
@@ -2729,28 +2730,37 @@ XLogFlush(XLogRecPtr record)
 }
 
 /*
- * Flush xlog, but without specifying exactly where to flush to.
+ * Write & flush xlog, but without specifying exactly where to.
  *
- * We normally flush only completed blocks; but if there is nothing to do on
- * that basis, we check for unflushed async commits in the current incomplete
- * block, and flush through the latest one of those. Thus, if async commits
- * are not being used, we will flush complete blocks only. We can guarantee
- * that async commits reach disk after at most three cycles; normally only
- * one or two. (When flushing complete blocks, we allow XLogWrite to write
- * "flexibly", meaning it can stop at the end of the buffer ring; this makes a
- * difference only with very high load or long wal_writer_delay, but imposes
- * one extra cycle for the worst case for async commits.)
+ * We normally write only completed blocks; but if there is nothing to do on
+ * that basis, we check for unwritten async commits in the current incomplete
+ * block, and write through the latest one of those. Thus, if async commits
+ * are not being used, we will write complete blocks only.
+ *
+ * If, based on the above, there's anything to write we do so immediately. But
+ * to avoid calling fsync, fdatasync et. al. at a rate that'd impact
+ * concurrent IO, we only flush WAL every wal_writer_delay ms, or if there's
+ * more than wal_writer_flush_after unflushed blocks.
+ *
+ * We can guarantee that async commits reach disk after at most three
+ * wal_writer_delay cycles. (When flushing complete blocks, we allow XLogWrite
+ * to write "flexibly", meaning it can stop at the end of the buffer ring;
+ * this makes a difference only with very high load or long wal_writer_delay,
+ * but imposes one extra cycle for the worst case for async commits.)
  *
  * This routine is invoked periodically by the background walwriter process.
  *
- * Returns TRUE if we flushed anything.
+ * Returns TRUE if there was any work to do, even if we skipped flushing due
+ * to wal_writer_delay/wal_flush_after.
  */
 bool
 XLogBackgroundFlush(void)
 {
-    XLogRecPtr  WriteRqstPtr;
+    XLogwrtRqst WriteRqst;
     bool        flexible = true;
-    bool        wrote_something = false;
+    static TimestampTz lastflush;
+    TimestampTz now;
+    int         flushbytes;
 
     /* XLOG doesn't need flushing during recovery */
     if (RecoveryInProgress())
@@ -2759,17 +2769,17 @@ XLogBackgroundFlush(void)
     /* read LogwrtResult and update local state */
     SpinLockAcquire(&XLogCtl->info_lck);
     LogwrtResult = XLogCtl->LogwrtResult;
-    WriteRqstPtr = XLogCtl->LogwrtRqst.Write;
+    WriteRqst = XLogCtl->LogwrtRqst;
     SpinLockRelease(&XLogCtl->info_lck);
 
     /* back off to last completed page boundary */
-    WriteRqstPtr -= WriteRqstPtr % XLOG_BLCKSZ;
+    WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
 
     /* if we have already flushed that far, consider async commit records */
-    if (WriteRqstPtr <= LogwrtResult.Flush)
+    if (WriteRqst.Write <= LogwrtResult.Flush)
     {
         SpinLockAcquire(&XLogCtl->info_lck);
-        WriteRqstPtr = XLogCtl->asyncXactLSN;
+        WriteRqst.Write = XLogCtl->asyncXactLSN;
         SpinLockRelease(&XLogCtl->info_lck);
         flexible = false;       /* ensure it all gets written */
     }
@@ -2779,7 +2789,7 @@ XLogBackgroundFlush(void)
      * holding an open file handle to a logfile that's no longer in use,
      * preventing the file from being deleted.
      */
-    if (WriteRqstPtr <= LogwrtResult.Flush)
+    if (WriteRqst.Write <= LogwrtResult.Flush)
     {
         if (openLogFile >= 0)
         {
@@ -2791,28 +2801,61 @@ XLogBackgroundFlush(void)
         return false;
     }
 
+    /*
+     * Determine how far to flush WAL, based on the wal_writer_delay and
+     * wal_writer_flush_after GUCs.
+     */
+    now = GetCurrentTimestamp();
+    flushbytes =
+        WriteRqst.Write / XLOG_BLCKSZ - LogwrtResult.Flush / XLOG_BLCKSZ;
+
+    if (WalWriterFlushAfter == 0 || lastflush == 0)
+    {
+        /* first call, or block based limits disabled */
+        WriteRqst.Flush = WriteRqst.Write;
+        lastflush = now;
+    }
+    else if (TimestampDifferenceExceeds(lastflush, now, WalWriterDelay))
+    {
+        /*
+         * Flush the writes at least every WalWriteDelay ms. This is important
+         * to bound the amount of time it takes for an asynchronous commit to
+         * hit disk.
+         */
+        WriteRqst.Flush = WriteRqst.Write;
+        lastflush = now;
+    }
+    else if (flushbytes >= WalWriterFlushAfter)
+    {
+        /* exceeded wal_writer_flush_after blocks, flush */
+        WriteRqst.Flush = WriteRqst.Write;
+        lastflush = now;
+    }
+    else
+    {
+        /* no flushing, this time round */
+        WriteRqst.Flush = 0;
+    }
+
 #ifdef WAL_DEBUG
     if (XLOG_DEBUG)
-        elog(LOG, "xlog bg flush request %X/%X; write %X/%X; flush %X/%X",
-             (uint32) (WriteRqstPtr >> 32), (uint32) WriteRqstPtr,
+        elog(LOG, "xlog bg flush request write %X/%X; flush: %X/%X, current is write %X/%X; flush %X/%X",
+             (uint32) (WriteRqst.Write >> 32), (uint32) WriteRqst.Write,
+             (uint32) (WriteRqst.Flush >> 32), (uint32) WriteRqst.Flush,
              (uint32) (LogwrtResult.Write >> 32), (uint32) LogwrtResult.Write,
              (uint32) (LogwrtResult.Flush >> 32), (uint32) LogwrtResult.Flush);
 #endif
 
     START_CRIT_SECTION();
 
     /* now wait for any in-progress insertions to finish and get write lock */
-    WaitXLogInsertionsToFinish(WriteRqstPtr);
+    WaitXLogInsertionsToFinish(WriteRqst.Write);
     LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
     LogwrtResult = XLogCtl->LogwrtResult;
-    if (WriteRqstPtr > LogwrtResult.Flush)
+    if (WriteRqst.Write > LogwrtResult.Write ||
+        WriteRqst.Flush > LogwrtResult.Flush)
     {
-        XLogwrtRqst WriteRqst;
-
-        WriteRqst.Write = WriteRqstPtr;
-        WriteRqst.Flush = WriteRqstPtr;
         XLogWrite(WriteRqst, flexible);
-        wrote_something = true;
     }
     LWLockRelease(WALWriteLock);
 
@@ -2827,7 +2870,12 @@ XLogBackgroundFlush(void)
      */
     AdvanceXLInsertBuffer(InvalidXLogRecPtr, true);
 
-    return wrote_something;
+    /*
+     * If we determined that we need to write data, but somebody else
+     * wrote/flushed already, it should be considered as being active, to
+     * avoid hibernating too early.
+     */
+    return true;
 }
 
 /*
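
The flush decision added to XLogBackgroundFlush() can be modelled outside the server as a pure function. The sketch below is a simplified stand-in, not the committed code: should_flush, SKETCH_BLCKSZ and the millisecond timestamps are hypothetical, and the block size of 8192 assumes a default build. The real function keeps lastflush in a static variable and updates it itself; here the caller would be responsible for that. Note also that flushbytes in the hunk is a block count despite its name, which the sketch calls pending_blocks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: WAL block size of a default build (XLOG_BLCKSZ). */
#define SKETCH_BLCKSZ 8192

/*
 * Hypothetical model of the four-way branch in XLogBackgroundFlush().
 * Returns true when this round should flush to disk, false when WAL is
 * only written to the OS.
 *
 * last_flush_ms == 0 stands for "never flushed yet", mirroring the
 * zero-initialised static lastflush in the real code.
 */
bool
should_flush(int64_t last_flush_ms, int64_t now_ms,
             uint64_t write_upto, uint64_t flushed_upto,
             int delay_ms, int flush_after_blocks)
{
    /* Whole blocks between the flushed position and the write target. */
    int64_t pending_blocks = (int64_t) (write_upto / SKETCH_BLCKSZ) -
                             (int64_t) (flushed_upto / SKETCH_BLCKSZ);

    /* First call, or block-based limit disabled: always flush. */
    if (flush_after_blocks == 0 || last_flush_ms == 0)
        return true;

    /* At least wal_writer_delay has elapsed since the last flush. */
    if (now_ms - last_flush_ms >= delay_ms)
        return true;

    /* Enough unflushed blocks have accumulated. */
    if (pending_blocks >= flush_after_blocks)
        return true;

    /* Otherwise skip the flush this round. */
    return false;
}
```

A caller following the committed logic would set its stored last-flush time to the current time whenever this returns true, and would request a flush position of 0 when it returns false.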

src/backend/postmaster/walwriter.c

Lines changed: 1 addition & 0 deletions
@@ -64,6 +64,7 @@
  * GUC parameters
  */
 int         WalWriterDelay = 200;
+int         WalWriterFlushAfter = 128;
 
 /*
  * Number of do-nothing loops before lengthening the delay time, and the

src/backend/utils/misc/guc.c

Lines changed: 12 additions & 1 deletion
@@ -2235,7 +2235,7 @@ static struct config_int ConfigureNamesInt[] =
 
     {
         {"wal_writer_delay", PGC_SIGHUP, WAL_SETTINGS,
-            gettext_noop("WAL writer sleep time between WAL flushes."),
+            gettext_noop("Time between WAL flushes performed in the WAL writer."),
             NULL,
             GUC_UNIT_MS
         },
@@ -2244,6 +2244,17 @@ static struct config_int ConfigureNamesInt[] =
         NULL, NULL, NULL
     },
 
+    {
+        {"wal_writer_flush_after", PGC_SIGHUP, WAL_SETTINGS,
+            gettext_noop("Amount of WAL written out by WAL writer triggering a flush."),
+            NULL,
+            GUC_UNIT_XBLOCKS
+        },
+        &WalWriterFlushAfter,
+        128, 0, INT_MAX,
+        NULL, NULL, NULL
+    },
+
     {
         /* see max_connections */
         {"max_wal_senders", PGC_POSTMASTER, REPLICATION_SENDING,
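
The GUC entry stores wal_writer_flush_after in WAL blocks (GUC_UNIT_XBLOCKS) with a boot value of 128, while the documentation and postgresql.conf.sample quote the default as 1MB. The C sketch below shows how the two agree, assuming the default WAL block size of 8192 bytes. The name xblocks_to_bytes and the macro SKETCH_XLOG_BLCKSZ are hypothetical; a build configured with a different WAL block size would give a different byte value for the same block count.

```c
#include <stdint.h>

/* Assumption: XLOG_BLCKSZ of a default build, in bytes. */
#define SKETCH_XLOG_BLCKSZ 8192

/*
 * Hypothetical helper (not a PostgreSQL function): convert a setting
 * expressed in WAL blocks into bytes.
 */
int64_t
xblocks_to_bytes(int blocks)
{
    return (int64_t) blocks * SKETCH_XLOG_BLCKSZ;
}
```

Under that assumption, 128 blocks of 8192 bytes come to 1048576 bytes, which is the 1MB default quoted elsewhere in this commit.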

src/backend/utils/misc/postgresql.conf.sample

Lines changed: 1 addition & 0 deletions
@@ -192,6 +192,7 @@
 #wal_buffers = -1                       # min 32kB, -1 sets based on shared_buffers
                                         # (change requires restart)
 #wal_writer_delay = 200ms               # 1-10000 milliseconds
+#wal_writer_flush_after = 1MB           # 0 disables
 
 #commit_delay = 0                       # range 0-100000, in microseconds
 #commit_siblings = 5                    # range 1-1000

src/include/postmaster/walwriter.h

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@
 
 /* GUC options */
 extern int  WalWriterDelay;
+extern int  WalWriterFlushAfter;
 
 extern void WalWriterMain(void) pg_attribute_noreturn();
 