Commit 03b89f1

Skip WAL for new relfilenodes, under wal_level=minimal.

Until now, only selected bulk operations (e.g. COPY) did this. If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY. See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules. Maintainers of table access
methods should examine that section.

To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL. A new GUC,
wal_skip_threshold, guides that choice. If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold. Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.

Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node. Make relcache.c retain entries for certain
dropped relations until end of transaction.

Back-patch to 9.5 (all supported versions). This introduces a new WAL
record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As
always, update standby systems before master systems. This changes
sizeof(RelationData) and sizeof(IndexStmt), breaking binary
compatibility for affected extensions. (The most recent commit to
affect the same class of extensions was 089e4d4.)

Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas. Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem. Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout.

Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org

1 parent ae86e46, commit 03b89f1

48 files changed, +1404 -337 lines changed

doc/src/sgml/config.sgml

Lines changed: 31 additions & 8 deletions
@@ -2323,16 +2323,19 @@ include_dir 'conf.d'
         levels. This parameter can only be set at server start.
        </para>
        <para>
-        In <literal>minimal</literal> level, WAL-logging of some bulk
-        operations can be safely skipped, which can make those
-        operations much faster (see <xref linkend="populate-pitr"/>).
-        Operations in which this optimization can be applied include:
+        In <literal>minimal</literal> level, no information is logged for
+        permanent relations for the remainder of a transaction that creates or
+        rewrites them. This can make operations much faster (see
+        <xref linkend="populate-pitr"/>). Operations that initiate this
+        optimization include:
         <simplelist>
-         <member><command>CREATE TABLE AS</command></member>
-         <member><command>CREATE INDEX</command></member>
+         <member><command>ALTER ... SET TABLESPACE</command></member>
          <member><command>CLUSTER</command></member>
-         <member><command>COPY</command> into tables that were created or truncated in the same
-         transaction</member>
+         <member><command>CREATE TABLE</command></member>
+         <member><command>REFRESH MATERIALIZED VIEW</command>
+          (without <option>CONCURRENTLY</option>)</member>
+         <member><command>REINDEX</command></member>
+         <member><command>TRUNCATE</command></member>
         </simplelist>
         But minimal WAL does not contain enough information to reconstruct the
         data from a base backup and the WAL logs, so <literal>replica</literal> or
@@ -2721,6 +2724,26 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>

+     <varlistentry id="guc-wal-skip-threshold" xreflabel="wal_skip_threshold">
+      <term><varname>wal_skip_threshold</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>wal_skip_threshold</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        When <varname>wal_level</varname> is <literal>minimal</literal> and a
+        transaction commits after creating or rewriting a permanent relation,
+        this setting determines how to persist the new data. If the data is
+        smaller than this setting, write it to the WAL log; otherwise, use an
+        fsync of affected files. Depending on the properties of your storage,
+        raising or lowering this value might help if such commits are slowing
+        concurrent transactions. The default is two megabytes
+        (<literal>2MB</literal>).
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-commit-delay" xreflabel="commit_delay">
       <term><varname>commit_delay</varname> (<type>integer</type>)
       <indexterm>
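
The wal_skip_threshold text added in this file describes a size-based choice made at commit. The sketch below models only that rule and is not the server's code: the enum, the function name, and the use of kilobytes as the unit are assumptions made for illustration. The rule itself (data smaller than the setting goes to WAL, otherwise the files are fsync'd) and the 2MB default come from the documentation text in the hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: how a new relfilenode is made durable at commit. */
typedef enum
{
    PERSIST_COPY_TO_WAL,        /* copy the relation's contents into WAL */
    PERSIST_FSYNC_FILES         /* fsync the relation's files instead */
} PersistMethod;

/* Documented default of two megabytes, expressed here in kilobytes. */
#define DEFAULT_WAL_SKIP_THRESHOLD_KB 2048

/*
 * Pick the persistence method for a relation created or rewritten in the
 * committing transaction (wal_level=minimal is assumed by the caller).
 */
PersistMethod
choose_persist_method(size_t rel_size_kb, size_t wal_skip_threshold_kb)
{
    if (rel_size_kb < wal_skip_threshold_kb)
        return PERSIST_COPY_TO_WAL;
    return PERSIST_FSYNC_FILES;
}
```

Under these assumptions a 16 kB table created and committed in one transaction would be copied to WAL, while a 1 GB table would be fsync'd. Raising the threshold moves more relations to the WAL path, which may help on storage where fsync is expensive.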

doc/src/sgml/perform.sgml

Lines changed: 9 additions & 38 deletions
@@ -1536,8 +1536,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
     needs to be written, because in case of an error, the files
     containing the newly loaded data will be removed anyway.
     However, this consideration only applies when
-    <xref linkend="guc-wal-level"/> is <literal>minimal</literal> for
-    non-partitioned tables as all commands must write WAL otherwise.
+    <xref linkend="guc-wal-level"/> is <literal>minimal</literal>
+    as all commands must write WAL otherwise.
    </para>

   </sect2>
@@ -1637,42 +1637,13 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
    </para>

    <para>
-    Aside from avoiding the time for the archiver or WAL sender to
-    process the WAL data,
-    doing this will actually make certain commands faster, because they
-    are designed not to write WAL at all if <varname>wal_level</varname>
-    is <literal>minimal</literal>. (They can guarantee crash safety more cheaply
-    by doing an <function>fsync</function> at the end than by writing WAL.)
-    This applies to the following commands:
-    <itemizedlist>
-     <listitem>
-      <para>
-       <command>CREATE TABLE AS SELECT</command>
-      </para>
-     </listitem>
-     <listitem>
-      <para>
-       <command>CREATE INDEX</command> (and variants such as
-       <command>ALTER TABLE ADD PRIMARY KEY</command>)
-      </para>
-     </listitem>
-     <listitem>
-      <para>
-       <command>ALTER TABLE SET TABLESPACE</command>
-      </para>
-     </listitem>
-     <listitem>
-      <para>
-       <command>CLUSTER</command>
-      </para>
-     </listitem>
-     <listitem>
-      <para>
-       <command>COPY FROM</command>, when the target table has been
-       created or truncated earlier in the same transaction
-      </para>
-     </listitem>
-    </itemizedlist>
+    Aside from avoiding the time for the archiver or WAL sender to process the
+    WAL data, doing this will actually make certain commands faster, because
+    they do not to write WAL at all if <varname>wal_level</varname>
+    is <literal>minimal</literal> and the current subtransaction (or top-level
+    transaction) created or truncated the table or index they change. (They
+    can guarantee crash safety more cheaply by doing
+    an <function>fsync</function> at the end than by writing WAL.)
    </para>
   </sect2>

src/backend/access/gist/gistbuild.c

Lines changed: 1 addition & 1 deletion
@@ -191,7 +191,7 @@ gistbuild(Relation heap, Relation index, IndexInfo *indexInfo)
         PageSetLSN(page, recptr);
     }
     else
-        PageSetLSN(page, gistGetFakeLSN(heap));
+        PageSetLSN(page, gistGetFakeLSN(index));

     UnlockReleaseBuffer(buffer);

src/backend/access/gist/gistutil.c

Lines changed: 26 additions & 5 deletions
@@ -972,23 +972,44 @@ gistproperty(Oid index_oid, int attno,
 }

 /*
- * Temporary and unlogged GiST indexes are not WAL-logged, but we need LSNs
- * to detect concurrent page splits anyway. This function provides a fake
- * sequence of LSNs for that purpose.
+ * Some indexes are not WAL-logged, but we need LSNs to detect concurrent page
+ * splits anyway. This function provides a fake sequence of LSNs for that
+ * purpose.
  */
 XLogRecPtr
 gistGetFakeLSN(Relation rel)
 {
-    static XLogRecPtr counter = 1;
-
     if (rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
     {
         /*
          * Temporary relations are only accessible in our session, so a simple
          * backend-local counter will do.
          */
+        static XLogRecPtr counter = 1;
+
         return counter++;
     }
+    else if (rel->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
+    {
+        /*
+         * WAL-logging on this relation will start after commit, so its LSNs
+         * must be distinct numbers smaller than the LSN at the next commit.
+         * Emit a dummy WAL record if insert-LSN hasn't advanced after the
+         * last call.
+         */
+        static XLogRecPtr lastlsn = InvalidXLogRecPtr;
+        XLogRecPtr  currlsn = GetXLogInsertRecPtr();
+
+        /* Shouldn't be called for WAL-logging relations */
+        Assert(!RelationNeedsWAL(rel));
+
+        /* No need for an actual record if we already have a distinct LSN */
+        if (!XLogRecPtrIsInvalid(lastlsn) && lastlsn == currlsn)
+            currlsn = gistXLogAssignLSN();
+
+        lastlsn = currlsn;
+        return currlsn;
+    }
     else
     {
         /*
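
The new branch for permanent relations can be tried outside the server with a small simulation. This is a sketch under stated assumptions, not PostgreSQL code: the `mock_` functions, the starting position 1000, and the record sizes 32 and 128 are all invented here. `mock_get_insert_ptr()` stands in for `GetXLogInsertRecPtr()` and `mock_assign_lsn()` for `gistXLogAssignLSN()`; only the compare-and-emit logic mirrors the hunk.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Mock WAL insert position (shared state in a real server). */
static XLogRecPtr mock_insert_ptr = 1000;

/* Stand-in for GetXLogInsertRecPtr(): report the current insert position. */
XLogRecPtr
mock_get_insert_ptr(void)
{
    return mock_insert_ptr;
}

/*
 * Stand-in for gistXLogAssignLSN(): inserting a dummy record advances the
 * insert position; return the new position. The size 32 is arbitrary.
 */
XLogRecPtr
mock_assign_lsn(void)
{
    mock_insert_ptr += 32;
    return mock_insert_ptr;
}

/* Stand-in for unrelated WAL traffic from any session. */
void
mock_other_wal_activity(void)
{
    mock_insert_ptr += 128;
}

/* Same shape as the RELPERSISTENCE_PERMANENT branch of gistGetFakeLSN(). */
XLogRecPtr
fake_lsn_for_permanent_rel(void)
{
    static XLogRecPtr lastlsn = InvalidXLogRecPtr;
    XLogRecPtr  currlsn = mock_get_insert_ptr();

    /* Emit a dummy record only if the position has not moved since last call. */
    if (lastlsn != InvalidXLogRecPtr && lastlsn == currlsn)
        currlsn = mock_assign_lsn();

    lastlsn = currlsn;
    return currlsn;
}
```

In this model, back-to-back calls each force a dummy record so that every returned value is distinct, whereas a call that follows other WAL activity reuses the already-advanced position and writes nothing.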

src/backend/access/gist/gistxlog.c

Lines changed: 21 additions & 0 deletions
@@ -505,6 +505,9 @@ gist_redo(XLogReaderState *record)
         case XLOG_GIST_CREATE_INDEX:
             gistRedoCreateIndex(record);
             break;
+        case XLOG_GIST_ASSIGN_LSN:
+            /* nop. See gistGetFakeLSN(). */
+            break;
         default:
             elog(PANIC, "gist_redo: unknown op code %u", info);
     }
@@ -623,6 +626,24 @@ gistXLogSplit(bool page_is_leaf,
     return recptr;
 }

+/*
+ * Write an empty XLOG record to assign a distinct LSN.
+ */
+XLogRecPtr
+gistXLogAssignLSN(void)
+{
+    int         dummy = 0;
+
+    /*
+     * Records other than SWITCH_WAL must have content. We use an integer 0 to
+     * follow the restriction.
+     */
+    XLogBeginInsert();
+    XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT);
+    XLogRegisterData((char *) &dummy, sizeof(dummy));
+    return XLogInsert(RM_GIST_ID, XLOG_GIST_ASSIGN_LSN);
+}
+
 /*
  * Write XLOG record describing a page update. The update can include any
  * number of deletions and/or insertions of tuples on a single index page.

src/backend/access/heap/heapam.c

Lines changed: 9 additions & 21 deletions
@@ -27,7 +27,6 @@
  *      heap_multi_insert - insert multiple tuples into a relation
  *      heap_delete     - delete a tuple from a relation
  *      heap_update     - replace a tuple in a relation with another tuple
- *      heap_sync       - sync heap, for when no WAL has been written
  *
  * NOTES
  *    This file contains the heap_ routines which implement
@@ -2396,12 +2395,6 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
  * The new tuple is stamped with current transaction ID and the specified
  * command ID.
  *
- * If the HEAP_INSERT_SKIP_WAL option is specified, the new tuple is not
- * logged in WAL, even for a non-temp relation. Safe usage of this behavior
- * requires that we arrange that all new tuples go into new pages not
- * containing any tuples from other transactions, and that the relation gets
- * fsync'd before commit. (See also heap_sync() comments)
- *
  * The HEAP_INSERT_SKIP_FSM option is passed directly to
  * RelationGetBufferForTuple, which see for more info.
  *
@@ -2510,7 +2503,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
     MarkBufferDirty(buffer);

     /* XLOG stuff */
-    if (!(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation))
+    if (RelationNeedsWAL(relation))
     {
         xl_heap_insert xlrec;
         xl_heap_header xlhdr;
@@ -2720,7 +2713,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
     /* currently not needed (thus unsupported) for heap_multi_insert() */
     AssertArg(!(options & HEAP_INSERT_NO_LOGICAL));

-    needwal = !(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation);
+    needwal = RelationNeedsWAL(relation);
     saveFreeSpace = RelationGetTargetPageFreeSpace(relation,
                                                    HEAP_DEFAULT_FILLFACTOR);

@@ -9420,18 +9413,13 @@ heap2_redo(XLogReaderState *record)
 }

 /*
- *  heap_sync       - sync a heap, for use when no WAL has been written
- *
- * This forces the heap contents (including TOAST heap if any) down to disk.
- * If we skipped using WAL, and WAL is otherwise needed, we must force the
- * relation down to disk before it's safe to commit the transaction. This
- * requires writing out any dirty buffers and then doing a forced fsync.
- *
- * Indexes are not touched. (Currently, index operations associated with
- * the commands that use this are WAL-logged and so do not need fsync.
- * That behavior might change someday, but in any case it's likely that
- * any fsync decisions required would be per-index and hence not appropriate
- * to be done here.)
+ *  heap_sync       - for binary compatibility
+ *
+ * A newer PostgreSQL version removes this function. It exists here just in
+ * case an extension calls it. See "Skipping WAL for New RelFileNode" in
+ * src/backend/access/transam/README for the system that superseded it,
+ * allowing removal of most calls. Cases like copy_relation_data() should
+ * call smgrimmedsync() directly.
  */
 void
 heap_sync(Relation rel)
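
With HEAP_INSERT_SKIP_WAL gone, the WAL decision in heap_insert() and heap_multi_insert() rests entirely on RelationNeedsWAL(). The macro's new definition is not part of the visible hunks, so the following is only a model of how such a test could work, built from the commit message's mention of rd_createSubid and rd_firstRelfilenodeSubid. The struct, the function name, and the boolean wal_level parameter are assumptions for illustration, not the server's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SubTransactionId;
#define InvalidSubTransactionId ((SubTransactionId) 0)

/* Hypothetical, trimmed-down stand-in for a relcache entry. */
typedef struct
{
    bool        permanent;      /* relpersistence is "permanent" */
    SubTransactionId rd_createSubid;    /* nonzero if created in this xact */
    SubTransactionId rd_firstRelfilenodeSubid;  /* nonzero if given a new
                                                 * relfilenode in this xact */
} ModelRelation;

/*
 * Model of the per-relation decision: non-permanent relations never need
 * WAL; above wal_level=minimal permanent relations always do; under
 * minimal, WAL is skipped while the current relfilenode is new in this
 * transaction.
 */
bool
model_relation_needs_wal(const ModelRelation *rel, bool wal_level_is_minimal)
{
    if (!rel->permanent)
        return false;
    if (!wal_level_is_minimal)
        return true;
    return rel->rd_createSubid == InvalidSubTransactionId &&
        rel->rd_firstRelfilenodeSubid == InvalidSubTransactionId;
}
```

In this model the answer depends on the relation's state rather than on a flag passed by the caller, so a COPY and a later INSERT into the same new relfilenode get the same answer. That is the inconsistency the commit message says could lose tuples in recovery.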

src/backend/access/heap/rewriteheap.c

Lines changed: 5 additions & 16 deletions
@@ -145,7 +145,6 @@ typedef struct RewriteStateData
     Page        rs_buffer;      /* page currently being built */
     BlockNumber rs_blockno;     /* block where page will go */
     bool        rs_buffer_valid;    /* T if any tuples in buffer */
-    bool        rs_use_wal;     /* must we WAL-log inserts? */
     bool        rs_logical_rewrite; /* do we need to do logical rewriting */
     TransactionId rs_oldest_xmin;   /* oldest xmin used by caller to determine
                                      * tuple visibility */
@@ -239,15 +238,13 @@ static void logical_end_heap_rewrite(RewriteState state);
  * oldest_xmin  xid used by the caller to determine which tuples are dead
  * freeze_xid   xid before which tuples will be frozen
  * min_multi    multixact before which multis will be removed
- * use_wal      should the inserts to the new heap be WAL-logged?
  *
  * Returns an opaque RewriteState, allocated in current memory context,
  * to be used in subsequent calls to the other functions.
  */
 RewriteState
 begin_heap_rewrite(Relation old_heap, Relation new_heap, TransactionId oldest_xmin,
-                   TransactionId freeze_xid, MultiXactId cutoff_multi,
-                   bool use_wal)
+                   TransactionId freeze_xid, MultiXactId cutoff_multi)
 {
     RewriteState state;
     MemoryContext rw_cxt;
@@ -272,7 +269,6 @@ begin_heap_rewrite(Relation old_heap, Relation new_heap, TransactionId oldest_xm
     /* new_heap needn't be empty, just locked */
     state->rs_blockno = RelationGetNumberOfBlocks(new_heap);
     state->rs_buffer_valid = false;
-    state->rs_use_wal = use_wal;
     state->rs_oldest_xmin = oldest_xmin;
     state->rs_freeze_xid = freeze_xid;
     state->rs_cutoff_multi = cutoff_multi;
@@ -331,7 +327,7 @@ end_heap_rewrite(RewriteState state)
     /* Write the last page, if any */
     if (state->rs_buffer_valid)
     {
-        if (state->rs_use_wal)
+        if (RelationNeedsWAL(state->rs_new_rel))
             log_newpage(&state->rs_new_rel->rd_node,
                         MAIN_FORKNUM,
                         state->rs_blockno,
@@ -346,18 +342,14 @@ end_heap_rewrite(RewriteState state)
     }

     /*
-     * If the rel is WAL-logged, must fsync before commit. We use heap_sync
-     * to ensure that the toast table gets fsync'd too.
-     *
-     * It's obvious that we must do this when not WAL-logging. It's less
-     * obvious that we have to do it even if we did WAL-log the pages. The
+     * When we WAL-logged rel pages, we must nonetheless fsync them. The
      * reason is the same as in tablecmds.c's copy_relation_data(): we're
      * writing data that's not in shared buffers, and so a CHECKPOINT
      * occurring during the rewriteheap operation won't have fsync'd data we
      * wrote before the checkpoint.
      */
     if (RelationNeedsWAL(state->rs_new_rel))
-        heap_sync(state->rs_new_rel);
+        smgrimmedsync(state->rs_new_rel->rd_smgr, MAIN_FORKNUM);

     logical_end_heap_rewrite(state);

@@ -655,9 +647,6 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
         {
             int         options = HEAP_INSERT_SKIP_FSM;

-            if (!state->rs_use_wal)
-                options |= HEAP_INSERT_SKIP_WAL;
-
             /*
              * While rewriting the heap for VACUUM FULL / CLUSTER, make sure data
              * for the TOAST table are not logically decoded. The main heap is
@@ -696,7 +685,7 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
             /* Doesn't fit, so write out the existing page */

             /* XLOG stuff */
-            if (state->rs_use_wal)
+            if (RelationNeedsWAL(state->rs_new_rel))
                 log_newpage(&state->rs_new_rel->rd_node,
                             MAIN_FORKNUM,
                             state->rs_blockno,
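
The comment in end_heap_rewrite() explains why pages written outside shared buffers still need an explicit fsync even when they were WAL-logged: a checkpoint only flushes what it knows about. As a loose analogy only, here is the same write-then-sync pattern in plain POSIX terms. The function name, the path argument, and the 8192-byte page size are assumptions for illustration; this is not how smgrimmedsync() is implemented.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_PAGE_BYTES 8192

/*
 * Write npages zero-filled pages straight to a file, bypassing any
 * application-level buffer cache, then force them to stable storage before
 * reporting success. Returns 0 on success, -1 on any failure.
 */
int
write_pages_and_sync(const char *path, int npages)
{
    char        page[SKETCH_PAGE_BYTES];
    int         fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    memset(page, 0, sizeof(page));
    for (int i = 0; i < npages; i++)
    {
        if (write(fd, page, sizeof(page)) != (ssize_t) sizeof(page))
        {
            close(fd);
            return -1;
        }
    }

    /* Without this, a successful return would not imply durability. */
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The caller treats the data as durable only after the function returns 0, in the same way that the rewrite must not let the transaction commit before the sync completes.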
