Commit 9d62152

Skip WAL for new relfilenodes, under wal_level=minimal.
Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section.

To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY.

Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction.

Back-patch to 9.5 (all supported versions). This introduces a new WAL record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As always, update standby systems before master systems. This changes sizeof(RelationData) and sizeof(IndexStmt), breaking binary compatibility for affected extensions. (The most recent commit to affect the same class of extensions was 089e4d4.)

Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout.

Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
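The rule in the commit title can be read as a predicate on a relation: skip WAL only when the relation is permanent, wal_level is minimal, and the relation's storage is new in the current transaction. The C sketch below is a simplified, hypothetical rendering loosely modeled on PostgreSQL's RelationNeedsWAL() after this change. The struct, enum values and function name are illustrative assumptions rather than the real relcache layout, and XLogIsNeeded() is reduced to a plain wal_level comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL types (assumptions, not the real headers). */
typedef uint32_t SubTransactionId;
#define InvalidSubTransactionId ((SubTransactionId) 0)

typedef enum { WAL_LEVEL_MINIMAL, WAL_LEVEL_REPLICA, WAL_LEVEL_LOGICAL } WalLevel;
typedef enum { PERSIST_PERMANENT, PERSIST_UNLOGGED, PERSIST_TEMP } Persistence;

/* Hypothetical, trimmed-down relcache entry. */
typedef struct SketchRelation
{
    Persistence persistence;
    SubTransactionId createSubid;           /* rel created in this xact, else 0 */
    SubTransactionId firstRelfilenodeSubid; /* first new relfilenode in this xact, else 0 */
} SketchRelation;

/*
 * Must changes to rel be WAL-logged?  Temp and unlogged relations never
 * are.  Above wal_level=minimal, permanent relations always are.  At
 * minimal, a permanent relation whose storage was created or replaced in
 * the current transaction skips WAL and is made durable at commit instead.
 */
static bool
sketch_relation_needs_wal(const SketchRelation *rel, WalLevel wal_level)
{
    assert(rel != NULL);

    if (rel->persistence != PERSIST_PERMANENT)
        return false;
    if (wal_level != WAL_LEVEL_MINIMAL)
        return true;
    return rel->createSubid == InvalidSubTransactionId &&
        rel->firstRelfilenodeSubid == InvalidSubTransactionId;
}
```

Under this model the decision depends only on the relation's state, never on which command is running. Every command touching a given relfilenode therefore makes the same choice, so a WAL-skipping COPY can no longer be mixed with a WAL-logged INSERT on the same file.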
1 parent 43434ed commit 9d62152


47 files changed, +1401 -337 lines changed

doc/src/sgml/config.sgml

Lines changed: 32 additions & 9 deletions
@@ -2234,16 +2234,19 @@ include_dir 'conf.d'
 levels. This parameter can only be set at server start.
 </para>
 <para>
-In <literal>minimal</> level, WAL-logging of some bulk
-operations can be safely skipped, which can make those
-operations much faster (see <xref linkend="populate-pitr">).
-Operations in which this optimization can be applied include:
+In <literal>minimal</literal> level, no information is logged for
+permanent relations for the remainder of a transaction that creates or
+rewrites them. This can make operations much faster (see
+<xref linkend="populate-pitr">). Operations that initiate this
+optimization include:
 <simplelist>
-<member><command>CREATE TABLE AS</></member>
-<member><command>CREATE INDEX</></member>
-<member><command>CLUSTER</></member>
-<member><command>COPY</> into tables that were created or truncated in the same
-transaction</member>
+<member><command>ALTER ... SET TABLESPACE</command></member>
+<member><command>CLUSTER</command></member>
+<member><command>CREATE TABLE</command></member>
+<member><command>REFRESH MATERIALIZED VIEW</command>
+(without <option>CONCURRENTLY</option>)</member>
+<member><command>REINDEX</command></member>
+<member><command>TRUNCATE</command></member>
 </simplelist>
 But minimal WAL does not contain enough information to reconstruct the
 data from a base backup and the WAL logs, so <literal>replica</> or
@@ -2632,6 +2635,26 @@ include_dir 'conf.d'
 </listitem>
 </varlistentry>
 
+<varlistentry id="guc-wal-skip-threshold" xreflabel="wal_skip_threshold">
+<term><varname>wal_skip_threshold</varname> (<type>integer</type>)
+<indexterm>
+<primary><varname>wal_skip_threshold</varname> configuration parameter</primary>
+</indexterm>
+</term>
+<listitem>
+<para>
+When <varname>wal_level</varname> is <literal>minimal</literal> and a
+transaction commits after creating or rewriting a permanent relation,
+this setting determines how to persist the new data. If the data is
+smaller than this setting, write it to the WAL log; otherwise, use an
+fsync of affected files. Depending on the properties of your storage,
+raising or lowering this value might help if such commits are slowing
+concurrent transactions. The default is two megabytes
+(<literal>2MB</literal>).
+</para>
+</listitem>
+</varlistentry>
+
 <varlistentry id="guc-commit-delay" xreflabel="commit_delay">
 <term><varname>commit_delay</varname> (<type>integer</type>)
 <indexterm>
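The commit-time choice that wal_skip_threshold controls reduces to a size comparison. The C sketch below is a hypothetical illustration, not PostgreSQL's code. It assumes the default 8 kB block size, treats the setting as a number of kilobytes, and ignores any special cases the real implementation may have.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default PostgreSQL block size; real builds may differ. */
#define SKETCH_BLCKSZ 8192

typedef enum { PERSIST_VIA_WAL, PERSIST_VIA_FSYNC } PersistMethod;

/*
 * Hypothetical commit-time decision for a relation created or rewritten
 * under wal_level=minimal.  nblocks is the data size in blocks;
 * threshold_kb is wal_skip_threshold expressed in kilobytes.
 */
static PersistMethod
sketch_choose_persist_method(uint64_t nblocks, uint64_t threshold_kb)
{
    uint64_t size_kb;

    /* Guard the multiplication below against overflow. */
    assert(nblocks <= UINT64_MAX / SKETCH_BLCKSZ);
    size_kb = nblocks * SKETCH_BLCKSZ / 1024;

    /* Smaller than the setting: write the data to WAL. */
    if (size_kb < threshold_kb)
        return PERSIST_VIA_WAL;

    /* Otherwise fsync the affected files. */
    return PERSIST_VIA_FSYNC;
}
```

With the default 2MB (2048 kB) setting and 8 kB blocks, this sketch copies a 255-block relation (2040 kB) into WAL and fsyncs a 256-block relation (2048 kB). Setting the threshold to 0 makes it fsync every such relation.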

doc/src/sgml/perform.sgml

Lines changed: 9 additions & 38 deletions
@@ -1523,8 +1523,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
 needs to be written, because in case of an error, the files
 containing the newly loaded data will be removed anyway.
 However, this consideration only applies when
-<xref linkend="guc-wal-level"> is <literal>minimal</> for
-non-partitioned tables as all commands must write WAL otherwise.
+<xref linkend="guc-wal-level"> is <literal>minimal</literal>
+as all commands must write WAL otherwise.
 </para>
 
 </sect2>
@@ -1624,42 +1624,13 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
 </para>
 
 <para>
-Aside from avoiding the time for the archiver or WAL sender to
-process the WAL data,
-doing this will actually make certain commands faster, because they
-are designed not to write WAL at all if <varname>wal_level</varname>
-is <literal>minimal</>. (They can guarantee crash safety more cheaply
-by doing an <function>fsync</> at the end than by writing WAL.)
-This applies to the following commands:
-<itemizedlist>
-<listitem>
-<para>
-<command>CREATE TABLE AS SELECT</command>
-</para>
-</listitem>
-<listitem>
-<para>
-<command>CREATE INDEX</command> (and variants such as
-<command>ALTER TABLE ADD PRIMARY KEY</command>)
-</para>
-</listitem>
-<listitem>
-<para>
-<command>ALTER TABLE SET TABLESPACE</command>
-</para>
-</listitem>
-<listitem>
-<para>
-<command>CLUSTER</command>
-</para>
-</listitem>
-<listitem>
-<para>
-<command>COPY FROM</command>, when the target table has been
-created or truncated earlier in the same transaction
-</para>
-</listitem>
-</itemizedlist>
+Aside from avoiding the time for the archiver or WAL sender to process the
+WAL data, doing this will actually make certain commands faster, because
+they do not to write WAL at all if <varname>wal_level</varname>
+is <literal>minimal</literal> and the current subtransaction (or top-level
+transaction) created or truncated the table or index they change. (They
+can guarantee crash safety more cheaply by doing
+an <function>fsync</function> at the end than by writing WAL.)
 </para>
 </sect2>
 
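The paragraph above ties the optimization to the (sub)transaction that created or truncated the relation. The commit message adds rd_firstRelfilenodeSubid so that a subtransaction abort can be handled reliably. Below is a simplified, hypothetical C model of that bookkeeping. The struct and function names are invented for illustration, subtransaction ID 1 stands for the top-level transaction, and only the "first new relfilenode" field is tracked because, in this model, that field alone decides whether WAL can be skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SubTransactionId;
#define InvalidSubTransactionId ((SubTransactionId) 0)
#define SKETCH_TOP_SUBID ((SubTransactionId) 1) /* assumed top-level id */

/* Hypothetical per-relation state, reset at top-level transaction end. */
typedef struct SketchRel
{
    /* Subtransaction that first gave rel a new relfilenode in this
     * top-level transaction, or InvalidSubTransactionId. */
    SubTransactionId firstRelfilenodeSubid;
} SketchRel;

/* CREATE TABLE, TRUNCATE, etc. give rel a new relfilenode in mysubid. */
static void
sketch_new_relfilenode(SketchRel *rel, SubTransactionId mysubid)
{
    assert(mysubid != InvalidSubTransactionId);
    if (rel->firstRelfilenodeSubid == InvalidSubTransactionId)
        rel->firstRelfilenodeSubid = mysubid;
}

/* Subtransaction commit: the parent inherits what mysubid owned. */
static void
sketch_subxact_commit(SketchRel *rel, SubTransactionId mysubid,
                      SubTransactionId parentsubid)
{
    if (rel->firstRelfilenodeSubid == mysubid)
        rel->firstRelfilenodeSubid = parentsubid;
}

/*
 * Subtransaction abort: relfilenodes created by mysubid are unlinked.  If
 * mysubid created the first one, every later one was created while mysubid
 * was active, so the relation is back on storage predating the transaction.
 */
static void
sketch_subxact_abort(SketchRel *rel, SubTransactionId mysubid)
{
    if (rel->firstRelfilenodeSubid == mysubid)
        rel->firstRelfilenodeSubid = InvalidSubTransactionId;
}

/* Under wal_level=minimal: may changes to rel's current storage skip WAL? */
static bool
sketch_can_skip_wal(const SketchRel *rel)
{
    return rel->firstRelfilenodeSubid != InvalidSubTransactionId;
}
```

For example, CREATE TABLE followed by TRUNCATE inside a savepoint that is then rolled back still leaves the table on storage created by this transaction, so WAL can still be skipped. A TRUNCATE of an old table inside a rolled-back savepoint returns the table to logged storage.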

src/backend/access/gist/gistbuild.c

Lines changed: 1 addition & 1 deletion
@@ -191,7 +191,7 @@ gistbuild(Relation heap, Relation index, IndexInfo *indexInfo)
             PageSetLSN(page, recptr);
         }
         else
-            PageSetLSN(page, gistGetFakeLSN(heap));
+            PageSetLSN(page, gistGetFakeLSN(index));
 
         UnlockReleaseBuffer(buffer);
 

src/backend/access/gist/gistutil.c

Lines changed: 26 additions & 5 deletions
@@ -938,23 +938,44 @@ gistproperty(Oid index_oid, int attno,
 }
 
 /*
- * Temporary and unlogged GiST indexes are not WAL-logged, but we need LSNs
- * to detect concurrent page splits anyway. This function provides a fake
- * sequence of LSNs for that purpose.
+ * Some indexes are not WAL-logged, but we need LSNs to detect concurrent page
+ * splits anyway. This function provides a fake sequence of LSNs for that
+ * purpose.
  */
 XLogRecPtr
 gistGetFakeLSN(Relation rel)
 {
-    static XLogRecPtr counter = 1;
-
     if (rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
     {
         /*
          * Temporary relations are only accessible in our session, so a simple
          * backend-local counter will do.
          */
+        static XLogRecPtr counter = 1;
+
         return counter++;
     }
+    else if (rel->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
+    {
+        /*
+         * WAL-logging on this relation will start after commit, so its LSNs
+         * must be distinct numbers smaller than the LSN at the next commit.
+         * Emit a dummy WAL record if insert-LSN hasn't advanced after the
+         * last call.
+         */
+        static XLogRecPtr lastlsn = InvalidXLogRecPtr;
+        XLogRecPtr  currlsn = GetXLogInsertRecPtr();
+
+        /* Shouldn't be called for WAL-logging relations */
+        Assert(!RelationNeedsWAL(rel));
+
+        /* No need for an actual record if we already have a distinct LSN */
+        if (!XLogRecPtrIsInvalid(lastlsn) && lastlsn == currlsn)
+            currlsn = gistXLogAssignLSN();
+
+        lastlsn = currlsn;
+        return currlsn;
+    }
     else
     {
         /*
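The permanent-relation branch above hands out fake LSNs that are distinct and never ahead of the real WAL insert position. It writes a dummy record only when that position has not moved since the previous call. The C sketch below simulates this in isolation. The global insert pointer, the fixed 28-byte record size and all sketch_* names are hypothetical stand-ins for GetXLogInsertRecPtr() and gistXLogAssignLSN(), not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical stand-in for the shared WAL insert position. */
static XLogRecPtr sketch_insert_lsn = 1000;

static XLogRecPtr
sketch_get_insert_lsn(void)
{
    return sketch_insert_lsn;
}

/* Stand-in for gistXLogAssignLSN(): "insert" a tiny record, return its end. */
static XLogRecPtr
sketch_assign_lsn(void)
{
    sketch_insert_lsn += 28;    /* arbitrary record size for the sketch */
    return sketch_insert_lsn;
}

/*
 * Fake LSN for a permanent relation that is skipping WAL: each call returns
 * a value different from the previous call's and never beyond the current
 * insert position, so a later commit record ends at a larger LSN.
 */
static XLogRecPtr
sketch_fake_lsn_permanent(void)
{
    static XLogRecPtr lastlsn = InvalidXLogRecPtr;
    XLogRecPtr currlsn = sketch_get_insert_lsn();

    /* Insert position unchanged since the last call: force it forward. */
    if (lastlsn != InvalidXLogRecPtr && lastlsn == currlsn)
        currlsn = sketch_assign_lsn();

    lastlsn = currlsn;
    return currlsn;
}
```

When other backends are writing WAL, the insert position usually advances on its own, so the dummy record is only needed on an otherwise idle system. That is also why the real record is marked XLOG_MARK_UNIMPORTANT in gistxlog.c.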

src/backend/access/gist/gistxlog.c

Lines changed: 21 additions & 0 deletions
@@ -505,6 +505,9 @@ gist_redo(XLogReaderState *record)
         case XLOG_GIST_CREATE_INDEX:
             gistRedoCreateIndex(record);
             break;
+        case XLOG_GIST_ASSIGN_LSN:
+            /* nop. See gistGetFakeLSN(). */
+            break;
         default:
             elog(PANIC, "gist_redo: unknown op code %u", info);
     }
@@ -623,6 +626,24 @@ gistXLogSplit(bool page_is_leaf,
     return recptr;
 }
 
+/*
+ * Write an empty XLOG record to assign a distinct LSN.
+ */
+XLogRecPtr
+gistXLogAssignLSN(void)
+{
+    int         dummy = 0;
+
+    /*
+     * Records other than SWITCH_WAL must have content. We use an integer 0 to
+     * follow the restriction.
+     */
+    XLogBeginInsert();
+    XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT);
+    XLogRegisterData((char *) &dummy, sizeof(dummy));
+    return XLogInsert(RM_GIST_ID, XLOG_GIST_ASSIGN_LSN);
+}
+
 /*
  * Write XLOG record describing a page update. The update can include any
  * number of deletions and/or insertions of tuples on a single index page.

src/backend/access/heap/heapam.c

Lines changed: 9 additions & 21 deletions
@@ -27,7 +27,6 @@
  *      heap_multi_insert - insert multiple tuples into a relation
  *      heap_delete - delete a tuple from a relation
  *      heap_update - replace a tuple in a relation with another tuple
- *      heap_sync - sync heap, for when no WAL has been written
  *
  * NOTES
  *    This file contains the heap_ routines which implement
@@ -2351,12 +2350,6 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
  * The new tuple is stamped with current transaction ID and the specified
  * command ID.
  *
- * If the HEAP_INSERT_SKIP_WAL option is specified, the new tuple is not
- * logged in WAL, even for a non-temp relation. Safe usage of this behavior
- * requires that we arrange that all new tuples go into new pages not
- * containing any tuples from other transactions, and that the relation gets
- * fsync'd before commit. (See also heap_sync() comments)
- *
  * The HEAP_INSERT_SKIP_FSM option is passed directly to
  * RelationGetBufferForTuple, which see for more info.
  *
@@ -2465,7 +2458,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
     MarkBufferDirty(buffer);
 
     /* XLOG stuff */
-    if (!(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation))
+    if (RelationNeedsWAL(relation))
     {
         xl_heap_insert xlrec;
         xl_heap_header xlhdr;
@@ -2673,7 +2666,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
     /* currently not needed (thus unsupported) for heap_multi_insert() */
     AssertArg(!(options & HEAP_INSERT_NO_LOGICAL));
 
-    needwal = !(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation);
+    needwal = RelationNeedsWAL(relation);
     saveFreeSpace = RelationGetTargetPageFreeSpace(relation,
                                                    HEAP_DEFAULT_FILLFACTOR);
 
@@ -9260,18 +9253,13 @@ heap2_redo(XLogReaderState *record)
 }
 
 /*
- *    heap_sync - sync a heap, for use when no WAL has been written
- *
- * This forces the heap contents (including TOAST heap if any) down to disk.
- * If we skipped using WAL, and WAL is otherwise needed, we must force the
- * relation down to disk before it's safe to commit the transaction. This
- * requires writing out any dirty buffers and then doing a forced fsync.
- *
- * Indexes are not touched. (Currently, index operations associated with
- * the commands that use this are WAL-logged and so do not need fsync.
- * That behavior might change someday, but in any case it's likely that
- * any fsync decisions required would be per-index and hence not appropriate
- * to be done here.)
+ *    heap_sync - for binary compatibility
+ *
+ * A newer PostgreSQL version removes this function. It exists here just in
+ * case an extension calls it. See "Skipping WAL for New RelFileNode" in
+ * src/backend/access/transam/README for the system that superseded it,
+ * allowing removal of most calls. Cases like copy_relation_data() should
+ * call smgrimmedsync() directly.
  */
 void
 heap_sync(Relation rel)

src/backend/access/heap/rewriteheap.c

Lines changed: 5 additions & 16 deletions
@@ -145,7 +145,6 @@ typedef struct RewriteStateData
     Page        rs_buffer;          /* page currently being built */
     BlockNumber rs_blockno;         /* block where page will go */
     bool        rs_buffer_valid;    /* T if any tuples in buffer */
-    bool        rs_use_wal;         /* must we WAL-log inserts? */
     bool        rs_logical_rewrite; /* do we need to do logical rewriting */
     TransactionId rs_oldest_xmin;   /* oldest xmin used by caller to determine
                                      * tuple visibility */
@@ -239,15 +238,13 @@ static void logical_end_heap_rewrite(RewriteState state);
  * oldest_xmin  xid used by the caller to determine which tuples are dead
  * freeze_xid   xid before which tuples will be frozen
  * min_multi    multixact before which multis will be removed
- * use_wal      should the inserts to the new heap be WAL-logged?
  *
  * Returns an opaque RewriteState, allocated in current memory context,
  * to be used in subsequent calls to the other functions.
  */
 RewriteState
 begin_heap_rewrite(Relation old_heap, Relation new_heap, TransactionId oldest_xmin,
-                   TransactionId freeze_xid, MultiXactId cutoff_multi,
-                   bool use_wal)
+                   TransactionId freeze_xid, MultiXactId cutoff_multi)
 {
     RewriteState state;
     MemoryContext rw_cxt;
@@ -272,7 +269,6 @@ begin_heap_rewrite(Relation old_heap, Relation new_heap, TransactionId oldest_xm
     /* new_heap needn't be empty, just locked */
     state->rs_blockno = RelationGetNumberOfBlocks(new_heap);
     state->rs_buffer_valid = false;
-    state->rs_use_wal = use_wal;
     state->rs_oldest_xmin = oldest_xmin;
     state->rs_freeze_xid = freeze_xid;
     state->rs_cutoff_multi = cutoff_multi;
@@ -331,7 +327,7 @@ end_heap_rewrite(RewriteState state)
     /* Write the last page, if any */
     if (state->rs_buffer_valid)
     {
-        if (state->rs_use_wal)
+        if (RelationNeedsWAL(state->rs_new_rel))
             log_newpage(&state->rs_new_rel->rd_node,
                         MAIN_FORKNUM,
                         state->rs_blockno,
@@ -346,18 +342,14 @@ end_heap_rewrite(RewriteState state)
     }
 
     /*
-     * If the rel is WAL-logged, must fsync before commit. We use heap_sync
-     * to ensure that the toast table gets fsync'd too.
-     *
-     * It's obvious that we must do this when not WAL-logging. It's less
-     * obvious that we have to do it even if we did WAL-log the pages. The
+     * When we WAL-logged rel pages, we must nonetheless fsync them. The
      * reason is the same as in tablecmds.c's copy_relation_data(): we're
      * writing data that's not in shared buffers, and so a CHECKPOINT
      * occurring during the rewriteheap operation won't have fsync'd data we
      * wrote before the checkpoint.
      */
     if (RelationNeedsWAL(state->rs_new_rel))
-        heap_sync(state->rs_new_rel);
+        smgrimmedsync(state->rs_new_rel->rd_smgr, MAIN_FORKNUM);
 
     logical_end_heap_rewrite(state);
 
@@ -654,9 +646,6 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
     {
         int         options = HEAP_INSERT_SKIP_FSM;
 
-        if (!state->rs_use_wal)
-            options |= HEAP_INSERT_SKIP_WAL;
-
         /*
          * While rewriting the heap for VACUUM FULL / CLUSTER, make sure data
          * for the TOAST table are not logically decoded. The main heap is
@@ -695,7 +684,7 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
             /* Doesn't fit, so write out the existing page */
 
             /* XLOG stuff */
-            if (state->rs_use_wal)
+            if (RelationNeedsWAL(state->rs_new_rel))
                 log_newpage(&state->rs_new_rel->rd_node,
                             MAIN_FORKNUM,
                             state->rs_blockno,
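The end_heap_rewrite() comment explains why pages that were WAL-logged must still be fsynced. The C sketch below is a toy model of that argument, not PostgreSQL code. The struct and function are hypothetical, LSNs are plain integers, and it assumes (as the comment says) that a checkpoint does not fsync data written outside shared buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model: one rewritten page written straight to the file, bypassing
 * shared buffers, and WAL-logged with log_newpage() at page_lsn.  Because
 * the write bypassed shared buffers, this model assumes no checkpoint
 * fsyncs it.
 */
typedef struct SketchRewrite
{
    uint64_t page_lsn;   /* LSN of the page's WAL record */
    uint64_t redo_lsn;   /* redo pointer of the last checkpoint before the crash */
    bool     immedsync;  /* did the writer fsync the file before commit? */
} SketchRewrite;

/* Would the page survive an OS crash or power loss right after commit? */
static bool
sketch_page_survives_crash(const SketchRewrite *s)
{
    assert(s->page_lsn != 0);

    /* An explicit fsync already put the data on stable storage. */
    if (s->immedsync)
        return true;

    /*
     * Otherwise only WAL replay can restore the page, and recovery starts at
     * the checkpoint's redo pointer: a record older than that is skipped.
     */
    return s->page_lsn >= s->redo_lsn;
}
```

In this model, a checkpoint that begins after the page was logged (redo_lsn > page_lsn) is exactly the case where the data is lost without the explicit fsync. That is why the new code calls smgrimmedsync() whenever RelationNeedsWAL() is true.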
