Commit f741300

Have multixact be truncated by checkpoint, not vacuum

Instead of truncating pg_multixact at vacuum time, do it only at
checkpoint time. The reason for doing it this way is twofold: first, we
want it to delete only segments that we're certain will not be required
if there's a crash immediately after the removal; and second, we want to
do it relatively often so that older files are not left behind if
there's an untimely crash.

Per my proposal in
http://www.postgresql.org/message-id/20140626044519.GJ7340@eldon.alvh.no-ip.org
we now execute the truncation in the checkpointer process rather than as
part of vacuum. Vacuum is only in charge of maintaining in shared
memory the value to which it's possible to truncate the files; that
value is stored as part of checkpoints also, and so upon recovery we can
reuse the same value to re-execute truncate and reset the
oldest-value-still-safe-to-use to one known to remain after truncation.

Per bug reported by Jeff Janes in the course of his tests involving
bug #8673.

While at it, update some comments that hadn't been updated since
multixacts were changed.

Backpatch to 9.3, where persistency of pg_multixact files was
introduced by commit 0ac5ad5.

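The division of labour described in the message can be sketched outside the server. The following is a minimal, self-contained toy model, not PostgreSQL code: the struct and every `toy_*` name are hypothetical, and on-disk files are reduced to a single "earliest multixact still on disk" counter. It only illustrates that vacuum advances a value in shared memory, while files are removed only against the value that a completed checkpoint has recorded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

/* Hypothetical stand-in for the shared-memory state touched by the commit. */
typedef struct
{
    MultiXactId oldest;             /* advanced by vacuum */
    MultiXactId last_checkpointed;  /* "oldest" as of the last checkpoint */
    MultiXactId earliest_on_disk;   /* toy model of the pg_multixact files */
} ToyMultiXactState;

/* Wraparound-aware "a is older than b" (modulo-2^32 comparison). */
static bool
toy_precedes(MultiXactId a, MultiXactId b)
{
    return (int32_t) (a - b) < 0;
}

/* Vacuum only records the new horizon; it no longer removes anything. */
void
toy_vacuum_advance(ToyMultiXactState *s, MultiXactId new_oldest)
{
    if (toy_precedes(s->oldest, new_oldest))
        s->oldest = new_oldest;
}

/* Remove everything older than the last checkpointed horizon. */
void
toy_truncate(ToyMultiXactState *s)
{
    assert(s->last_checkpointed != 0);  /* 0 plays the role of "invalid" */
    if (toy_precedes(s->earliest_on_disk, s->last_checkpointed))
        s->earliest_on_disk = s->last_checkpointed;
}

/*
 * Checkpoint: the horizon goes into the checkpoint record first; only after
 * that record is durable is it published as the safe truncation point.
 * Returns the value stored in the (imaginary) checkpoint record.
 */
MultiXactId
toy_checkpoint(ToyMultiXactState *s)
{
    MultiXactId recorded = s->oldest;

    /* ... the checkpoint record would be written and flushed here ... */
    s->last_checkpointed = recorded;
    toy_truncate(s);
    return recorded;
}

/* Recovery: reuse the value found in the checkpoint record. */
void
toy_recover(ToyMultiXactState *s, MultiXactId recorded)
{
    s->oldest = recorded;
    s->last_checkpointed = recorded;
    toy_truncate(s);
}
```

In this model a vacuum that advances the horizon leaves the files alone until the next checkpoint runs, and replaying a checkpoint record after a crash truncates to the same point the original checkpoint used.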
1 parent b7e51d9, commit f741300

4 files changed, 112 insertions(+), 60 deletions(-)


‎src/backend/access/transam/multixact.c

Lines changed: 76 additions & 41 deletions
@@ -45,14 +45,17 @@
  * anything we saw during replay.
  *
  * We are able to remove segments no longer necessary by carefully tracking
- * each table's used values: during vacuum, any multixact older than a
- * certain value is removed; the cutoff value is stored in pg_class.
- * The minimum value in each database is stored in pg_database, and the
- * global minimum is part of pg_control. Any vacuum that is able to
- * advance its database's minimum value also computes a new global minimum,
- * and uses this value to truncate older segments. When new multixactid
- * values are to be created, care is taken that the counter does not
- * fall within the wraparound horizon considering the global minimum value.
+ * each table's used values: during vacuum, any multixact older than a certain
+ * value is removed; the cutoff value is stored in pg_class. The minimum value
+ * across all tables in each database is stored in pg_database, and the global
+ * minimum across all databases is part of pg_control and is kept in shared
+ * memory. At checkpoint time, after the value is known flushed in WAL, any
+ * files that correspond to multixacts older than that value are removed.
+ * (These files are also removed when a restartpoint is executed.)
+ *
+ * When new multixactid values are to be created, care is taken that the
+ * counter does not fall within the wraparound horizon considering the global
+ * minimum value.
  *
  * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -91,7 +94,7 @@
  * Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
  * MultiXact page numbering also wraps around at
  * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
- * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_SEGMENTS_PER_PAGE. We need
+ * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
  * take no explicit notice of that fact in this module, except when comparing
  * segment and page numbers in TruncateMultiXact (see
  * MultiXactOffsetPagePrecedes).
@@ -188,16 +191,20 @@ typedef struct MultiXactStateData
     /* next-to-be-assigned offset */
     MultiXactOffset nextOffset;

-    /* the Offset SLRU area was last truncated at this MultiXactId */
-    MultiXactId lastTruncationPoint;
-
     /*
-     * oldest multixact that is still on disk. Anything older than this
-     * should not be consulted.
+     * Oldest multixact that is still on disk. Anything older than this
+     * should not be consulted. These values are updated by vacuum.
      */
     MultiXactId oldestMultiXactId;
     Oid oldestMultiXactDB;

+    /*
+     * This is what the previous checkpoint stored as the truncate position.
+     * This value is the oldestMultiXactId that was valid when a checkpoint
+     * was last executed.
+     */
+    MultiXactId lastCheckpointedOldest;
+
     /* support for anti-wraparound measures */
     MultiXactId multiVacLimit;
     MultiXactId multiWarnLimit;
@@ -234,12 +241,20 @@ typedef struct MultiXactStateData
      * than its own OldestVisibleMXactId[] setting; this is necessary because
      * the checkpointer could truncate away such data at any instant.
      *
-     * The checkpointer can compute the safe truncation point as the oldest
-     * valid value among all the OldestMemberMXactId[] and
-     * OldestVisibleMXactId[] entries, or nextMXact if none are valid.
-     * Clearly, it is not possible for any later-computed OldestVisibleMXactId
-     * value to be older than this, and so there is no risk of truncating data
-     * that is still needed.
+     * The oldest valid value among all of the OldestMemberMXactId[] and
+     * OldestVisibleMXactId[] entries is considered by vacuum as the earliest
+     * possible value still having any live member transaction. Subtracting
+     * vacuum_multixact_freeze_min_age from that value we obtain the freezing
+     * point for multixacts for that table. Any value older than that is
+     * removed from tuple headers (or "frozen"; see FreezeMultiXactId. Note
+     * that multis that have member xids that are older than the cutoff point
+     * for xids must also be frozen, even if the multis themselves are newer
+     * than the multixid cutoff point). Whenever a full table vacuum happens,
+     * the freezing point so computed is used as the new pg_class.relminmxid
+     * value. The minimum of all those values in a database is stored as
+     * pg_database.datminmxid. In turn, the minimum of all of those values is
+     * stored in pg_control and used as truncation point for pg_multixact. At
+     * checkpoint or restartpoint, unneeded segments are removed.
      */
     MultiXactId perBackendXactIds[1];    /* VARIABLE LENGTH ARRAY */
 } MultiXactStateData;
@@ -1121,8 +1136,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
      * We check known limits on MultiXact before resorting to the SLRU area.
      *
      * An ID older than MultiXactState->oldestMultiXactId cannot possibly be
-     * useful; it should have already been removed by vacuum. We've truncated
-     * the on-disk structures anyway. Returning the wrong values could lead
+     * useful; it has already been removed, or will be removed shortly, by
+     * truncation. Returning the wrong values could lead
      * to an incorrect visibility result. However, to support pg_upgrade we
      * need to allow an empty set to be returned regardless, if the caller is
      * willing to accept it; the caller is expected to check that it's an
@@ -1932,14 +1947,14 @@ TrimMultiXact(void)
     LWLockAcquire(MultiXactOffsetControlLock, LW_EXCLUSIVE);

     /*
-     * (Re-)Initialize our idea of the latest page number.
+     * (Re-)Initialize our idea of the latest page number for offsets.
      */
     pageno = MultiXactIdToOffsetPage(multi);
     MultiXactOffsetCtl->shared->latest_page_number = pageno;

     /*
      * Zero out the remainder of the current offsets page. See notes in
-     * StartupCLOG() for motivation.
+     * TrimCLOG() for motivation.
      */
     entryno = MultiXactIdToOffsetEntry(multi);
     if (entryno != 0)
@@ -1962,7 +1977,7 @@ TrimMultiXact(void)
     LWLockAcquire(MultiXactMemberControlLock, LW_EXCLUSIVE);

     /*
-     * (Re-)Initialize our idea of the latest page number.
+     * (Re-)Initialize our idea of the latest page number for members.
      */
     pageno = MXOffsetToMemberPage(offset);
     MultiXactMemberCtl->shared->latest_page_number = pageno;
@@ -2240,6 +2255,18 @@ MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB)
     SetMultiXactIdLimit(oldestMulti, oldestMultiDB);
 }

+/*
+ * Update the "safe truncation point". This is the newest value of oldestMulti
+ * that is known to be flushed as part of a checkpoint record.
+ */
+void
+MultiXactSetSafeTruncate(MultiXactId safeTruncateMulti)
+{
+    LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
+    MultiXactState->lastCheckpointedOldest = safeTruncateMulti;
+    LWLockRelease(MultiXactGenLock);
+}
+
 /*
  * Make sure that MultiXactOffset has room for a newly-allocated MultiXactId.
  *
@@ -2478,25 +2505,31 @@ SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data)
  * Remove all MultiXactOffset and MultiXactMember segments before the oldest
  * ones still of interest.
  *
- * On a primary, this is called by vacuum after it has successfully advanced a
- * database's datminmxid value; the cutoff value we're passed is the minimum of
- * all databases' datminmxid values.
- *
- * During crash recovery, it's called from CreateRestartPoint() instead. We
- * rely on the fact that xlog_redo() will already have called
- * MultiXactAdvanceOldest(). Our latest_page_number will already have been
- * initialized by StartupMultiXact() and kept up to date as new pages are
- * zeroed.
+ * On a primary, this is called by the checkpointer process after a checkpoint
+ * has been flushed; during crash recovery, it's called from
+ * CreateRestartPoint(). In the latter case, we rely on the fact that
+ * xlog_redo() will already have called MultiXactAdvanceOldest(). Our
+ * latest_page_number will already have been initialized by StartupMultiXact()
+ * and kept up to date as new pages are zeroed.
  */
 void
-TruncateMultiXact(MultiXactId oldestMXact)
+TruncateMultiXact(void)
 {
+    MultiXactId oldestMXact;
     MultiXactOffset oldestOffset;
     MultiXactOffset nextOffset;
     mxtruncinfo trunc;
     MultiXactId earliest;
     MembersLiveRange range;

+    Assert(AmCheckpointerProcess() || AmStartupProcess() ||
+           !IsPostmasterEnvironment);
+
+    LWLockAcquire(MultiXactGenLock, LW_SHARED);
+    oldestMXact = MultiXactState->lastCheckpointedOldest;
+    LWLockRelease(MultiXactGenLock);
+    Assert(MultiXactIdIsValid(oldestMXact));
+
     /*
      * Note we can't just plow ahead with the truncation; it's possible that
      * there are no segments to truncate, which is a problem because we are
@@ -2507,15 +2540,16 @@ TruncateMultiXact(MultiXactId oldestMXact)
     trunc.earliestExistingPage = -1;
     SlruScanDirectory(MultiXactOffsetCtl, SlruScanDirCbFindEarliest, &trunc);
     earliest = trunc.earliestExistingPage * MULTIXACT_OFFSETS_PER_PAGE;
+    if (earliest < FirstMultiXactId)
+        earliest = FirstMultiXactId;

     /* nothing to do */
     if (MultiXactIdPrecedes(oldestMXact, earliest))
         return;

     /*
      * First, compute the safe truncation point for MultiXactMember. This is
-     * the starting offset of the multixact we were passed as MultiXactOffset
-     * cutoff.
+     * the starting offset of the oldest multixact.
      */
     {
         int pageno;
@@ -2538,10 +2572,6 @@ TruncateMultiXact(MultiXactId oldestMXact)
         LWLockRelease(MultiXactOffsetControlLock);
     }

-    /* truncate MultiXactOffset */
-    SimpleLruTruncate(MultiXactOffsetCtl,
-                      MultiXactIdToOffsetPage(oldestMXact));
-
     /*
      * To truncate MultiXactMembers, we need to figure out the active page
      * range and delete all files outside that range. The start point is the
@@ -2559,6 +2589,11 @@ TruncateMultiXact(MultiXactId oldestMXact)
     range.rangeEnd = MXOffsetToMemberPage(nextOffset);

     SlruScanDirectory(MultiXactMemberCtl, SlruScanDirCbRemoveMembers, &range);
+
+    /* Now we can truncate MultiXactOffset */
+    SimpleLruTruncate(MultiXactOffsetCtl,
+                      MultiXactIdToOffsetPage(oldestMXact));
+
 }

 /*
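
The "nothing to do" test in TruncateMultiXact() depends on two details visible in the hunks above: multixact IDs are compared modulo 2^32, and the earliest ID derived from the first existing offsets page is now clamped to FirstMultiXactId. The sketch below restates both in standalone C. It is an illustration under stated assumptions, not the server code: the `toy_*` names are hypothetical, the comparison uses the usual signed-difference idiom, `TOY_OFFSETS_PER_PAGE` assumes an 8 kB page holding 4-byte offsets, and `TOY_FIRST_MULTIXACT_ID` assumes that 0 is reserved as the invalid ID.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

/* Assumptions for this sketch only; see the lead-in. */
#define TOY_FIRST_MULTIXACT_ID ((MultiXactId) 1)
#define TOY_OFFSETS_PER_PAGE   2048u

/*
 * Wraparound-aware ordering: a precedes b when the signed 32-bit
 * difference is negative, so IDs just below 2^32 precede small IDs.
 */
bool
toy_multixact_precedes(MultiXactId a, MultiXactId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * First multixact ID stored on the given offsets page, clamped so that
 * page 0 never yields the reserved value 0.
 */
MultiXactId
toy_earliest_from_page(uint32_t page)
{
    MultiXactId earliest = page * TOY_OFFSETS_PER_PAGE;

    if (earliest < TOY_FIRST_MULTIXACT_ID)
        earliest = TOY_FIRST_MULTIXACT_ID;
    return earliest;
}

/* True when the truncation point is older than anything still on disk. */
bool
toy_nothing_to_truncate(MultiXactId oldest, uint32_t earliest_page)
{
    return toy_multixact_precedes(oldest,
                                  toy_earliest_from_page(earliest_page));
}
```

With these definitions, a truncation point of 7000 against an earliest existing page of 3 (first ID 6144) proceeds to truncation, while a truncation point of 1 against the same page returns early.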

‎src/backend/access/transam/xlog.c

Lines changed: 29 additions & 15 deletions
@@ -6264,6 +6264,7 @@ StartupXLOG(void)
     MultiXactSetNextMXact(checkPoint.nextMulti, checkPoint.nextMultiOffset);
     SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
     SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB);
+    MultiXactSetSafeTruncate(checkPoint.oldestMulti);
     XLogCtl->ckptXidEpoch = checkPoint.nextXidEpoch;
     XLogCtl->ckptXid = checkPoint.nextXid;

@@ -8272,6 +8273,12 @@ CreateCheckPoint(int flags)
      */
     END_CRIT_SECTION();

+    /*
+     * Now that the checkpoint is safely on disk, we can update the point to
+     * which multixact can be truncated.
+     */
+    MultiXactSetSafeTruncate(checkPoint.oldestMulti);
+
     /*
      * Let smgr do post-checkpoint cleanup (eg, deleting old files).
      */
@@ -8305,6 +8312,11 @@ CreateCheckPoint(int flags)
     if (!RecoveryInProgress())
         TruncateSUBTRANS(GetOldestXmin(NULL, false));

+    /*
+     * Truncate pg_multixact too.
+     */
+    TruncateMultiXact();
+
     /* Real work is done, but log and update stats before releasing lock. */
     LogCheckpointEnd(false);

@@ -8578,21 +8590,6 @@ CreateRestartPoint(int flags)
     }
     LWLockRelease(ControlFileLock);

-    /*
-     * Due to an historical accident multixact truncations are not WAL-logged,
-     * but just performed everytime the mxact horizon is increased. So, unless
-     * we explicitly execute truncations on a standby it will never clean out
-     * /pg_multixact which obviously is bad, both because it uses space and
-     * because we can wrap around into pre-existing data...
-     *
-     * We can only do the truncation here, after the UpdateControlFile()
-     * above, because we've now safely established a restart point, that
-     * guarantees we will not need need to access those multis.
-     *
-     * It's probably worth improving this.
-     */
-    TruncateMultiXact(lastCheckPoint.oldestMulti);
-
     /*
      * Delete old log files (those no longer needed even for previous
      * checkpoint/restartpoint) to prevent the disk holding the xlog from
@@ -8651,6 +8648,21 @@ CreateRestartPoint(int flags)
         ThisTimeLineID = 0;
     }

+    /*
+     * Due to an historical accident multixact truncations are not WAL-logged,
+     * but just performed everytime the mxact horizon is increased. So, unless
+     * we explicitly execute truncations on a standby it will never clean out
+     * /pg_multixact which obviously is bad, both because it uses space and
+     * because we can wrap around into pre-existing data...
+     *
+     * We can only do the truncation here, after the UpdateControlFile()
+     * above, because we've now safely established a restart point. That
+     * guarantees we will not need to access those multis.
+     *
+     * It's probably worth improving this.
+     */
+    TruncateMultiXact();
+
     /*
      * Truncate pg_subtrans if possible. We can throw away all data before
      * the oldest XMIN of any running transaction. No future transaction will
@@ -9117,6 +9129,7 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
                               checkPoint.nextMultiOffset);
         SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
         SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB);
+        MultiXactSetSafeTruncate(checkPoint.oldestMulti);

         /*
          * If we see a shutdown checkpoint while waiting for an end-of-backup
@@ -9217,6 +9230,7 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
                                   checkPoint.oldestXidDB);
         MultiXactAdvanceOldest(checkPoint.oldestMulti,
                                checkPoint.oldestMultiDB);
+        MultiXactSetSafeTruncate(checkPoint.oldestMulti);

         /* ControlFile->checkPointCopy always tracks the latest ckpt XID */
         ControlFile->checkPointCopy.nextXidEpoch = checkPoint.nextXidEpoch;

‎src/backend/commands/vacuum.c

Lines changed: 5 additions & 3 deletions
@@ -969,9 +969,11 @@ vac_truncate_clog(TransactionId frozenXID, MultiXactId minMulti)
         return;
     }

-    /* Truncate CLOG and Multi to the oldest computed value */
+    /*
+     * Truncate CLOG to the oldest computed value. Note we don't truncate
+     * multixacts; that will be done by the next checkpoint.
+     */
     TruncateCLOG(frozenXID);
-    TruncateMultiXact(minMulti);

     /*
      * Update the wrap limit for GetNewTransactionId and creation of new
977979
* Update the wrap limit for GetNewTransactionId and creation of new
@@ -980,7 +982,7 @@ vac_truncate_clog(TransactionId frozenXID, MultiXactId minMulti)
      * signalling twice?
      */
     SetTransactionIdLimit(frozenXID, oldestxid_datoid);
-    MultiXactAdvanceOldest(minMulti, minmulti_datoid);
+    SetMultiXactIdLimit(minMulti, minmulti_datoid);
 }


‎src/include/access/multixact.h

Lines changed: 2 additions & 1 deletion
@@ -119,12 +119,13 @@ extern void MultiXactGetCheckptMulti(bool is_shutdown,
                          Oid *oldestMultiDB);
 extern void CheckPointMultiXact(void);
 extern MultiXactId GetOldestMultiXactId(void);
-extern void TruncateMultiXact(MultiXactId cutoff_multi);
+extern void TruncateMultiXact(void);
 extern void MultiXactSetNextMXact(MultiXactId nextMulti,
                       MultiXactOffset nextMultiOffset);
 extern void MultiXactAdvanceNextMXact(MultiXactId minMulti,
                           MultiXactOffset minMultiOffset);
 extern void MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB);
+extern void MultiXactSetSafeTruncate(MultiXactId safeTruncateMulti);

 extern void multixact_twophase_recover(TransactionId xid, uint16 info,
                            void *recdata, uint32 len);
