Commit bd7c348
Rework the way multixact truncations work.
The fact that multixact truncations are not WAL logged has caused a fair
share of problems. Amongst others it requires to do computations during
recovery while the database is not in a consistent state, delaying
truncations till checkpoints, and handling members being truncated, but
offset not.

We tried to put bandaids on lots of these issues over the last years,
but it seems time to change course. Thus this patch introduces WAL
logging for multixact truncations.

This allows:
1) to perform the truncation directly during VACUUM, instead of delaying it
   to the checkpoint.
2) to avoid looking at the offsets SLRU for truncation during recovery,
   we can just use the master's values.
3) simplify a fair amount of logic to keep in memory limits straight,
   this has gotten much easier

During the course of fixing this a bunch of additional bugs had to be
fixed:
1) Data was not purged from memory the member's SLRU before deleting
   segments. This happened to be hard or impossible to hit due to the
   interlock between checkpoints and truncation.
2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but
   that doesn't work for offsets that haven't yet been flushed to
   disk. Add code to flush the SLRUs to fix. Not pretty, but it feels
   slightly safer to only make decisions based on actual on-disk state.
3) find_multixact_start() could be called concurrently with a truncation
   and thus fail. Via SetOffsetVacuumLimit() that could lead to a round
   of emergency vacuuming. The problem remains in
   pg_get_multixact_members(), but that's quite harmless.

For now this is going to only get applied to 9.5+, leaving the issues in
the older branches in place. It is quite possible that we need to
backpatch at a later point though.

For the case this gets backpatched we need to handle that an updated
standby may be replaying WAL from a not-yet upgraded primary. We have to
recognize that situation and use "old style" truncation (i.e. looking at
the SLRUs) during WAL replay. In contrast to before, this now happens in
the startup process, when replaying a checkpoint record, instead of the
checkpointer. Doing truncation in the restartpoint is incorrect, they
can happen much later than the original checkpoint, thereby leading to
wraparound. To avoid "multixact_redo: unknown op code 48" errors
standbys would have to be upgraded before primaries.

A later patch will bump the WAL page magic, and remove the legacy
truncation codepaths. Legacy truncation support is just included to make
a possible future backpatch easier.

Discussion: 20150621192409.GA4797@alap3.anarazel.de
Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro
Backpatch: 9.5 for now

1 parent c9645f7, commit bd7c348
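
As a rough illustration of point 2) under "This allows", the toy model below shows why a logged truncation record lets a standby truncate without looking at its own SLRU state: the primary decides the segment range once, and replay applies exactly that range. This is only a sketch under stated assumptions. Every name in it (toy_slru, toy_truncate_record, toy_primary_truncate, toy_standby_replay) is hypothetical and is not taken from the patch, and the "SLRU" is reduced to a small ring of segment-present flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model; none of these names come from PostgreSQL. */
#define TOY_SEGMENTS 16          /* segment numbers wrap around at this value */

typedef struct {
    bool present[TOY_SEGMENTS];  /* true if the segment "file" exists */
} toy_slru;

/* Stand-in for a logged truncation: the range chosen by the primary. */
typedef struct {
    uint32_t start_seg;          /* first segment to remove (inclusive) */
    uint32_t end_seg;            /* first segment to keep (exclusive) */
} toy_truncate_record;

/* Mark segments [first, last_excl) as present, wrapping around the ring. */
void toy_slru_init(toy_slru *s, uint32_t first, uint32_t last_excl)
{
    memset(s->present, 0, sizeof s->present);
    for (uint32_t seg = first % TOY_SEGMENTS;
         seg != last_excl % TOY_SEGMENTS;
         seg = (seg + 1) % TOY_SEGMENTS)
        s->present[seg] = true;
}

/* Remove segments [start, end), wrapping; start == end removes nothing. */
void toy_remove_range(toy_slru *s, uint32_t start, uint32_t end)
{
    for (uint32_t seg = start % TOY_SEGMENTS;
         seg != end % TOY_SEGMENTS;
         seg = (seg + 1) % TOY_SEGMENTS)
        s->present[seg] = false;
}

/*
 * Primary side: pick the range, build the record, then truncate.
 * A real system would write and flush the record to its log before
 * removing any file; this toy simply returns the record to the caller.
 */
toy_truncate_record toy_primary_truncate(toy_slru *s,
                                         uint32_t oldest_seg,
                                         uint32_t new_oldest_seg)
{
    toy_truncate_record rec = { oldest_seg, new_oldest_seg };
    toy_remove_range(s, rec.start_seg, rec.end_seg);
    return rec;
}

/* Standby side: apply the primary's range without inspecting local state. */
void toy_standby_replay(toy_slru *s, const toy_truncate_record *rec)
{
    assert(rec != NULL);
    toy_remove_range(s, rec->start_seg, rec->end_seg);
}

/* Count how many segments currently exist. */
int toy_count_present(const toy_slru *s)
{
    int n = 0;
    for (int i = 0; i < TOY_SEGMENTS; i++)
        n += s->present[i] ? 1 : 0;
    return n;
}
```

A caller would initialize two toy_slru values identically, run toy_primary_truncate() on one, and pass the returned record to toy_standby_replay() on the other; both then hold the same set of segments. The standby never has to derive the range from its own, possibly not yet consistent, files.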
File tree
9 files changed, +539 -345 lines changed
- src
  - backend
    - access
      - rmgrdesc
      - transam
    - commands
  - include
    - access
    - storage
  - tools/pgindent
First changed file: 11 additions & 0 deletions (new lines 73-80 and 99-101)