Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit59cea09

Browse files
committed
Fix race leading to incorrect conflict cause in InvalidatePossiblyObsoleteSlot()
The invalidation of an active slot is done in two steps:- Termination of the backend holding it, if any.- Report that the slot is obsolete, with a conflict cause depending onthe slot's data.This can be racy because between these two steps the slot mutex would bereleased while doing system calls, which means that the effective_xminand effective_catalog_xmin could advance during that time, detecting aconflict cause different than the one originally wanted before theprocess owning a slot is terminated.Holding the mutex longer is not an option, so this commit changes thecode to record the LSNs stored in the slot during the termination of theprocess owning the slot.Bonus thanks to Alexander Lakhin for the various tests and the analysis.Author: Bertrand DrouvotReviewed-by: Michael Paquier, Bharath RupireddyDiscussion:https://postgr.es/m/ZaTjW2Xh+TQUCOH0@ip-10-97-1-34.eu-west-3.compute.internalBackpatch-through: 16
1 parentec4fcf4 commit59cea09

File tree

1 file changed

+33
-6
lines changed

1 file changed

+33
-6
lines changed

‎src/backend/replication/slot.c

Lines changed: 33 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1321,6 +1321,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
13211321
{
13221322
intlast_signaled_pid=0;
13231323
boolreleased_lock= false;
1324+
boolterminated= false;
1325+
XLogRecPtrinitial_effective_xmin=InvalidXLogRecPtr;
1326+
XLogRecPtrinitial_catalog_effective_xmin=InvalidXLogRecPtr;
1327+
XLogRecPtrinitial_restart_lsn=InvalidXLogRecPtr;
1328+
ReplicationSlotInvalidationCauseconflict_prevPG_USED_FOR_ASSERTS_ONLY=RS_INVAL_NONE;
13241329

13251330
for (;;)
13261331
{
@@ -1355,11 +1360,24 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
13551360
*/
13561361
if (s->data.invalidated==RS_INVAL_NONE)
13571362
{
1363+
/*
1364+
* The slot's mutex will be released soon, and it is possible that
1365+
* those values change since the process holding the slot has been
1366+
* terminated (if any), so record them here to ensure that we
1367+
* would report the correct conflict cause.
1368+
*/
1369+
if (!terminated)
1370+
{
1371+
initial_restart_lsn=s->data.restart_lsn;
1372+
initial_effective_xmin=s->effective_xmin;
1373+
initial_catalog_effective_xmin=s->effective_catalog_xmin;
1374+
}
1375+
13581376
switch (cause)
13591377
{
13601378
caseRS_INVAL_WAL_REMOVED:
1361-
if (s->data.restart_lsn!=InvalidXLogRecPtr&&
1362-
s->data.restart_lsn<oldestLSN)
1379+
if (initial_restart_lsn!=InvalidXLogRecPtr&&
1380+
initial_restart_lsn<oldestLSN)
13631381
conflict=cause;
13641382
break;
13651383
caseRS_INVAL_HORIZON:
@@ -1368,12 +1386,12 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
13681386
/* invalid DB oid signals a shared relation */
13691387
if (dboid!=InvalidOid&&dboid!=s->data.database)
13701388
break;
1371-
if (TransactionIdIsValid(s->effective_xmin)&&
1372-
TransactionIdPrecedesOrEquals(s->effective_xmin,
1389+
if (TransactionIdIsValid(initial_effective_xmin)&&
1390+
TransactionIdPrecedesOrEquals(initial_effective_xmin,
13731391
snapshotConflictHorizon))
13741392
conflict=cause;
1375-
elseif (TransactionIdIsValid(s->effective_catalog_xmin)&&
1376-
TransactionIdPrecedesOrEquals(s->effective_catalog_xmin,
1393+
elseif (TransactionIdIsValid(initial_catalog_effective_xmin)&&
1394+
TransactionIdPrecedesOrEquals(initial_catalog_effective_xmin,
13771395
snapshotConflictHorizon))
13781396
conflict=cause;
13791397
break;
@@ -1386,6 +1404,13 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
13861404
}
13871405
}
13881406

1407+
/*
1408+
* The conflict cause recorded previously should not change while the
1409+
* process owning the slot (if any) has been terminated.
1410+
*/
1411+
Assert(!(conflict_prev!=RS_INVAL_NONE&&terminated&&
1412+
conflict_prev!=conflict));
1413+
13891414
/* if there's no conflict, we're done */
13901415
if (conflict==RS_INVAL_NONE)
13911416
{
@@ -1460,6 +1485,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
14601485
(void)kill(active_pid,SIGTERM);
14611486

14621487
last_signaled_pid=active_pid;
1488+
terminated= true;
1489+
conflict_prev=conflict;
14631490
}
14641491

14651492
/* Wait until the slot is released. */

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp