Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit8a90620

Browse files
committed
Fix race condition during replication origin drop.
replorigin_drop() misunderstood the API for condition variables: ithad ConditionVariablePrepareToSleep and ConditionVariableCancelSleepinside its test-and-sleep loop, rather than outside the loop asintended. The net effect is a narrow race-condition window wherein,if the process using a replication slot releases it immediately afterreplorigin_drop() releases the ReplicationOriginLock, replorigin_drop()would get into the condition variable's wait list too late and thenwait indefinitely for a signal that won't come.Because there's a different CV for each replication slot, we can'tjust move the ConditionVariablePrepareToSleep call to above thetest-and-sleep loop. What we can do, in the wake of commit13db3b9,is drop the ConditionVariablePrepareToSleep call entirely. This fixdepends on that commit because (at least in principle) the slot matchingthe target replication origin might move around, so that once in a bluemoon successive loop iterations might involve different CVs. We can nowcope with such a scenario, at the cost of an extra trip through theretry loop.(There are ways we could fix this bug without depending on that commit,but they're all a lot more complicated than this way.)While at it, upgrade the rather skimpy comments in this function.Back-patch to v10 where this code came in.Discussion:https://postgr.es/m/19947.1515455433@sss.pgh.pa.us
1 parent13db3b9 commit8a90620

File tree

1 file changed

+23
-6
lines changed

1 file changed

+23
-6
lines changed

‎src/backend/replication/logical/origin.c

Lines changed: 23 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -339,20 +339,26 @@ replorigin_drop(RepOriginId roident, bool nowait)
339339

340340
Assert(IsTransactionState());
341341

342+
/*
343+
* To interlock against concurrent drops, we hold ExclusiveLock on
344+
* pg_replication_origin throughout this funcion.
345+
*/
342346
rel=heap_open(ReplicationOriginRelationId,ExclusiveLock);
343347

348+
/*
349+
* First, clean up the slot state info, if there is any matching slot.
350+
*/
344351
restart:
345352
tuple=NULL;
346-
/* cleanup the slot state info */
347353
LWLockAcquire(ReplicationOriginLock,LW_EXCLUSIVE);
348354

349355
for (i=0;i<max_replication_slots;i++)
350356
{
351357
ReplicationState*state=&replication_states[i];
352358

353-
/* found our slot */
354359
if (state->roident==roident)
355360
{
361+
/* found our slot, is it busy? */
356362
if (state->acquired_by!=0)
357363
{
358364
ConditionVariable*cv;
@@ -363,16 +369,23 @@ replorigin_drop(RepOriginId roident, bool nowait)
363369
errmsg("could not drop replication origin with OID %d, in use by PID %d",
364370
state->roident,
365371
state->acquired_by)));
372+
373+
/*
374+
* We must wait and then retry. Since we don't know which CV
375+
* to wait on until here, we can't readily use
376+
* ConditionVariablePrepareToSleep (calling it here would be
377+
* wrong, since we could miss the signal if we did so); just
378+
* use ConditionVariableSleep directly.
379+
*/
366380
cv=&state->origin_cv;
367381

368382
LWLockRelease(ReplicationOriginLock);
369-
ConditionVariablePrepareToSleep(cv);
383+
370384
ConditionVariableSleep(cv,WAIT_EVENT_REPLICATION_ORIGIN_DROP);
371-
ConditionVariableCancelSleep();
372385
gotorestart;
373386
}
374387

375-
/* first WAL log */
388+
/* firstmake aWAL log entry */
376389
{
377390
xl_replorigin_dropxlrec;
378391

@@ -382,15 +395,19 @@ replorigin_drop(RepOriginId roident, bool nowait)
382395
XLogInsert(RM_REPLORIGIN_ID,XLOG_REPLORIGIN_DROP);
383396
}
384397

385-
/* thenreset the in-memoryentry */
398+
/* thenclear the in-memoryslot */
386399
state->roident=InvalidRepOriginId;
387400
state->remote_lsn=InvalidXLogRecPtr;
388401
state->local_lsn=InvalidXLogRecPtr;
389402
break;
390403
}
391404
}
392405
LWLockRelease(ReplicationOriginLock);
406+
ConditionVariableCancelSleep();
393407

408+
/*
409+
* Now, we can delete the catalog entry.
410+
*/
394411
tuple=SearchSysCache1(REPLORIGIDENT,ObjectIdGetDatum(roident));
395412
if (!HeapTupleIsValid(tuple))
396413
elog(ERROR,"cache lookup failed for replication origin with oid %u",

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp