Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit873aff9

Browse files
committed
Fix race with synchronous_standby_names at startup
synchronous_standby_names cannot be reloaded safely by backends, and thecheckpointer is in charge of updating a state in shared memory if theGUC is enabled in WalSndCtl, to let the backends know if they shouldwait or not for a given LSN. This provides a strict control on thetiming of the waiting queues if the GUC is enabled or disabled, thenreloaded. The checkpointer is also in charge of waking up the backendsthat could be waiting for a LSN when the GUC is disabled.This logic had a race condition at startup, where it would be possiblefor backends to not wait for a LSN even if synchronous_standby_names isenabled. This would cause visibility issues with transactions that weshould be waiting for but they were not. The problem lasts until thecheckpointer does its initial update of the shared memory state when itloads synchronous_standby_names.In order to take care of this problem, the shared memory state inWalSndCtl is extended to detect if it has been initialized by thecheckpointer, and not only check if synchronous_standby_names isdefined. In WalSndCtlData, sync_standbys_defined is renamed tosync_standbys_status, a bits8 able to know about two states:- If the shared memory state has been initialized. This flag is set bythe checkpointer at startup once, and never removed.- If synchronous_standby_names is known as defined in the shared memorystate. This is the same as the previous sync_standbys_defined inWalSndCtl.This method gives a way for backends to decide what they should do untilthe shared memory area is initialized, and they now ultimately fall backto a check on the GUC value in this case, which is the best thing thatcan be done.Fortunately, SyncRepUpdateSyncStandbysDefined() is called immediately bythe checkpointer when this process starts, so the window is very narrow.It is possible to enlarge the problematic window by making thecheckpointer wait at the beginning of SyncRepUpdateSyncStandbysDefined()with a hardcoded sleep for example, and doing so has showed that a 2PCvisibility test is indeed failing. On machines slow enough, this bugwould cause spurious failures.In 17~, we have looked at the possibility of adding an injection pointto have a reproducible test, but as the problematic window happens atearly startup, we would need to invent a way to make an injection pointoptionally persistent across restarts when attached, something thatwould be fine for this case as it would involve the checkpointer. Thisissue is quite old, and can be reproduced on all the stable branches.Author: Melnikov Maksim <m.melnikov@postgrespro.ru>Co-authored-by: Michael Paquier <michael@paquier.xyz>Discussion:https://postgr.es/m/163fcbec-900b-4b07-beaa-d2ead8634bec@postgrespro.ruBackpatch-through: 13
1 parent95e8385 commit873aff9

File tree

2 files changed

+101
-19
lines changed

2 files changed

+101
-19
lines changed

‎src/backend/replication/syncrep.c

Lines changed: 82 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -163,16 +163,23 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
163163
* sync replication standby names defined.
164164
*
165165
* Since this routine gets called every commit time, it's important to
166-
* exit quickly if sync replication is not requested. So we check
167-
* WalSndCtl->sync_standbys_defined flag without the lock and exit
168-
* immediately if it's false. If it's true, we need to check it again
169-
* later while holding the lock, to check the flag and operate the sync
170-
* rep queue atomically. This is necessary to avoid the race condition
171-
* described in SyncRepUpdateSyncStandbysDefined(). On the other hand, if
172-
* it's false, the lock is not necessary because we don't touch the queue.
166+
* exit quickly if sync replication is not requested.
167+
*
168+
* We check WalSndCtl->sync_standbys_status flag without the lock and exit
169+
* immediately if SYNC_STANDBY_INIT is set (the checkpointer has
170+
* initialized this data) but SYNC_STANDBY_DEFINED is missing (no sync
171+
* replication requested).
172+
*
173+
* If SYNC_STANDBY_DEFINED is set, we need to check the status again later
174+
* while holding the lock, to check the flag and operate the sync rep
175+
* queue atomically. This is necessary to avoid the race condition
176+
* described in SyncRepUpdateSyncStandbysDefined(). On the other hand, if
177+
* SYNC_STANDBY_DEFINED is not set, the lock is not necessary because we
178+
* don't touch the queue.
173179
*/
174180
if (!SyncRepRequested()||
175-
!((volatileWalSndCtlData*)WalSndCtl)->sync_standbys_defined)
181+
((((volatileWalSndCtlData*)WalSndCtl)->sync_standbys_status)&
182+
(SYNC_STANDBY_INIT |SYNC_STANDBY_DEFINED))==SYNC_STANDBY_INIT)
176183
return;
177184

178185
/* Cap the level for anything other than commit to remote flush only. */
@@ -188,16 +195,52 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
188195
Assert(MyProc->syncRepState==SYNC_REP_NOT_WAITING);
189196

190197
/*
191-
* We don't wait for sync rep ifWalSndCtl->sync_standbys_defined is not
192-
*set. SeeSyncRepUpdateSyncStandbysDefined.
198+
* We don't wait for sync rep ifSYNC_STANDBY_DEFINED is not set. See
199+
* SyncRepUpdateSyncStandbysDefined().
193200
*
194201
* Also check that the standby hasn't already replied. Unlikely race
195202
* condition but we'll be fetching that cache line anyway so it's likely
196203
* to be a low cost check.
204+
*
205+
* If the sync standby data has not been initialized yet
206+
* (SYNC_STANDBY_INIT is not set), fall back to a check based on the LSN,
207+
* then do a direct GUC check.
197208
*/
198-
if (!WalSndCtl->sync_standbys_defined||
199-
lsn <=WalSndCtl->lsn[mode])
209+
if (WalSndCtl->sync_standbys_status&SYNC_STANDBY_INIT)
210+
{
211+
if ((WalSndCtl->sync_standbys_status&SYNC_STANDBY_DEFINED)==0||
212+
lsn <=WalSndCtl->lsn[mode])
213+
{
214+
LWLockRelease(SyncRepLock);
215+
return;
216+
}
217+
}
218+
elseif (lsn <=WalSndCtl->lsn[mode])
219+
{
220+
/*
221+
* The LSN is older than what we need to wait for. The sync standby
222+
* data has not been initialized yet, but we are OK to not wait
223+
* because we know that there is no point in doing so based on the
224+
* LSN.
225+
*/
226+
LWLockRelease(SyncRepLock);
227+
return;
228+
}
229+
elseif (!SyncStandbysDefined())
200230
{
231+
/*
232+
* If we are here, the sync standby data has not been initialized yet,
233+
* and the LSN is newer than what need to wait for, so we have fallen
234+
* back to the best thing we could do in this case: a check on
235+
* SyncStandbysDefined() to see if the GUC is set or not.
236+
*
237+
* When the GUC has a value, we wait until the checkpointer updates
238+
* the status data because we cannot be sure yet if we should wait or
239+
* not. Here, the GUC has *no* value, we are sure that there is no
240+
* point to wait; this matters for example when initializing a
241+
* cluster, where we should never wait, and no sync standbys is the
242+
* default behavior.
243+
*/
201244
LWLockRelease(SyncRepLock);
202245
return;
203246
}
@@ -938,7 +981,7 @@ SyncRepWakeQueue(bool all, int mode)
938981

939982
/*
940983
* The checkpointer calls this as needed to update the shared
941-
*sync_standbys_defined flag, so that backends don't remain permanently wedged
984+
*sync_standbys_status flag, so that backends don't remain permanently wedged
942985
* if synchronous_standby_names is unset. It's safe to check the current value
943986
* without the lock, because it's only ever updated by one process. But we
944987
* must take the lock to change it.
@@ -948,7 +991,8 @@ SyncRepUpdateSyncStandbysDefined(void)
948991
{
949992
boolsync_standbys_defined=SyncStandbysDefined();
950993

951-
if (sync_standbys_defined!=WalSndCtl->sync_standbys_defined)
994+
if (sync_standbys_defined!=
995+
((WalSndCtl->sync_standbys_status&SYNC_STANDBY_DEFINED)!=0))
952996
{
953997
LWLockAcquire(SyncRepLock,LW_EXCLUSIVE);
954998

@@ -972,7 +1016,30 @@ SyncRepUpdateSyncStandbysDefined(void)
9721016
* backend that hasn't yet reloaded its config might go to sleep on
9731017
* the queue (and never wake up). This prevents that.
9741018
*/
975-
WalSndCtl->sync_standbys_defined=sync_standbys_defined;
1019+
WalSndCtl->sync_standbys_status=SYNC_STANDBY_INIT |
1020+
(sync_standbys_defined ?SYNC_STANDBY_DEFINED :0);
1021+
1022+
LWLockRelease(SyncRepLock);
1023+
}
1024+
elseif ((WalSndCtl->sync_standbys_status&SYNC_STANDBY_INIT)==0)
1025+
{
1026+
LWLockAcquire(SyncRepLock,LW_EXCLUSIVE);
1027+
1028+
/*
1029+
* Note that there is no need to wake up the queues here. We would
1030+
* reach this path only if SyncStandbysDefined() returns false, or it
1031+
* would mean that some backends are waiting with the GUC set. See
1032+
* SyncRepWaitForLSN().
1033+
*/
1034+
Assert(!SyncStandbysDefined());
1035+
1036+
/*
1037+
* Even if there is no sync standby defined, let the readers of this
1038+
* information know that the sync standby data has been initialized.
1039+
* This can just be done once, hence the previous check on
1040+
* SYNC_STANDBY_INIT to avoid useless work.
1041+
*/
1042+
WalSndCtl->sync_standbys_status |=SYNC_STANDBY_INIT;
9761043

9771044
LWLockRelease(SyncRepLock);
9781045
}

‎src/include/replication/walsender_private.h

Lines changed: 19 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -98,15 +98,30 @@ typedef struct
9898
XLogRecPtrlsn[NUM_SYNC_REP_WAIT_MODE];
9999

100100
/*
101-
*Are any sync standbys defined? Waiting backends can't reload the
102-
* config file safely, so checkpointer updates this value as needed.
103-
* Protected by SyncRepLock.
101+
*Status of data related to the synchronous standbys. Waiting backends
102+
*can't reload theconfig file safely, so checkpointer updates this value
103+
*as needed.Protected by SyncRepLock.
104104
*/
105-
boolsync_standbys_defined;
105+
bits8sync_standbys_status;
106106

107107
WalSndwalsnds[FLEXIBLE_ARRAY_MEMBER];
108108
}WalSndCtlData;
109109

110+
/* Flags for WalSndCtlData->sync_standbys_status */
111+
112+
/*
113+
* Is the synchronous standby data initialized from the GUC? This is set the
114+
* first time synchronous_standby_names is processed by the checkpointer.
115+
*/
116+
#defineSYNC_STANDBY_INIT(1 << 0)
117+
118+
/*
119+
* Is the synchronous standby data defined? This is set when
120+
* synchronous_standby_names has some data, after being processed by the
121+
* checkpointer.
122+
*/
123+
#defineSYNC_STANDBY_DEFINED(1 << 1)
124+
110125
externWalSndCtlData*WalSndCtl;
111126

112127

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp