Commit bc971f4

Optimize walsender wake up logic using condition variables

WalSndWakeup() currently loops through all the walsender slots, with a spinlock acquisition and release for every iteration, to wake up waiting walsenders.

This commonly was not a problem before e101dfa. But, to allow logical decoding on standbys, we need to wake up logical walsenders after every WAL record is applied on the standby, rather than just when flushing WAL or switching timelines. This causes a performance regression for workloads replaying a lot of WAL records.

To solve this, we use a condition variable (CV) to efficiently wake up walsenders in WalSndWakeup().

Every walsender prepares to sleep on a shared memory CV. Note that it just prepares to sleep on the CV (i.e., adds itself to the CV's waitlist), but does not actually wait on the CV (IOW, it never calls ConditionVariableSleep()). It still uses WaitEventSetWait() for waiting, because CV infrastructure doesn't handle FeBe socket events currently. The processes (startup process, walreceiver etc.) wanting to wake up walsenders use ConditionVariableBroadcast(), which in turn calls SetLatch(), helping walsenders come out of WaitEventSetWait().

We use separate shared memory CVs for physical and logical walsenders for selective wake ups, see WalSndWakeup() for more details.

This approach is simple and reasonably efficient, but not very elegant. For 16 it seems to be a better path than a larger redesign of the CV mechanism. A desirable future improvement would be to add support for CVs into WaitEventSetWait().

This still leaves us with a small regression in very extreme workloads (due to the spinlock acquisition in ConditionVariableBroadcast() when there are no waiters), but that seems acceptable.

Reported-by: Andres Freund <andres@anarazel.de>
Suggested-by: Andres Freund <andres@anarazel.de>
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Discussion: https://www.postgresql.org/message-id/20230509190247.3rrplhdgem6su6cg%40awork3.anarazel.de
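
The mechanism described in the commit message can be sketched as a small standalone C model. This is only an illustration under simplifying assumptions: it is single-process, has no locking, and every name in it (ToyLatch, ToyCV, the toy_* functions, MAX_PROCS) is hypothetical and not part of PostgreSQL. It shows a walsender joining a waitlist without sleeping on it, a broadcast setting the latch of every listed waiter, and separate lists giving selective wake ups.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 8				/* hypothetical cap on simulated walsenders */

/* Toy stand-in for a process latch: just a flag. */
typedef struct
{
	bool		is_set;
} ToyLatch;

/* Toy stand-in for a condition variable: a fixed-size waitlist of proc ids. */
typedef struct
{
	int			waiters[MAX_PROCS];
	int			nwaiters;
} ToyCV;

ToyLatch	toy_latches[MAX_PROCS];

/* Rough analogue of ConditionVariablePrepareToSleep(): join the waitlist only. */
void
toy_prepare_to_sleep(ToyCV *cv, int procno)
{
	assert(cv->nwaiters < MAX_PROCS);
	cv->waiters[cv->nwaiters++] = procno;
}

/* Rough analogue of ConditionVariableCancelSleep(): leave the list if still on it. */
void
toy_cancel_sleep(ToyCV *cv, int procno)
{
	for (int i = 0; i < cv->nwaiters; i++)
	{
		if (cv->waiters[i] == procno)
		{
			/* swap-remove; waitlist order does not matter in this model */
			cv->waiters[i] = cv->waiters[--cv->nwaiters];
			return;
		}
	}
}

/* Rough analogue of ConditionVariableBroadcast(): set every waiter's latch. */
int
toy_broadcast(ToyCV *cv)
{
	int			woken = cv->nwaiters;

	for (int i = 0; i < cv->nwaiters; i++)
		toy_latches[cv->waiters[i]].is_set = true;
	cv->nwaiters = 0;
	return woken;
}

/* Mirrors the shape of the new WalSndWakeup(): one broadcast per requested kind. */
int
toy_walsnd_wakeup(ToyCV *flush_cv, ToyCV *replay_cv, bool physical, bool logical)
{
	int			woken = 0;

	if (physical)
		woken += toy_broadcast(flush_cv);
	if (logical)
		woken += toy_broadcast(replay_cv);
	return woken;
}
```

In this model, a wake up requested only for logical walsenders touches the replay list and leaves physical walsenders' latches alone. A physical walsender that returns from its wait for some other reason still has to call toy_cancel_sleep() to leave the flush list.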
1 parent 30579d2 commit bc971f4

2 files changed: +53 -24 lines changed

src/backend/replication/walsender.c

Lines changed: 48 additions & 24 deletions
@@ -3309,6 +3309,9 @@ WalSndShmemInit(void)

 			SpinLockInit(&walsnd->mutex);
 		}
+
+		ConditionVariableInit(&WalSndCtl->wal_flush_cv);
+		ConditionVariableInit(&WalSndCtl->wal_replay_cv);
 	}
 }

@@ -3330,31 +3333,17 @@ WalSndShmemInit(void)
 void
 WalSndWakeup(bool physical, bool logical)
 {
-	int			i;
-
-	for (i = 0; i < max_wal_senders; i++)
-	{
-		Latch	   *latch;
-		ReplicationKind kind;
-		WalSnd	   *walsnd = &WalSndCtl->walsnds[i];
-
-		/*
-		 * Get latch pointer with spinlock held, for the unlikely case that
-		 * pointer reads aren't atomic (as they're 8 bytes). While at it, also
-		 * get kind.
-		 */
-		SpinLockAcquire(&walsnd->mutex);
-		latch = walsnd->latch;
-		kind = walsnd->kind;
-		SpinLockRelease(&walsnd->mutex);
-
-		if (latch == NULL)
-			continue;
+	/*
+	 * Wake up all the walsenders waiting on WAL being flushed or replayed
+	 * respectively. Note that waiting walsender would have prepared to sleep
+	 * on the CV (i.e., added itself to the CV's waitlist) in WalSndWait()
+	 * before actually waiting.
+	 */
+	if (physical)
+		ConditionVariableBroadcast(&WalSndCtl->wal_flush_cv);

-		if ((physical && kind == REPLICATION_KIND_PHYSICAL) ||
-			(logical && kind == REPLICATION_KIND_LOGICAL))
-			SetLatch(latch);
-	}
+	if (logical)
+		ConditionVariableBroadcast(&WalSndCtl->wal_replay_cv);
 }

 /*
@@ -3368,9 +3357,44 @@ WalSndWait(uint32 socket_events, long timeout, uint32 wait_event)
 	WaitEvent	event;

 	ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, socket_events, NULL);
+
+	/*
+	 * We use a condition variable to efficiently wake up walsenders in
+	 * WalSndWakeup().
+	 *
+	 * Every walsender prepares to sleep on a shared memory CV. Note that it
+	 * just prepares to sleep on the CV (i.e., adds itself to the CV's
+	 * waitlist), but does not actually wait on the CV (IOW, it never calls
+	 * ConditionVariableSleep()). It still uses WaitEventSetWait() for
+	 * waiting, because we also need to wait for socket events. The processes
+	 * (startup process, walreceiver etc.) wanting to wake up walsenders use
+	 * ConditionVariableBroadcast(), which in turn calls SetLatch(), helping
+	 * walsenders come out of WaitEventSetWait().
+	 *
+	 * This approach is simple and efficient because, one doesn't have to loop
+	 * through all the walsenders slots, with a spinlock acquisition and
+	 * release for every iteration, just to wake up only the waiting
+	 * walsenders. It makes WalSndWakeup() callers' life easy.
+	 *
+	 * XXX: A desirable future improvement would be to add support for CVs
+	 * into WaitEventSetWait().
+	 *
+	 * And, we use separate shared memory CVs for physical and logical
+	 * walsenders for selective wake ups, see WalSndWakeup() for more details.
+	 */
+	if (MyWalSnd->kind == REPLICATION_KIND_PHYSICAL)
+		ConditionVariablePrepareToSleep(&WalSndCtl->wal_flush_cv);
+	else if (MyWalSnd->kind == REPLICATION_KIND_LOGICAL)
+		ConditionVariablePrepareToSleep(&WalSndCtl->wal_replay_cv);
+
 	if (WaitEventSetWait(FeBeWaitSet, timeout, &event, 1, wait_event) == 1 &&
 		(event.events & WL_POSTMASTER_DEATH))
+	{
+		ConditionVariableCancelSleep();
 		proc_exit(1);
+	}
+
+	ConditionVariableCancelSleep();
 }

 /*
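
The cost difference behind the WalSndWakeup() rewrite can be illustrated with a rough counting model. It rests on two statements from this commit only: the removed loop takes one spinlock per walsender slot, and ConditionVariableBroadcast() takes a spinlock even when there are no waiters. The function names are hypothetical, and the case with waiters present is deliberately not modelled, since that depends on CV internals not shown in this diff.

```c
#include <stdbool.h>

/*
 * Old path: the removed loop acquires walsnd->mutex once for every slot,
 * whether or not a walsender is waiting in it.
 */
int
old_wakeup_spinlock_acquisitions(int max_wal_senders)
{
	int			acquisitions = 0;

	for (int i = 0; i < max_wal_senders; i++)
		acquisitions++;			/* one SpinLockAcquire() per slot */
	return acquisitions;
}

/*
 * New path, idle case only (no walsender on either waitlist): one spinlock
 * acquisition per ConditionVariableBroadcast() call that is actually made.
 */
int
new_wakeup_spinlock_acquisitions_idle(bool physical, bool logical)
{
	return (physical ? 1 : 0) + (logical ? 1 : 0);
}
```

Under these assumptions, with a hypothetical max_wal_senders of 10, a call on the old path costs 10 acquisitions regardless of its arguments. On the new path an idle call costs at most 2, which is the small remaining regression the commit message mentions.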

src/include/replication/walsender_private.h

Lines changed: 5 additions & 0 deletions
@@ -17,6 +17,7 @@
 #include "nodes/nodes.h"
 #include "nodes/replnodes.h"
 #include "replication/syncrep.h"
+#include "storage/condition_variable.h"
 #include "storage/latch.h"
 #include "storage/shmem.h"
 #include "storage/spin.h"
@@ -108,6 +109,10 @@ typedef struct
 	 */
 	bool		sync_standbys_defined;

+	/* used as a registry of physical / logical walsenders to wake */
+	ConditionVariable wal_flush_cv;
+	ConditionVariable wal_replay_cv;
+
 	WalSnd		walsnds[FLEXIBLE_ARRAY_MEMBER];
 } WalSndCtlData;
