Commit 443b482

Add new replication mode synchronous_commit = 'write'.
Replication occurs only to memory on standby, not to disk, so provides additional performance if user wishes to reduce durability level slightly. Adds concept of multiple independent sync rep queues.

Fujii Masao and Simon Riggs

1 parent 89dda5f commit 443b482


8 files changed, +124 -52 lines changed


doc/src/sgml/config.sgml

Lines changed: 13 additions & 5 deletions
@@ -1560,7 +1560,7 @@ SET ENABLE_SEQSCAN TO OFF;
 <para>
 Specifies whether transaction commit will wait for WAL records
 to be written to disk before the command returns a <quote>success</>
-indication to the client. Valid values are <literal>on</>,
+indication to the client. Valid values are <literal>on</>, <literal>write</>,
 <literal>local</>, and <literal>off</>. The default, and safe, value
 is <literal>on</>. When <literal>off</>, there can be a delay between
 when success is reported to the client and when the transaction is
@@ -1580,11 +1580,19 @@ SET ENABLE_SEQSCAN TO OFF;
 If <xref linkend="guc-synchronous-standby-names"> is set, this
 parameter also controls whether or not transaction commit will wait
 for the transaction's WAL records to be flushed to disk and replicated
-to the standby server. The commit wait will last until a reply from
-the current synchronous standby indicates it has written the commit
-record of the transaction to durable storage. If synchronous
+to the standby server. When <literal>write</>, the commit wait will
+last until a reply from the current synchronous standby indicates
+it has received the commit record of the transaction to memory.
+Normally this causes no data loss at the time of failover. However,
+if both primary and standby crash, and the database cluster of
+the primary gets corrupted, recent committed transactions might
+be lost. When <literal>on</>, the commit wait will last until a reply
+from the current synchronous standby indicates it has flushed
+the commit record of the transaction to durable storage. This
+avoids any data loss unless the database cluster of both primary and
+standby gets corrupted simultaneously. If synchronous
 replication is in use, it will normally be sensible either to wait
-both for WAL records to reach both the local and remote disks, or
+for both local flush and replication of WAL records, or
 to allow the transaction to commit asynchronously. However, the
 special value <literal>local</> is available for transactions that
 wish to wait for local flush to disk, but not synchronous replication.
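The durability trade-off described in this hunk can be summarised as a small decision function. This is a simplified model written for illustration, not PostgreSQL code: `CommitLevel`, `Failure` and `commit_survives()` are hypothetical names. It assumes, as the xact.h comment on `SYNCHRONOUS_COMMIT_REMOTE_WRITE` says, that `write` and `on` both wait for the local flush first, and that a standby host crash loses WAL the standby had received but not yet flushed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the commit levels discussed in the docs. */
typedef enum
{
    LEVEL_LOCAL,            /* wait for local flush only */
    LEVEL_REMOTE_WRITE,     /* local flush + standby has the record in memory */
    LEVEL_REMOTE_FLUSH      /* local flush + standby has flushed the record */
} CommitLevel;

/* Hypothetical description of what failed after the commit was acknowledged. */
typedef struct
{
    bool primary_storage_lost;  /* primary's database cluster corrupted */
    bool standby_crashed;       /* standby host lost its unflushed memory */
    bool standby_storage_lost;  /* standby's database cluster corrupted */
} Failure;

/*
 * Returns true if, under this model, at least one copy of an acknowledged
 * commit record is guaranteed to survive the failure.
 */
static bool
commit_survives(CommitLevel level, Failure f)
{
    /* Every level waits for the local flush, so the primary's disk has it. */
    bool on_primary = !f.primary_storage_lost;
    bool on_standby;

    switch (level)
    {
        case LEVEL_REMOTE_WRITE:
            /* Only in standby memory: a crash or storage loss drops it. */
            on_standby = !f.standby_crashed && !f.standby_storage_lost;
            break;
        case LEVEL_REMOTE_FLUSH:
            /* On the standby's disk: survives a crash, not storage loss. */
            on_standby = !f.standby_storage_lost;
            break;
        default:
            /* No guarantee that the standby has received anything. */
            on_standby = false;
            break;
    }
    return on_primary || on_standby;
}
```

Under this model `write` loses an acknowledged commit only when the primary's cluster is lost and the standby has also crashed, while `on` loses it only when both clusters are lost, which is the distinction the documentation text draws.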

doc/src/sgml/high-availability.sgml

Lines changed: 13 additions & 3 deletions
@@ -1010,6 +1010,16 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
 standby servers using cascaded replication.
 </para>
 
+<para>
+Setting <varname>synchronous_commit</> to <literal>write</> will
+cause each commit to wait for confirmation that the standby has received
+the commit record to memory. This provides a lower level of durability
+than <literal>on</> does. However, it's a practically useful setting
+because it can decrease the response time for the transaction, and causes
+no data loss unless both the primary and the standby crashes and
+the database of the primary gets corrupted at the same time.
+</para>
+
 <para>
 Users will stop waiting if a fast shutdown is requested. However, as
 when using asynchronous replication, the server will does not fully
@@ -1065,13 +1075,13 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
 
 <para>
 Commits made when <varname>synchronous_commit</> is set to <literal>on</>
-will wait until the sync standby responds. The response may never occur
-if the last, or only, standby should crash.
+or <literal>write</> will wait until the synchronous standby responds. The response
+may never occur if the last, or only, standby should crash.
 </para>
 
 <para>
 The best solution for avoiding data loss is to ensure you don't lose
-your last remaining sync standby. This can be achieved by naming multiple
+your last remaining synchronous standby. This can be achieved by naming multiple
 potential synchronous standbys using <varname>synchronous_standby_names</>.
 The first named standby will be used as the synchronous standby. Standbys
 listed after this will take over the role of synchronous standby if the

src/backend/replication/syncrep.c

Lines changed: 75 additions & 37 deletions
@@ -20,8 +20,8 @@
  * per-transaction state information.
  *
  * Replication is either synchronous or not synchronous (async). If it is
- * async, we just fastpath out of here. If it is sync, then in 9.1 we wait
- * for the flush location on the standby before releasing the waiting backend.
+ * async, we just fastpath out of here. If it is sync, then we wait for
+ * the write or flush location on the standby before releasing the waiting backend.
  * Further complexity in that interaction is expected in later releases.
  *
  * The best performing way to manage the waiting backends is to have a
@@ -67,13 +67,15 @@ char *SyncRepStandbyNames;
 
 static bool announce_next_takeover = true;
 
-static void SyncRepQueueInsert(void);
+static int	SyncRepWaitMode = SYNC_REP_NO_WAIT;
+
+static void SyncRepQueueInsert(int mode);
 static void SyncRepCancelWait(void);
 
 static int	SyncRepGetStandbyPriority(void);
 
 #ifdef USE_ASSERT_CHECKING
-static bool SyncRepQueueIsOrderedByLSN(void);
+static bool SyncRepQueueIsOrderedByLSN(int mode);
 #endif
 
 /*
@@ -120,7 +122,7 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 	 * be a low cost check.
 	 */
 	if (!WalSndCtl->sync_standbys_defined ||
-		XLByteLE(XactCommitLSN, WalSndCtl->lsn))
+		XLByteLE(XactCommitLSN, WalSndCtl->lsn[SyncRepWaitMode]))
 	{
 		LWLockRelease(SyncRepLock);
 		return;
@@ -132,8 +134,8 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 	 */
 	MyProc->waitLSN = XactCommitLSN;
 	MyProc->syncRepState = SYNC_REP_WAITING;
-	SyncRepQueueInsert();
-	Assert(SyncRepQueueIsOrderedByLSN());
+	SyncRepQueueInsert(SyncRepWaitMode);
+	Assert(SyncRepQueueIsOrderedByLSN(SyncRepWaitMode));
 	LWLockRelease(SyncRepLock);
 
 	/* Alter ps display to show waiting for sync rep. */
@@ -267,18 +269,19 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 }
 
 /*
- * Insert MyProc into SyncRepQueue, maintaining sorted invariant.
+ * Insert MyProc into the specified SyncRepQueue, maintaining sorted invariant.
  *
  * Usually we will go at tail of queue, though it's possible that we arrive
  * here out of order, so start at tail and work back to insertion point.
  */
 static void
-SyncRepQueueInsert(void)
+SyncRepQueueInsert(int mode)
 {
 	PGPROC	   *proc;
 
-	proc = (PGPROC *) SHMQueuePrev(&(WalSndCtl->SyncRepQueue),
-								   &(WalSndCtl->SyncRepQueue),
+	Assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
+	proc = (PGPROC *) SHMQueuePrev(&(WalSndCtl->SyncRepQueue[mode]),
+								   &(WalSndCtl->SyncRepQueue[mode]),
 								   offsetof(PGPROC, syncRepLinks));
 
 	while (proc)
@@ -290,15 +293,15 @@ SyncRepQueueInsert(void)
 		if (XLByteLT(proc->waitLSN, MyProc->waitLSN))
 			break;
 
-		proc = (PGPROC *) SHMQueuePrev(&(WalSndCtl->SyncRepQueue),
+		proc = (PGPROC *) SHMQueuePrev(&(WalSndCtl->SyncRepQueue[mode]),
 									   &(proc->syncRepLinks),
 									   offsetof(PGPROC, syncRepLinks));
 	}
 
 	if (proc)
 		SHMQueueInsertAfter(&(proc->syncRepLinks), &(MyProc->syncRepLinks));
 	else
-		SHMQueueInsertAfter(&(WalSndCtl->SyncRepQueue), &(MyProc->syncRepLinks));
+		SHMQueueInsertAfter(&(WalSndCtl->SyncRepQueue[mode]), &(MyProc->syncRepLinks));
 }
 
 /*
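SyncRepQueueInsert() keeps each queue sorted by waitLSN and searches backwards from the tail, because a new waiter usually has the highest LSN. A minimal standalone sketch of the same insertion rule, assuming a hypothetical circular doubly linked list with a sentinel node (`Waiter`, `queue_init()` and `queue_insert_sorted()` are illustrative names, not PostgreSQL's SHM_QUEUE API) and plain 64-bit integers in place of XLogRecPtr:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a PGPROC waiting for sync rep. */
typedef struct Waiter
{
    uint64_t       wait_lsn;    /* LSN this backend is waiting for */
    struct Waiter *prev;
    struct Waiter *next;
} Waiter;

/* Circular list: sentinel->next is the head, sentinel->prev is the tail. */
static void
queue_init(Waiter *sentinel)
{
    sentinel->prev = sentinel->next = sentinel;
}

/*
 * Insert w so the queue stays in ascending wait_lsn order.  Walk back from
 * the tail until we find a waiter with a strictly smaller LSN, mirroring the
 * XLByteLT test in the diff; in the common case the loop exits at once.
 */
static void
queue_insert_sorted(Waiter *sentinel, Waiter *w)
{
    Waiter *pos = sentinel->prev;   /* current tail, or sentinel if empty */

    while (pos != sentinel && !(pos->wait_lsn < w->wait_lsn))
        pos = pos->prev;

    /* Link w immediately after pos (pos may be the sentinel itself). */
    w->prev = pos;
    w->next = pos->next;
    pos->next->prev = w;
    pos->next = w;
}
```

With one such list per wait mode, each queue can be kept ordered independently, which is what indexing `SyncRepQueue[mode]` achieves in the patch.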
@@ -368,7 +371,8 @@ SyncRepReleaseWaiters(void)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	volatile WalSnd *syncWalSnd = NULL;
-	int			numprocs = 0;
+	int			numwrite = 0;
+	int			numflush = 0;
 	int			priority = 0;
 	int			i;
 
@@ -419,20 +423,28 @@ SyncRepReleaseWaiters(void)
 		return;
 	}
 
-	if (XLByteLT(walsndctl->lsn, MyWalSnd->flush))
+	/*
+	 * Set the lsn first so that when we wake backends they will release
+	 * up to this location.
+	 */
+	if (XLByteLT(walsndctl->lsn[SYNC_REP_WAIT_WRITE], MyWalSnd->write))
 	{
-		/*
-		 * Set the lsn first so that when we wake backends they will release
-		 * up to this location.
-		 */
-		walsndctl->lsn = MyWalSnd->flush;
-		numprocs = SyncRepWakeQueue(false);
+		walsndctl->lsn[SYNC_REP_WAIT_WRITE] = MyWalSnd->write;
+		numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE);
+	}
+	if (XLByteLT(walsndctl->lsn[SYNC_REP_WAIT_FLUSH], MyWalSnd->flush))
+	{
+		walsndctl->lsn[SYNC_REP_WAIT_FLUSH] = MyWalSnd->flush;
+		numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH);
 	}
 
 	LWLockRelease(SyncRepLock);
 
-	elog(DEBUG3, "released %d procs up to %X/%X",
-		 numprocs,
+	elog(DEBUG3, "released %d procs up to write %X/%X, %d procs up to flush %X/%X",
+		 numwrite,
+		 MyWalSnd->write.xlogid,
+		 MyWalSnd->write.xrecoff,
+		 numflush,
 		 MyWalSnd->flush.xlogid,
 		 MyWalSnd->flush.xrecoff);
 
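After this change SyncRepReleaseWaiters() keeps one released position per wait mode and advances each one separately from the standby's reported write and flush positions. The sketch below models that with plain arrays instead of shared-memory queues. `SyncState`, `wake_queue()` and `release_waiters()` are hypothetical names, LSNs are 64-bit integers, each queue is assumed to be sorted by ascending LSN as in the real code, and "waking" a backend is modeled by removing it from the front of its queue and counting it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAIT_WRITE      0
#define WAIT_FLUSH      1
#define NUM_WAIT_MODES  2
#define MAX_WAITERS     16

/* Hypothetical model of the per-mode state kept in WalSndCtl. */
typedef struct
{
    uint64_t lsn[NUM_WAIT_MODES];                  /* released up to here */
    uint64_t queue[NUM_WAIT_MODES][MAX_WAITERS];   /* sorted waitLSNs */
    int      len[NUM_WAIT_MODES];
} SyncState;

/*
 * Remove waiters from the head of one queue while their LSN has been
 * reached (or unconditionally when all is true).  Returns how many.
 */
static int
wake_queue(SyncState *s, bool all, int mode)
{
    int n = 0;

    assert(mode >= 0 && mode < NUM_WAIT_MODES);
    while (n < s->len[mode] && (all || s->queue[mode][n] <= s->lsn[mode]))
        n++;

    /* Shift the still-waiting entries to the front of the array. */
    for (int i = n; i < s->len[mode]; i++)
        s->queue[mode][i - n] = s->queue[mode][i];
    s->len[mode] -= n;
    return n;
}

/* Advance each watermark independently, setting it before waking. */
static void
release_waiters(SyncState *s, uint64_t standby_write, uint64_t standby_flush,
                int *numwrite, int *numflush)
{
    *numwrite = *numflush = 0;
    if (s->lsn[WAIT_WRITE] < standby_write)
    {
        s->lsn[WAIT_WRITE] = standby_write;
        *numwrite = wake_queue(s, false, WAIT_WRITE);
    }
    if (s->lsn[WAIT_FLUSH] < standby_flush)
    {
        s->lsn[WAIT_FLUSH] = standby_flush;
        *numflush = wake_queue(s, false, WAIT_FLUSH);
    }
}
```

Because the standby's write position normally runs ahead of its flush position, `write` waiters can be released while `on` waiters with the same LSN are still queued. Calling `wake_queue(s, true, mode)` for every mode corresponds to the loop that SyncRepUpdateSyncStandbysDefined() now runs when synchronous standbys are no longer defined.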
@@ -507,40 +519,42 @@ SyncRepGetStandbyPriority(void)
 }
 
 /*
- * Walk queue from head. Set the state of any backends that need to be woken,
- * remove them from the queue, and then wake them. Pass all = true to wake
- * whole queue; otherwise, just wake up to the walsender's LSN.
+ * Walk the specified queue from head. Set the state of any backends that
+ * need to be woken, remove them from the queue, and then wake them.
+ * Pass all = true to wake whole queue; otherwise, just wake up to
+ * the walsender's LSN.
  *
  * Must hold SyncRepLock.
  */
 int
-SyncRepWakeQueue(bool all)
+SyncRepWakeQueue(bool all, int mode)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	PGPROC	   *proc = NULL;
 	PGPROC	   *thisproc = NULL;
 	int			numprocs = 0;
 
-	Assert(SyncRepQueueIsOrderedByLSN());
+	Assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
+	Assert(SyncRepQueueIsOrderedByLSN(mode));
 
-	proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue),
-								   &(WalSndCtl->SyncRepQueue),
+	proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
+								   &(WalSndCtl->SyncRepQueue[mode]),
 								   offsetof(PGPROC, syncRepLinks));
 
 	while (proc)
 	{
 		/*
 		 * Assume the queue is ordered by LSN
 		 */
-		if (!all && XLByteLT(walsndctl->lsn, proc->waitLSN))
+		if (!all && XLByteLT(walsndctl->lsn[mode], proc->waitLSN))
 			return numprocs;
 
 		/*
 		 * Move to next proc, so we can delete thisproc from the queue.
 		 * thisproc is valid, proc may be NULL after this.
 		 */
 		thisproc = proc;
-		proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue),
+		proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
 									   &(proc->syncRepLinks),
 									   offsetof(PGPROC, syncRepLinks));
 
@@ -588,7 +602,12 @@ SyncRepUpdateSyncStandbysDefined(void)
 		 * wants synchronous replication, we'd better wake them up.
 		 */
 		if (!sync_standbys_defined)
-			SyncRepWakeQueue(true);
+		{
+			int			i;
+
+			for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+				SyncRepWakeQueue(true, i);
+		}
 
 		/*
 		 * Only allow people to join the queue when there are synchronous
@@ -605,16 +624,18 @@ SyncRepUpdateSyncStandbysDefined(void)
 
 #ifdef USE_ASSERT_CHECKING
 static bool
-SyncRepQueueIsOrderedByLSN(void)
+SyncRepQueueIsOrderedByLSN(int mode)
 {
 	PGPROC	   *proc = NULL;
 	XLogRecPtr	lastLSN;
 
+	Assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
+
 	lastLSN.xlogid = 0;
 	lastLSN.xrecoff = 0;
 
-	proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue),
-								   &(WalSndCtl->SyncRepQueue),
+	proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
+								   &(WalSndCtl->SyncRepQueue[mode]),
 								   offsetof(PGPROC, syncRepLinks));
 
 	while (proc)
@@ -628,7 +649,7 @@ SyncRepQueueIsOrderedByLSN(void)
 
 		lastLSN = proc->waitLSN;
 
-		proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue),
+		proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
 									   &(proc->syncRepLinks),
 									   offsetof(PGPROC, syncRepLinks));
 	}
@@ -675,3 +696,20 @@ check_synchronous_standby_names(char **newval, void **extra, GucSource source)
 
 	return true;
 }
+
+void
+assign_synchronous_commit(int newval, void *extra)
+{
+	switch (newval)
+	{
+		case SYNCHRONOUS_COMMIT_REMOTE_WRITE:
+			SyncRepWaitMode = SYNC_REP_WAIT_WRITE;
+			break;
+		case SYNCHRONOUS_COMMIT_REMOTE_FLUSH:
+			SyncRepWaitMode = SYNC_REP_WAIT_FLUSH;
+			break;
+		default:
+			SyncRepWaitMode = SYNC_REP_NO_WAIT;
+			break;
+	}
+}
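assign_synchronous_commit() is installed as the assign hook for synchronous_commit in guc.c, so the wait mode is derived when the setting changes, and SyncRepWaitForLSN() can then use SyncRepWaitMode directly as an index into `WalSndCtl->lsn[]`. A standalone sketch of that mapping plus a fast-path check modeled on the `XLByteLE(XactCommitLSN, WalSndCtl->lsn[SyncRepWaitMode])` test. The enum and constants are local copies; `released_lsn` and `commit_needs_no_wait()` are hypothetical. Because SYNC_REP_NO_WAIT is -1, the sketch assumes the caller must rule it out before indexing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the enum and constants introduced by this commit. */
typedef enum
{
    SYNCHRONOUS_COMMIT_OFF,
    SYNCHRONOUS_COMMIT_LOCAL_FLUSH,
    SYNCHRONOUS_COMMIT_REMOTE_WRITE,
    SYNCHRONOUS_COMMIT_REMOTE_FLUSH
} SyncCommitLevel;

#define SYNC_REP_NO_WAIT        (-1)
#define SYNC_REP_WAIT_WRITE     0
#define SYNC_REP_WAIT_FLUSH     1
#define NUM_SYNC_REP_WAIT_MODE  2

static int      SyncRepWaitMode = SYNC_REP_NO_WAIT;
static uint64_t released_lsn[NUM_SYNC_REP_WAIT_MODE];  /* models WalSndCtl->lsn[] */

/* Same mapping as the assign hook in the diff. */
static void
assign_synchronous_commit(int newval, void *extra)
{
    (void) extra;               /* unused, as in the original */
    switch (newval)
    {
        case SYNCHRONOUS_COMMIT_REMOTE_WRITE:
            SyncRepWaitMode = SYNC_REP_WAIT_WRITE;
            break;
        case SYNCHRONOUS_COMMIT_REMOTE_FLUSH:
            SyncRepWaitMode = SYNC_REP_WAIT_FLUSH;
            break;
        default:
            SyncRepWaitMode = SYNC_REP_NO_WAIT;
            break;
    }
}

/* Hypothetical fast path: true if this commit does not need to wait. */
static bool
commit_needs_no_wait(uint64_t commit_lsn)
{
    if (SyncRepWaitMode == SYNC_REP_NO_WAIT)
        return true;            /* -1 must never be used as an index */
    assert(SyncRepWaitMode >= 0 && SyncRepWaitMode < NUM_SYNC_REP_WAIT_MODE);
    return commit_lsn <= released_lsn[SyncRepWaitMode];
}
```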

src/backend/replication/walsender.c

Lines changed: 2 additions & 1 deletion
@@ -1410,7 +1410,8 @@ WalSndShmemInit(void)
 		/* First time through, so initialize */
 		MemSet(WalSndCtl, 0, WalSndShmemSize());
 
-		SHMQueueInit(&(WalSndCtl->SyncRepQueue));
+		for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+			SHMQueueInit(&(WalSndCtl->SyncRepQueue[i]));
 
 		for (i = 0; i < max_wal_senders; i++)
 		{

src/backend/utils/misc/guc.c

Lines changed: 3 additions & 2 deletions
@@ -370,11 +370,12 @@ static const struct config_enum_entry constraint_exclusion_options[] = {
 };
 
 /*
- * Although only "on", "off", and "local" are documented, we
+ * Although only "on", "off", "write", and "local" are documented, we
  * accept all the likely variants of "on" and "off".
  */
 static const struct config_enum_entry synchronous_commit_options[] = {
 	{"local", SYNCHRONOUS_COMMIT_LOCAL_FLUSH, false},
+	{"write", SYNCHRONOUS_COMMIT_REMOTE_WRITE, false},
 	{"on", SYNCHRONOUS_COMMIT_ON, false},
 	{"off", SYNCHRONOUS_COMMIT_OFF, false},
 	{"true", SYNCHRONOUS_COMMIT_ON, true},
@@ -3164,7 +3165,7 @@ static struct config_enum ConfigureNamesEnum[] =
 		},
 		&synchronous_commit,
 		SYNCHRONOUS_COMMIT_ON, synchronous_commit_options,
-		NULL, NULL, NULL
+		NULL, assign_synchronous_commit, NULL
 	},
 
 	{
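A table such as `synchronous_commit_options` maps each accepted spelling to an enum value, so adding `write` needed only one new row. The sketch below is a hypothetical, simplified lookup written for illustration; it is not guc.c's own lookup routine. It assumes case-insensitive matching via POSIX `strcasecmp()`, copies only the rows visible in the hunk above (plus a NULL terminator), and assumes that `on` and `true` select the remote-flush level. Rows flagged hidden are still accepted; the flag only keeps them out of the documented list of values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>            /* strcasecmp() (POSIX) */

/* Hypothetical stand-ins for the SyncCommitLevel values. */
typedef enum
{
    SC_OFF,
    SC_LOCAL_FLUSH,
    SC_REMOTE_WRITE,
    SC_REMOTE_FLUSH
} SyncCommitLevel;

/* Assumption: "on" and "true" select the remote-flush level. */
#define SC_ON SC_REMOTE_FLUSH

/* Same shape as config_enum_entry: name, value, hidden flag. */
typedef struct
{
    const char *name;
    int         val;
    bool        hidden;         /* accepted, but not a documented value */
} EnumEntry;

/* Only the rows visible in the guc.c hunk, plus a NULL terminator. */
static const EnumEntry sync_commit_options[] = {
    {"local", SC_LOCAL_FLUSH, false},
    {"write", SC_REMOTE_WRITE, false},
    {"on", SC_ON, false},
    {"off", SC_OFF, false},
    {"true", SC_ON, true},
    {NULL, 0, false}
};

/* Hypothetical lookup: returns true and sets *val on a case-insensitive match. */
static bool
lookup_enum(const EnumEntry *table, const char *name, int *val)
{
    for (const EnumEntry *e = table; e->name != NULL; e++)
    {
        if (strcasecmp(e->name, name) == 0)
        {
            *val = e->val;
            return true;
        }
    }
    return false;
}
```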

src/include/access/xact.h

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ typedef enum
 {
 	SYNCHRONOUS_COMMIT_OFF,		/* asynchronous commit */
 	SYNCHRONOUS_COMMIT_LOCAL_FLUSH,		/* wait for local flush only */
+	SYNCHRONOUS_COMMIT_REMOTE_WRITE,	/* wait for local flush and remote write */
 	SYNCHRONOUS_COMMIT_REMOTE_FLUSH		/* wait for local and remote flush */
 } SyncCommitLevel;
 

src/include/replication/syncrep.h

Lines changed: 12 additions & 1 deletion
@@ -15,6 +15,16 @@
 
 #include "utils/guc.h"
 
+#define SyncRepRequested() \
+	(max_wal_senders > 0 && synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH)
+
+/* SyncRepWaitMode */
+#define SYNC_REP_NO_WAIT		-1
+#define SYNC_REP_WAIT_WRITE		0
+#define SYNC_REP_WAIT_FLUSH		1
+
+#define NUM_SYNC_REP_WAIT_MODE	2
+
 /* syncRepState */
 #define SYNC_REP_NOT_WAITING		0
 #define SYNC_REP_WAITING			1
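SyncRepRequested() compares enum values rather than testing for particular settings, so the position of the new SYNCHRONOUS_COMMIT_REMOTE_WRITE value in xact.h matters: placed between LOCAL_FLUSH and REMOTE_FLUSH, `write` counts as a request for synchronous replication without any change to the macro's comparison. A small check of that ordering, using a local copy of the enum and hypothetical stand-in variables for the two settings the macro reads; `sync_rep_requested_for()` is an illustrative helper, not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of SyncCommitLevel in the order established by this commit. */
typedef enum
{
    SYNCHRONOUS_COMMIT_OFF,             /* asynchronous commit */
    SYNCHRONOUS_COMMIT_LOCAL_FLUSH,     /* wait for local flush only */
    SYNCHRONOUS_COMMIT_REMOTE_WRITE,    /* wait for local flush and remote write */
    SYNCHRONOUS_COMMIT_REMOTE_FLUSH     /* wait for local and remote flush */
} SyncCommitLevel;

/* Hypothetical stand-ins for the settings the macro reads. */
static int max_wal_senders = 0;
static int synchronous_commit = SYNCHRONOUS_COMMIT_REMOTE_FLUSH;

/* Same test as the SyncRepRequested() macro added to syncrep.h. */
#define SyncRepRequested() \
    (max_wal_senders > 0 && synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH)

/* Hypothetical helper: evaluate the macro for a given configuration. */
static bool
sync_rep_requested_for(int senders, SyncCommitLevel level)
{
    max_wal_senders = senders;
    synchronous_commit = level;
    return SyncRepRequested();
}
```

Had `write` been appended after REMOTE_FLUSH instead, the macro would still work, but any code relying on the values being ordered by strength of guarantee would not.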
@@ -37,8 +47,9 @@ extern void SyncRepReleaseWaiters(void);
 extern void SyncRepUpdateSyncStandbysDefined(void);
 
 /* called by various procs */
-extern int	SyncRepWakeQueue(bool all);
+extern int	SyncRepWakeQueue(bool all, int mode);
 
 extern bool check_synchronous_standby_names(char **newval, void **extra, GucSource source);
+extern void assign_synchronous_commit(int newval, void *extra);
 
 #endif   /* _SYNCREP_H */
