Commit 246f136

Improve handling of parameter differences in physical replication
When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error.

The correspondence of settings between primary and standby is required because those settings influence certain shared memory sizings that are required for processing WAL records that the primary might send. For example, if the primary sends a prepared transaction, the standby must have had max_prepared_transaction set appropriately or it won't be able to process those WAL records.

However, fatally shutting down the standby immediately upon receipt of the parameter change record might be a bit of an overreaction. The resources related to those settings are not required immediately at that point, and might never be required if the activity on the primary does not exhaust all those resources. If we just let the standby roll on with recovery, it will eventually produce an appropriate error when those resources are used.

So this patch relaxes this a bit. Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only issue a warning and set a global flag if there is a problem. Then when we actually hit the resource issue and the flag was set, we issue another warning message with relevant information. At that point we pause recovery, so a hot standby remains usable. We also repeat the last warning message once a minute so it is harder to miss or ignore.

Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com

1 parent a01e1b8, commit 246f136
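
The two-step behaviour described in the commit message (warn and remember at parameter-change time, pause only when a resource is actually exhausted) can be sketched as a small standalone C model. This is not the PostgreSQL code: StandbySim, sim_check_parameter and sim_resource_exhausted are hypothetical names invented for illustration, and the real function additionally checks that it is running in the startup process.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the standby's recovery state (not a PostgreSQL type). */
typedef struct
{
    bool need_restart_for_parameter_values; /* set once a setting is found too low */
    bool recovery_paused;                   /* set when recovery has been paused */
} StandbySim;

/*
 * Models the handling of a parameter-change record: a setting that is too
 * low only produces a warning and sets the flag; recovery keeps going.
 */
void
sim_check_parameter(StandbySim *s, const char *name,
                    int standby_value, int primary_value)
{
    if (standby_value < primary_value)
    {
        fprintf(stderr, "WARNING: insufficient setting for parameter %s\n", name);
        s->need_restart_for_parameter_values = true;
    }
}

/*
 * Models the point where a shared memory resource runs out.  Returns true
 * if recovery was paused because a mismatch had been recorded earlier;
 * returns false if the caller should fall through to its ordinary error.
 */
bool
sim_resource_exhausted(StandbySim *s)
{
    if (!s->need_restart_for_parameter_values)
        return false;
    s->recovery_paused = true;
    return true;
}
```

In this model a mismatch alone never pauses anything; the pause happens only if exhaustion is reached after a mismatch was recorded, which mirrors the relaxation the commit message describes.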

6 files changed, +122 -23 lines changed


doc/src/sgml/high-availability.sgml

Lines changed: 36 additions & 12 deletions
@@ -2148,18 +2148,14 @@ LOG: database system is ready to accept read only connections
    </para>

    <para>
-    The setting of some parameters on the standby will need reconfiguration
-    if they have been changed on the primary. For these parameters,
-    the value on the standby must
-    be equal to or greater than the value on the primary.
-    Therefore, if you want to increase these values, you should do so on all
-    standby servers first, before applying the changes to the primary server.
-    Conversely, if you want to decrease these values, you should do so on the
-    primary server first, before applying the changes to all standby servers.
-    If these parameters
-    are not set high enough then the standby will refuse to start.
-    Higher values can then be supplied and the server
-    restarted to begin recovery again. These parameters are:
+    The settings of some parameters determine the size of shared memory for
+    tracking transaction IDs, locks, and prepared transactions. These shared
+    memory structures should be no smaller on a standby than on the primary.
+    Otherwise, it could happen that the standby runs out of shared memory
+    during recovery. For example, if the primary uses a prepared transaction
+    but the standby did not allocate any shared memory for tracking prepared
+    transactions, then recovery will abort and cannot continue until the
+    standby's configuration is changed. The parameters affected are:

       <itemizedlist>
        <listitem>
@@ -2188,6 +2184,34 @@ LOG: database system is ready to accept read only connections
         </para>
        </listitem>
       </itemizedlist>
+
+    The easiest way to ensure this does not become a problem is to have these
+    parameters set on the standbys to values equal to or greater than on the
+    primary. Therefore, if you want to increase these values, you should do
+    so on all standby servers first, before applying the changes to the
+    primary server. Conversely, if you want to decrease these values, you
+    should do so on the primary server first, before applying the changes to
+    all standby servers. The WAL tracks changes to these parameters on the
+    primary, and if a standby processes WAL that indicates that the current
+    value on the primary is higher than its own value, it will log a warning, for example:
+<screen>
+WARNING: insufficient setting for parameter max_connections
+DETAIL: max_connections = 80 is a lower setting than on the master server (where its value was 100).
+HINT: Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.
+</screen>
+    Recovery will continue but could abort at any time thereafter. (It could
+    also never end up failing if the activity on the primary does not actually
+    require the full extent of the allocated shared memory resources.) If
+    recovery reaches a point where it cannot continue due to lack of shared
+    memory, recovery will pause and another warning will be logged, for example:
+<screen>
+WARNING: recovery paused because of insufficient parameter settings
+DETAIL: See earlier in the log about which settings are insufficient.
+HINT: Recovery cannot continue unless the configuration is changed and the server restarted.
+</screen>
+    This warning will repeated once a minute. At that point, the settings on
+    the standby need to be updated and the instance restarted before recovery
+    can continue.
    </para>

    <para>
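
The ordering rule in the new documentation text (raise values on the standbys first, lower them on the primary first) can be written down as a tiny decision function. This is only an illustrative sketch; ChangeOrder and change_order are hypothetical names that do not appear in PostgreSQL.

```c
/* Hypothetical result type: which side of the replication pair to reconfigure first. */
typedef enum
{
    NO_CHANGE,
    CHANGE_STANDBYS_FIRST,  /* increasing: standbys must never end up below the primary */
    CHANGE_PRIMARY_FIRST    /* decreasing: lower the primary before the standbys */
} ChangeOrder;

/*
 * Given the current and the intended value of one of the affected
 * parameters, return the order that keeps every standby's value equal to
 * or greater than the primary's at all times.
 */
ChangeOrder
change_order(int old_value, int new_value)
{
    if (new_value > old_value)
        return CHANGE_STANDBYS_FIRST;
    if (new_value < old_value)
        return CHANGE_PRIMARY_FIRST;
    return NO_CHANGE;
}
```

For example, raising max_connections from 100 to 200 yields CHANGE_STANDBYS_FIRST, so the primary is never ahead of a standby while the change rolls out.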

src/backend/access/transam/twophase.c

Lines changed: 3 additions & 0 deletions
@@ -2360,11 +2360,14 @@ PrepareRedoAdd(char *buf, XLogRecPtr start_lsn,

     /* Get a free gxact from the freelist */
     if (TwoPhaseState->freeGXacts == NULL)
+    {
+        StandbyParamErrorPauseRecovery();
         ereport(ERROR,
                 (errcode(ERRCODE_OUT_OF_MEMORY),
                  errmsg("maximum number of prepared transactions reached"),
                  errhint("Increase max_prepared_transactions (currently %d).",
                          max_prepared_xacts)));
+    }
     gxact = TwoPhaseState->freeGXacts;
     TwoPhaseState->freeGXacts = gxact->next;

src/backend/access/transam/xlog.c

Lines changed: 64 additions & 10 deletions
@@ -264,6 +264,8 @@ bool InArchiveRecovery = false;
 static bool standby_signal_file_found = false;
 static bool recovery_signal_file_found = false;

+static bool need_restart_for_parameter_values = false;
+
 /* Was the last xlog file restored from archive, or local? */
 static bool restoredFromArchive = false;

@@ -5998,6 +6000,54 @@ SetRecoveryPause(bool recoveryPause)
     SpinLockRelease(&XLogCtl->info_lck);
 }

+/*
+ * If in hot standby, pause recovery because of a parameter conflict.
+ *
+ * Similar to recoveryPausesHere() but with a different messaging. The user
+ * is expected to make the parameter change and restart the server. If they
+ * just unpause recovery, they will then run into whatever error is after this
+ * function call for the non-hot-standby case.
+ *
+ * We intentionally do not give advice about specific parameters or values
+ * here because it might be misleading. For example, if we run out of lock
+ * space, then in the single-server case we would recommend raising
+ * max_locks_per_transaction, but in recovery it could equally be the case
+ * that max_connections is out of sync with the primary. If we get here, we
+ * have already logged any parameter discrepancies in
+ * RecoveryRequiresIntParameter(), so users can go back to that and get
+ * concrete and accurate information.
+ */
+void
+StandbyParamErrorPauseRecovery(void)
+{
+    TimestampTz last_warning = 0;
+
+    if (!AmStartupProcess() || !need_restart_for_parameter_values)
+        return;
+
+    SetRecoveryPause(true);
+
+    do
+    {
+        TimestampTz now = GetCurrentTimestamp();
+
+        if (TimestampDifferenceExceeds(last_warning, now, 60000))
+        {
+            ereport(WARNING,
+                    (errmsg("recovery paused because of insufficient parameter settings"),
+                     errdetail("See earlier in the log about which settings are insufficient."),
+                     errhint("Recovery cannot continue unless the configuration is changed and the server restarted.")));
+            last_warning = now;
+        }
+
+        pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
+        pg_usleep(1000000L);    /* 1000 ms */
+        pgstat_report_wait_end();
+        HandleStartupProcInterrupts();
+    }
+    while (RecoveryIsPaused());
+}
+
 /*
  * When recovery_min_apply_delay is set, we wait long enough to make sure
  * certain record types are applied at least that interval behind the master.
@@ -6177,16 +6227,20 @@ GetXLogReceiptTime(TimestampTz *rtime, bool *fromStream)
  * Note that text field supplied is a parameter name and does not require
  * translation
  */
-#define RecoveryRequiresIntParameter(param_name, currValue, minValue) \
-do { \
-    if ((currValue) < (minValue)) \
-        ereport(ERROR, \
-                (errcode(ERRCODE_INVALID_PARAMETER_VALUE), \
-                 errmsg("hot standby is not possible because %s = %d is a lower setting than on the master server (its value was %d)", \
-                        param_name, \
-                        currValue, \
-                        minValue))); \
-} while(0)
+static void
+RecoveryRequiresIntParameter(const char *param_name, int currValue, int minValue)
+{
+    if (currValue < minValue)
+    {
+        ereport(WARNING,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("insufficient setting for parameter %s", param_name),
+                 errdetail("%s = %d is a lower setting than on the master server (where its value was %d).",
+                           param_name, currValue, minValue),
+                 errhint("Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.")));
+        need_restart_for_parameter_values = true;
+    }
+}

 /*
  * Check to see if required parameters are set high enough on this server
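
The pause loop in StandbyParamErrorPauseRecovery() polls once per second but emits its warning only when at least 60 seconds have passed since the previous one. The standalone sketch below models just that rate limiting with a simulated millisecond clock, so it runs without sleeping. simulate_pause_warnings and its parameters are hypothetical names, and the clock start value is an arbitrary assumption chosen to be larger than the warning interval, which is what makes the first pass always warn (as last_warning = 0 does in the patch).

```c
#include <stdint.h>

/*
 * Count the warnings a paused standby would log, using a simulated clock.
 *
 * paused_ms        how long recovery stays paused
 * poll_ms          sleep between checks (1000 in the patch)
 * warn_interval_ms minimum spacing between warnings (60000 in the patch)
 *
 * Returns -1 for a non-positive poll interval, which would never advance.
 */
int
simulate_pause_warnings(int64_t paused_ms, int64_t poll_ms,
                        int64_t warn_interval_ms)
{
    const int64_t start = 1000000;  /* assumed clock start, far from 0 */
    int64_t       last_warning = 0; /* 0 forces a warning on the first pass */
    int64_t       now = start;
    int           warnings = 0;

    if (poll_ms <= 0)
        return -1;

    /* do/while, like the patch: the body runs at least once */
    do
    {
        if (now - last_warning >= warn_interval_ms)
        {
            warnings++;
            last_warning = now;
        }
        now += poll_ms;             /* stands in for the one-second sleep */
    }
    while (now - start < paused_ms);

    return warnings;
}
```

With the patch's values (poll every 1000 ms, warn every 60000 ms), a pause lasting three minutes produces three warnings in this model: one immediately, then one per elapsed minute.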

src/backend/storage/ipc/procarray.c

Lines changed: 8 additions & 1 deletion
@@ -3654,7 +3654,14 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
          * If it still won't fit then we're out of memory
          */
         if (head + nxids > pArray->maxKnownAssignedXids)
-            elog(ERROR, "too many KnownAssignedXids");
+        {
+            StandbyParamErrorPauseRecovery();
+            ereport(ERROR,
+                    (errcode(ERRCODE_OUT_OF_MEMORY),
+                     errmsg("out of shared memory"),
+                     errdetail("There are no more KnownAssignedXids slots."),
+                     errhint("You might need to increase max_connections.")));
+        }
     }

     /* Now we can insert the xids into the space starting at head */

src/backend/storage/lmgr/lock.c

Lines changed: 10 additions & 0 deletions
@@ -965,10 +965,13 @@ LockAcquireExtended(const LOCKTAG *locktag,
         if (locallockp)
             *locallockp = NULL;
         if (reportMemoryError)
+        {
+            StandbyParamErrorPauseRecovery();
             ereport(ERROR,
                     (errcode(ERRCODE_OUT_OF_MEMORY),
                      errmsg("out of shared memory"),
                      errhint("You might need to increase max_locks_per_transaction.")));
+        }
         else
             return LOCKACQUIRE_NOT_AVAIL;
     }
@@ -1003,10 +1006,13 @@ LockAcquireExtended(const LOCKTAG *locktag,
         if (locallockp)
             *locallockp = NULL;
         if (reportMemoryError)
+        {
+            StandbyParamErrorPauseRecovery();
             ereport(ERROR,
                     (errcode(ERRCODE_OUT_OF_MEMORY),
                      errmsg("out of shared memory"),
                      errhint("You might need to increase max_locks_per_transaction.")));
+        }
         else
             return LOCKACQUIRE_NOT_AVAIL;
     }
@@ -2828,6 +2834,7 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
         {
             LWLockRelease(partitionLock);
             LWLockRelease(&MyProc->backendLock);
+            StandbyParamErrorPauseRecovery();
             ereport(ERROR,
                     (errcode(ERRCODE_OUT_OF_MEMORY),
                      errmsg("out of shared memory"),
@@ -4158,6 +4165,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
     if (!lock)
     {
         LWLockRelease(partitionLock);
+        StandbyParamErrorPauseRecovery();
         ereport(ERROR,
                 (errcode(ERRCODE_OUT_OF_MEMORY),
                  errmsg("out of shared memory"),
@@ -4223,6 +4231,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
                 elog(PANIC, "lock table corrupted");
         }
         LWLockRelease(partitionLock);
+        StandbyParamErrorPauseRecovery();
         ereport(ERROR,
                 (errcode(ERRCODE_OUT_OF_MEMORY),
                  errmsg("out of shared memory"),
@@ -4515,6 +4524,7 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait)
         {
             LWLockRelease(partitionLock);
             LWLockRelease(&proc->backendLock);
+            StandbyParamErrorPauseRecovery();
             ereport(ERROR,
                     (errcode(ERRCODE_OUT_OF_MEMORY),
                      errmsg("out of shared memory"),

src/include/access/xlog.h

Lines changed: 1 addition & 0 deletions
@@ -287,6 +287,7 @@ extern XLogRecPtr GetXLogInsertRecPtr(void);
 extern XLogRecPtr GetXLogWriteRecPtr(void);
 extern bool RecoveryIsPaused(void);
 extern void SetRecoveryPause(bool recoveryPause);
+extern void StandbyParamErrorPauseRecovery(void);
 extern TimestampTz GetLatestXTime(void);
 extern TimestampTz GetCurrentChunkReplayStartTime(void);
