Commit 246f136

Improve handling of parameter differences in physical replication

When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error.

The correspondence of settings between primary and standby is required because those settings influence certain shared memory sizings that are required for processing WAL records that the primary might send. For example, if the primary sends a prepared transaction, the standby must have had max_prepared_transactions set appropriately or it won't be able to process those WAL records.

However, fatally shutting down the standby immediately upon receipt of the parameter change record might be a bit of an overreaction. The resources related to those settings are not required immediately at that point, and might never be required if the activity on the primary does not exhaust all those resources. If we just let the standby roll on with recovery, it will eventually produce an appropriate error when those resources are used.

So this patch relaxes this a bit. Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only issue a warning and set a global flag if there is a problem. Then when we actually hit the resource issue and the flag was set, we issue another warning message with relevant information. At that point we pause recovery, so a hot standby remains usable. We also repeat the last warning message once a minute so it is harder to miss or ignore.

Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
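The two-stage behaviour described in the commit message can be sketched in isolation as follows. This is a simplified illustration, not the patch itself: `check_standby_parameter`, `on_resource_exhausted`, and the `ExhaustionAction` enum are hypothetical names, and only the flag name `need_restart_for_parameter_values` is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the static flag the patch adds to xlog.c. */
static bool need_restart_for_parameter_values = false;

/*
 * Stage 1 (hypothetical helper): when a parameter-change record arrives,
 * warn about a too-small standby setting and remember it, instead of
 * failing. Returns true if the standby setting is sufficient.
 */
static bool
check_standby_parameter(const char *name, int standby_value, int primary_value)
{
    assert(name != NULL);

    if (standby_value < primary_value)
    {
        fprintf(stderr, "WARNING: insufficient setting for parameter %s\n", name);
        need_restart_for_parameter_values = true;
        return false;
    }
    return true;
}

/* What to do once a shared memory resource is actually exhausted. */
typedef enum
{
    ACTION_ERROR,               /* no known mismatch: report the usual error */
    ACTION_PAUSE_THEN_ERROR     /* mismatch seen earlier: pause recovery first */
} ExhaustionAction;

/* Stage 2 (hypothetical helper): consult the flag set in stage 1. */
static ExhaustionAction
on_resource_exhausted(void)
{
    return need_restart_for_parameter_values ? ACTION_PAUSE_THEN_ERROR
                                             : ACTION_ERROR;
}
```

The point of splitting the check from the reaction is that a mismatch alone no longer stops recovery; it only changes what happens later if the resource really runs out.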

1 parent a01e1b8, commit 246f136

File tree

6 files changed, +122 -23 lines changed

doc/src/sgml
- high-availability.sgml
src
- backend
  - access/transam
    - twophase.c
    - xlog.c
  - storage
    - ipc
      - procarray.c
    - lmgr
      - lock.c
- include/access
  - xlog.h

`doc/src/sgml/high-availability.sgml`

Lines changed: 36 additions & 12 deletions

Original file line number	Diff line number	Diff line change
`@@ -2148,18 +2148,14 @@ LOG: database system is ready to accept read only connections`
`2148`	`2148`	`</para>`
`2149`	`2149`
`2150`	`2150`	`<para>`
`2151`		`- The setting of some parameters on the standby will need reconfiguration`
`2152`		`- if they have been changed on the primary. For these parameters,`
`2153`		`- the value on the standby must`
`2154`		`- be equal to or greater than the value on the primary.`
`2155`		`- Therefore, if you want to increase these values, you should do so on all`
`2156`		`- standby servers first, before applying the changes to the primary server.`
`2157`		`- Conversely, if you want to decrease these values, you should do so on the`
`2158`		`- primary server first, before applying the changes to all standby servers.`
`2159`		`- If these parameters`
`2160`		`- are not set high enough then the standby will refuse to start.`
`2161`		`- Higher values can then be supplied and the server`
`2162`		`- restarted to begin recovery again. These parameters are:`
	`2151`	`+ The settings of some parameters determine the size of shared memory for`
	`2152`	`+ tracking transaction IDs, locks, and prepared transactions. These shared`
	`2153`	`+ memory structures should be no smaller on a standby than on the primary.`
	`2154`	`+ Otherwise, it could happen that the standby runs out of shared memory`
	`2155`	`+ during recovery. For example, if the primary uses a prepared transaction`
	`2156`	`+ but the standby did not allocate any shared memory for tracking prepared`
	`2157`	`+ transactions, then recovery will abort and cannot continue until the`
	`2158`	`+ standby's configuration is changed. The parameters affected are:`
`2163`	`2159`
`2164`	`2160`	`<itemizedlist>`
`2165`	`2161`	`<listitem>`
`@@ -2188,6 +2184,34 @@ LOG: database system is ready to accept read only connections`
`2188`	`2184`	`</para>`
`2189`	`2185`	`</listitem>`
`2190`	`2186`	`</itemizedlist>`
	`2187`	`+`
	`2188`	`+ The easiest way to ensure this does not become a problem is to have these`
	`2189`	`+ parameters set on the standbys to values equal to or greater than on the`
	`2190`	`+ primary. Therefore, if you want to increase these values, you should do`
	`2191`	`+ so on all standby servers first, before applying the changes to the`
	`2192`	`+ primary server. Conversely, if you want to decrease these values, you`
	`2193`	`+ should do so on the primary server first, before applying the changes to`
	`2194`	`+ all standby servers. The WAL tracks changes to these parameters on the`
	`2195`	`+ primary, and if a standby processes WAL that indicates that the current`
	`2196`	`+ value on the primary is higher than its own value, it will log a warning, for example:`
	`2197`	`+<screen>`
	`2198`	`+WARNING: insufficient setting for parameter max_connections`
	`2199`	`+DETAIL: max_connections = 80 is a lower setting than on the master server (where its value was 100).`
	`2200`	`+HINT: Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.`
	`2201`	`+</screen>`
	`2202`	`+ Recovery will continue but could abort at any time thereafter. (It could`
	`2203`	`+ also never end up failing if the activity on the primary does not actually`
	`2204`	`+ require the full extent of the allocated shared memory resources.) If`
	`2205`	`+ recovery reaches a point where it cannot continue due to lack of shared`
	`2206`	`+ memory, recovery will pause and another warning will be logged, for example:`
	`2207`	`+<screen>`
	`2208`	`+WARNING: recovery paused because of insufficient parameter settings`
	`2209`	`+DETAIL: See earlier in the log about which settings are insufficient.`
	`2210`	`+HINT: Recovery cannot continue unless the configuration is changed and the server restarted.`
	`2211`	`+</screen>`
	`2212`	`+ This warning will be repeated once a minute. At that point, the settings on`
	`2213`	`+ the standby need to be updated and the instance restarted before recovery`
	`2214`	`+ can continue.`
`2191`	`2215`	`</para>`
`2192`	`2216`
`2193`	`2217`	`<para>`
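The ordering rule in the documentation text above (raise values on standbys first, lower them on the primary first) can be expressed as a small helper. This is only an illustrative sketch; `rollout_order`, `standby_is_safe`, and the `RolloutOrder` enum are hypothetical names and are not part of PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>

/* Which side should receive a changed value first. */
typedef enum
{
    NO_CHANGE,
    CHANGE_STANDBYS_FIRST,      /* increasing: standbys must grow first */
    CHANGE_PRIMARY_FIRST        /* decreasing: primary must shrink first */
} RolloutOrder;

/*
 * Decide the rollout order so that every standby value stays equal to or
 * greater than the primary value at every step of the change.
 */
static RolloutOrder
rollout_order(int current_value, int new_value)
{
    assert(current_value >= 0 && new_value >= 0);

    if (new_value > current_value)
        return CHANGE_STANDBYS_FIRST;
    if (new_value < current_value)
        return CHANGE_PRIMARY_FIRST;
    return NO_CHANGE;
}

/* The invariant the documentation recommends keeping. */
static bool
standby_is_safe(int standby_value, int primary_value)
{
    return standby_value >= primary_value;
}
```

For the example in the warning message, a standby with `max_connections = 80` against a primary value of 100 does not satisfy `standby_is_safe`, which is the situation that produces the first warning.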

`src/backend/access/transam/twophase.c`

Lines changed: 3 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -2360,11 +2360,14 @@ PrepareRedoAdd(char *buf, XLogRecPtr start_lsn,`
`2360`	`2360`
`2361`	`2361`	`/* Get a free gxact from the freelist */`
`2362`	`2362`	`if (TwoPhaseState->freeGXacts == NULL)`
	`2363`	`+{`
	`2364`	`+StandbyParamErrorPauseRecovery();`
`2363`	`2365`	`ereport(ERROR,`
`2364`	`2366`	`(errcode(ERRCODE_OUT_OF_MEMORY),`
`2365`	`2367`	`errmsg("maximum number of prepared transactions reached"),`
`2366`	`2368`	`errhint("Increase max_prepared_transactions (currently %d).",`
`2367`	`2369`	`max_prepared_xacts)));`
	`2370`	`+}`
`2368`	`2371`	`gxact = TwoPhaseState->freeGXacts;`
`2369`	`2372`	`TwoPhaseState->freeGXacts = gxact->next;`
`2370`	`2373`

`src/backend/access/transam/xlog.c`

Lines changed: 64 additions & 10 deletions

Original file line number	Diff line number	Diff line change
`@@ -264,6 +264,8 @@ bool InArchiveRecovery = false;`
`264`	`264`	`static bool standby_signal_file_found = false;`
`265`	`265`	`static bool recovery_signal_file_found = false;`
`266`	`266`
	`267`	`+static bool need_restart_for_parameter_values = false;`
	`268`	`+`
`267`	`269`	`/* Was the last xlog file restored from archive, or local? */`
`268`	`270`	`static bool restoredFromArchive = false;`
`269`	`271`
`@@ -5998,6 +6000,54 @@ SetRecoveryPause(bool recoveryPause)`
`5998`	`6000`	`SpinLockRelease(&XLogCtl->info_lck);`
`5999`	`6001`	`}`
`6000`	`6002`
	`6003`	`+/*`
	`6004`	`+ * If in hot standby, pause recovery because of a parameter conflict.`
	`6005`	`+ *`
	`6006`	`+ * Similar to recoveryPausesHere() but with a different messaging. The user`
	`6007`	`+ * is expected to make the parameter change and restart the server. If they`
	`6008`	`+ * just unpause recovery, they will then run into whatever error is after this`
	`6009`	`+ * function call for the non-hot-standby case.`
	`6010`	`+ *`
	`6011`	`+ * We intentionally do not give advice about specific parameters or values`
	`6012`	`+ * here because it might be misleading. For example, if we run out of lock`
	`6013`	`+ * space, then in the single-server case we would recommend raising`
	`6014`	`+ * max_locks_per_transaction, but in recovery it could equally be the case`
	`6015`	`+ * that max_connections is out of sync with the primary. If we get here, we`
	`6016`	`+ * have already logged any parameter discrepancies in`
	`6017`	`+ * RecoveryRequiresIntParameter(), so users can go back to that and get`
	`6018`	`+ * concrete and accurate information.`
	`6019`	`+ */`
	`6020`	`+void`
	`6021`	`+StandbyParamErrorPauseRecovery(void)`
	`6022`	`+{`
	`6023`	`+TimestampTz last_warning = 0;`
	`6024`	`+`
	`6025`	`+if (!AmStartupProcess() || !need_restart_for_parameter_values)`
	`6026`	`+return;`
	`6027`	`+`
	`6028`	`+SetRecoveryPause(true);`
	`6029`	`+`
	`6030`	`+do`
	`6031`	`+{`
	`6032`	`+TimestampTz now = GetCurrentTimestamp();`
	`6033`	`+`
	`6034`	`+if (TimestampDifferenceExceeds(last_warning, now, 60000))`
	`6035`	`+{`
	`6036`	`+ereport(WARNING,`
	`6037`	`+(errmsg("recovery paused because of insufficient parameter settings"),`
	`6038`	`+errdetail("See earlier in the log about which settings are insufficient."),`
	`6039`	`+errhint("Recovery cannot continue unless the configuration is changed and the server restarted.")));`
	`6040`	`+last_warning = now;`
	`6041`	`+}`
	`6042`	`+`
	`6043`	`+pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);`
	`6044`	`+pg_usleep(1000000L); /* 1000 ms */`
	`6045`	`+pgstat_report_wait_end();`
	`6046`	`+HandleStartupProcInterrupts();`
	`6047`	`+}`
	`6048`	`+while (RecoveryIsPaused());`
	`6049`	`+}`
	`6050`	`+`
`6001`	`6051`	`/*`
`6002`	`6052`	`* When recovery_min_apply_delay is set, we wait long enough to make sure`
`6003`	`6053`	`* certain record types are applied at least that interval behind the master.`
`@@ -6177,16 +6227,20 @@ GetXLogReceiptTime(TimestampTz rtime, bool fromStream)`
`6177`	`6227`	`* Note that text field supplied is a parameter name and does not require`
`6178`	`6228`	`* translation`
`6179`	`6229`	`*/`
`6180`		`-#define RecoveryRequiresIntParameter(param_name, currValue, minValue) \`
`6181`		`-do { \`
`6182`		`-if ((currValue) < (minValue)) \`
`6183`		`-ereport(ERROR, \`
`6184`		`-(errcode(ERRCODE_INVALID_PARAMETER_VALUE), \`
`6185`		`- errmsg("hot standby is not possible because %s = %d is a lower setting than on the master server (its value was %d)", \`
`6186`		`-param_name, \`
`6187`		`-currValue, \`
`6188`		`-minValue))); \`
`6189`		`-} while(0)`
	`6230`	`+static void`
	`6231`	`+RecoveryRequiresIntParameter(const char *param_name, int currValue, int minValue)`
	`6232`	`+{`
	`6233`	`+if (currValue < minValue)`
	`6234`	`+{`
	`6235`	`+ereport(WARNING,`
	`6236`	`+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),`
	`6237`	`+errmsg("insufficient setting for parameter %s", param_name),`
	`6238`	`+errdetail("%s = %d is a lower setting than on the master server (where its value was %d).",`
	`6239`	`+param_name, currValue, minValue),`
	`6240`	`+errhint("Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.")));`
	`6241`	`+need_restart_for_parameter_values = true;`
	`6242`	`+}`
	`6243`	`+}`
`6190`	`6244`
`6191`	`6245`	`/*`
`6192`	`6246`	`* Check to see if required parameters are set high enough on this server`
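The pause loop in `StandbyParamErrorPauseRecovery()` polls once per second but only logs when at least 60000 ms have passed since the last warning. The sketch below simulates that throttling with plain millisecond counters instead of real timestamps and sleeping. `count_warnings` and both interval constants are hypothetical names chosen for this illustration, and the "at least" comparison is an assumption about how `TimestampDifferenceExceeds()` behaves.

```c
#include <assert.h>
#include <stdint.h>

#define POLL_INTERVAL_MS 1000   /* stands in for pg_usleep(1000000L) */
#define WARN_INTERVAL_MS 60000  /* threshold passed to the difference check */

/*
 * Simulate the do/while loop: starting at start_ms on a simulated clock,
 * keep polling while recovery stays paused for paused_for_ms, and count
 * how many warnings would be logged.
 */
static int
count_warnings(int64_t start_ms, int64_t paused_for_ms)
{
    int64_t last_warning = 0;   /* 0 means "never warned yet" */
    int64_t now = start_ms;
    int     warnings = 0;

    /* A real clock is far past zero, so the first iteration always warns. */
    assert(start_ms >= WARN_INTERVAL_MS);

    do
    {
        if (now - last_warning >= WARN_INTERVAL_MS)
        {
            warnings++;
            last_warning = now;
        }
        now += POLL_INTERVAL_MS;        /* one simulated sleep */
    } while (now - start_ms < paused_for_ms);

    return warnings;
}
```

Because `last_warning` starts at zero, the warning is emitted immediately on entry and then roughly once a minute, which matches the behaviour described in the commit message.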

`src/backend/storage/ipc/procarray.c`

Lines changed: 8 additions & 1 deletion

Original file line number	Diff line number	Diff line change
`@@ -3654,7 +3654,14 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,`
`3654`	`3654`	`* If it still won't fit then we're out of memory`
`3655`	`3655`	`*/`
`3656`	`3656`	`if (head + nxids > pArray->maxKnownAssignedXids)`
`3657`		`-elog(ERROR, "too many KnownAssignedXids");`
	`3657`	`+{`
	`3658`	`+StandbyParamErrorPauseRecovery();`
	`3659`	`+ereport(ERROR,`
	`3660`	`+(errcode(ERRCODE_OUT_OF_MEMORY),`
	`3661`	`+errmsg("out of shared memory"),`
	`3662`	`+errdetail("There are no more KnownAssignedXids slots."),`
	`3663`	`+errhint("You might need to increase max_connections.")));`
	`3664`	`+}`
`3658`	`3665`	`}`
`3659`	`3666`
`3660`	`3667`	`/* Now we can insert the xids into the space starting at head */`

`src/backend/storage/lmgr/lock.c`

Lines changed: 10 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -965,10 +965,13 @@ LockAcquireExtended(const LOCKTAG *locktag,`
`965`	`965`	`if (locallockp)`
`966`	`966`	`*locallockp = NULL;`
`967`	`967`	`if (reportMemoryError)`
	`968`	`+{`
	`969`	`+StandbyParamErrorPauseRecovery();`
`968`	`970`	`ereport(ERROR,`
`969`	`971`	`(errcode(ERRCODE_OUT_OF_MEMORY),`
`970`	`972`	`errmsg("out of shared memory"),`
`971`	`973`	`errhint("You might need to increase max_locks_per_transaction.")));`
	`974`	`+}`
`972`	`975`	`else`
`973`	`976`	`return LOCKACQUIRE_NOT_AVAIL;`
`974`	`977`	`}`
`@@ -1003,10 +1006,13 @@ LockAcquireExtended(const LOCKTAG *locktag,`
`1003`	`1006`	`if (locallockp)`
`1004`	`1007`	`*locallockp = NULL;`
`1005`	`1008`	`if (reportMemoryError)`
	`1009`	`+{`
	`1010`	`+StandbyParamErrorPauseRecovery();`
`1006`	`1011`	`ereport(ERROR,`
`1007`	`1012`	`(errcode(ERRCODE_OUT_OF_MEMORY),`
`1008`	`1013`	`errmsg("out of shared memory"),`
`1009`	`1014`	`errhint("You might need to increase max_locks_per_transaction.")));`
	`1015`	`+}`
`1010`	`1016`	`else`
`1011`	`1017`	`return LOCKACQUIRE_NOT_AVAIL;`
`1012`	`1018`	`}`
`@@ -2828,6 +2834,7 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)`
`2828`	`2834`	`{`
`2829`	`2835`	`LWLockRelease(partitionLock);`
`2830`	`2836`	`LWLockRelease(&MyProc->backendLock);`
	`2837`	`+StandbyParamErrorPauseRecovery();`
`2831`	`2838`	`ereport(ERROR,`
`2832`	`2839`	`(errcode(ERRCODE_OUT_OF_MEMORY),`
`2833`	`2840`	`errmsg("out of shared memory"),`
`@@ -4158,6 +4165,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,`
`4158`	`4165`	`if (!lock)`
`4159`	`4166`	`{`
`4160`	`4167`	`LWLockRelease(partitionLock);`
	`4168`	`+StandbyParamErrorPauseRecovery();`
`4161`	`4169`	`ereport(ERROR,`
`4162`	`4170`	`(errcode(ERRCODE_OUT_OF_MEMORY),`
`4163`	`4171`	`errmsg("out of shared memory"),`
`@@ -4223,6 +4231,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,`
`4223`	`4231`	`elog(PANIC, "lock table corrupted");`
`4224`	`4232`	`}`
`4225`	`4233`	`LWLockRelease(partitionLock);`
	`4234`	`+StandbyParamErrorPauseRecovery();`
`4226`	`4235`	`ereport(ERROR,`
`4227`	`4236`	`(errcode(ERRCODE_OUT_OF_MEMORY),`
`4228`	`4237`	`errmsg("out of shared memory"),`
`@@ -4515,6 +4524,7 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait)`
`4515`	`4524`	`{`
`4516`	`4525`	`LWLockRelease(partitionLock);`
`4517`	`4526`	`LWLockRelease(&proc->backendLock);`
	`4527`	`+StandbyParamErrorPauseRecovery();`
`4518`	`4528`	`ereport(ERROR,`
`4519`	`4529`	`(errcode(ERRCODE_OUT_OF_MEMORY),`
`4520`	`4530`	`errmsg("out of shared memory"),`

`src/include/access/xlog.h`

Lines changed: 1 addition & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -287,6 +287,7 @@ extern XLogRecPtr GetXLogInsertRecPtr(void);`
`287`	`287`	`extern XLogRecPtr GetXLogWriteRecPtr(void);`
`288`	`288`	`extern bool RecoveryIsPaused(void);`
`289`	`289`	`extern void SetRecoveryPause(bool recoveryPause);`
	`290`	`+extern void StandbyParamErrorPauseRecovery(void);`
`290`	`291`	`extern TimestampTz GetLatestXTime(void);`
`291`	`292`	`extern TimestampTz GetCurrentChunkReplayStartTime(void);`
`292`	`293`
