- Notifications
You must be signed in to change notification settings - Fork4.9k
Commite2f42f8
committed
Fix race with synchronous_standby_names at startup
synchronous_standby_names cannot be reloaded safely by backends, and thecheckpointer is in charge of updating a state in shared memory if theGUC is enabled in WalSndCtl, to let the backends know if they shouldwait or not for a given LSN. This provides a strict control on thetiming of the waiting queues if the GUC is enabled or disabled, thenreloaded. The checkpointer is also in charge of waking up the backendsthat could be waiting for a LSN when the GUC is disabled.This logic had a race condition at startup, where it would be possiblefor backends to not wait for a LSN even if synchronous_standby_names isenabled. This would cause visibility issues with transactions that weshould be waiting for but they were not. The problem lasts until thecheckpointer does its initial update of the shared memory state when itloads synchronous_standby_names.In order to take care of this problem, the shared memory state inWalSndCtl is extended to detect if it has been initialized by thecheckpointer, and not only check if synchronous_standby_names isdefined. In WalSndCtlData, sync_standbys_defined is renamed tosync_standbys_status, a bits8 able to know about two states:- If the shared memory state has been initialized. This flag is set bythe checkpointer at startup once, and never removed.- If synchronous_standby_names is known as defined in the shared memorystate. This is the same as the previous sync_standbys_defined inWalSndCtl.This method gives a way for backends to decide what they should do untilthe shared memory area is initialized, and they now ultimately fall backto a check on the GUC value in this case, which is the best thing thatcan be done.Fortunately, SyncRepUpdateSyncStandbysDefined() is called immediately bythe checkpointer when this process starts, so the window is very narrow.It is possible to enlarge the problematic window by making thecheckpointer wait at the beginning of SyncRepUpdateSyncStandbysDefined()with a hardcoded sleep for example, and doing so has showed that a 2PCvisibility test is indeed failing. On machines slow enough, this bugwould cause spurious failures.In 17~, we have looked at the possibility of adding an injection pointto have a reproducible test, but as the problematic window happens atearly startup, we would need to invent a way to make an injection pointoptionally persistent across restarts when attached, something thatwould be fine for this case as it would involve the checkpointer. Thisissue is quite old, and can be reproduced on all the stable branches.Author: Melnikov Maksim <m.melnikov@postgrespro.ru>Co-authored-by: Michael Paquier <michael@paquier.xyz>Discussion:https://postgr.es/m/163fcbec-900b-4b07-beaa-d2ead8634bec@postgrespro.ruBackpatch-through: 131 parentdeb99dc commite2f42f8
File tree
2 files changed
+86
-11
lines changed- src
- backend/replication
- include/replication
2 files changed
+86
-11
lines changedLines changed: 67 additions & 7 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
177 | 177 |
| |
178 | 178 |
| |
179 | 179 |
| |
180 |
| - | |
181 |
| - | |
| 180 | + | |
| 181 | + | |
182 | 182 |
| |
183 | 183 |
| |
184 | 184 |
| |
185 | 185 |
| |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
186 | 190 |
| |
187 |
| - | |
188 |
| - | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
189 | 212 |
| |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
190 | 226 |
| |
191 | 227 |
| |
192 | 228 |
| |
| |||
927 | 963 |
| |
928 | 964 |
| |
929 | 965 |
| |
930 |
| - | |
| 966 | + | |
931 | 967 |
| |
932 | 968 |
| |
933 | 969 |
| |
| |||
937 | 973 |
| |
938 | 974 |
| |
939 | 975 |
| |
940 |
| - | |
| 976 | + | |
| 977 | + | |
941 | 978 |
| |
942 | 979 |
| |
943 | 980 |
| |
| |||
961 | 998 |
| |
962 | 999 |
| |
963 | 1000 |
| |
964 |
| - | |
| 1001 | + | |
| 1002 | + | |
| 1003 | + | |
| 1004 | + | |
| 1005 | + | |
| 1006 | + | |
| 1007 | + | |
| 1008 | + | |
| 1009 | + | |
| 1010 | + | |
| 1011 | + | |
| 1012 | + | |
| 1013 | + | |
| 1014 | + | |
| 1015 | + | |
| 1016 | + | |
| 1017 | + | |
| 1018 | + | |
| 1019 | + | |
| 1020 | + | |
| 1021 | + | |
| 1022 | + | |
| 1023 | + | |
| 1024 | + | |
965 | 1025 |
| |
966 | 1026 |
| |
967 | 1027 |
| |
|
Lines changed: 19 additions & 4 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
98 | 98 |
| |
99 | 99 |
| |
100 | 100 |
| |
101 |
| - | |
102 |
| - | |
103 |
| - | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
104 | 104 |
| |
105 |
| - | |
| 105 | + | |
106 | 106 |
| |
107 | 107 |
| |
108 | 108 |
| |
109 | 109 |
| |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
110 | 125 |
| |
111 | 126 |
| |
112 | 127 |
| |
|
0 commit comments
Comments
(0)