fix: fix pgcoord to delete coordinator row last #12155
Flagged one thing, but other than that, LGTM.
@@ -454,6 +474,9 @@ func newBinder(ctx context.Context,
workQ: newWorkQ[bKey](ctx),
}
go b.handleBindings()
// add to the waitgroup immediately to avoid any races waiting for it before
// the workers start.
b.workerWG.Add(numBinderWorkers)
Is there a chance that `<-startWorkers` below (i.e. `fHB`) doesn't get closed (e.g. because of some error during startup), so that these waitgroups never resolve?
(I didn't try to dig into how or where `fHB` is closed, as it's not obvious from this PR.)
It gets closed unconditionally after we send the first heartbeat (success or fail).
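To illustrate the pattern discussed in this thread, here is a minimal, standalone sketch. It is not the pgcoord code: `runWorkers` and `sendFirstHeartbeat` are hypothetical names, and the heartbeat is a stub. It only shows why adding to the waitgroup up front is safe as long as the start channel is closed whether or not the first heartbeat succeeds.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// sendFirstHeartbeat is a hypothetical stand-in for the first heartbeat write.
func sendFirstHeartbeat(fail bool) error {
	if fail {
		return errors.New("heartbeat failed")
	}
	return nil
}

// runWorkers registers all workers on the WaitGroup before they start, gates
// them on a start channel, and returns how many workers ran once all are done.
func runWorkers(numWorkers int, failHeartbeat bool) int {
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		ran int
	)
	startWorkers := make(chan struct{})

	// Add immediately, so a concurrent Wait cannot return before the workers
	// have started.
	wg.Add(numWorkers)
	for i := 0; i < numWorkers; i++ {
		go func() {
			defer wg.Done()
			<-startWorkers // blocks until the first heartbeat has been attempted
			mu.Lock()
			ran++
			mu.Unlock()
		}()
	}

	go func() {
		// The deferred close runs on success and on failure, so the workers
		// are always released and wg.Wait below always returns.
		defer close(startWorkers)
		_ = sendFirstHeartbeat(failHeartbeat)
	}()

	wg.Wait()
	return ran
}

func main() {
	fmt.Println(runWorkers(4, false))
	fmt.Println(runWorkers(4, true))
}
```

If the close happened only on the success path, the second call would block forever in `wg.Wait`, which is the failure mode the question above is asking about.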
Fixes #12141
Fixes #11750
PGCoord shutdown was uncoordinated, so an update at an inopportune time during shutdown would be rejected because the coordinator row was already deleted.
This PR ensures that the PGCoord subcomponents that write updates are shut down before we take down the heartbeats component, which is responsible for deleting the coordinator row.
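The ordering described above can be sketched in a small standalone program. This is an illustration under stated assumptions, not the PGCoord implementation: `store`, `writeUpdate`, `deleteCoordinator`, and `shutdownInOrder` are hypothetical names, and an in-memory struct stands in for the database.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// store is a hypothetical in-memory stand-in for the database.
type store struct {
	mu             sync.Mutex
	coordinatorRow bool // true while the coordinator row exists
	rejected       int  // updates attempted after the row was deleted
}

// writeUpdate counts the update as rejected if the coordinator row is gone.
func (s *store) writeUpdate() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.coordinatorRow {
		s.rejected++
	}
}

// deleteCoordinator removes the coordinator row, as the heartbeats component
// does on shutdown.
func (s *store) deleteCoordinator() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.coordinatorRow = false
}

// shutdownInOrder starts numWriters writers, shuts them down first, then
// deletes the coordinator row. It returns the number of rejected updates.
func shutdownInOrder(numWriters int) int {
	s := &store{coordinatorRow: true}
	ctx, cancel := context.WithCancel(context.Background())

	var writers sync.WaitGroup
	writers.Add(numWriters)
	for i := 0; i < numWriters; i++ {
		go func() {
			defer writers.Done()
			for {
				select {
				case <-ctx.Done():
					return
				default:
					s.writeUpdate()
				}
			}
		}()
	}

	// Step 1: stop everything that writes updates and wait for it to exit.
	cancel()
	writers.Wait()

	// Step 2: only now delete the coordinator row.
	s.deleteCoordinator()

	s.mu.Lock()
	defer s.mu.Unlock()
	return s.rejected
}

func main() {
	fmt.Println("rejected updates:", shutdownInOrder(8))
}
```

Because the row is deleted only after `writers.Wait()` returns, no writer can observe a missing row, so the rejected count stays at zero. Swapping the two steps reintroduces the race from the linked issues.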