PGCoord shutdown was uncoordinated, so an update at an inopportune time during shutdown would be rejected because the coordinator row was already deleted.

This PR ensures that the PGCoord subcomponents that write updates are shut down before we take down the heartbeats subcomponent, which is responsible for deleting the coordinator row.
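The ordering described above can be sketched in Go roughly as follows. This is a minimal illustration, not the actual pgcoord code: `coordinatorRow`, `runAndShutdown`, and the in-memory "row" are hypothetical stand-ins for the real database row and subcomponents. The point is only that the writers are cancelled and waited on before the row is deleted, so no write can observe a missing row.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// coordinatorRow is a hypothetical in-memory stand-in for the coordinator's
// database row. Writes are rejected once the row has been deleted.
type coordinatorRow struct {
	mu       sync.Mutex
	present  bool
	rejected int
}

func (r *coordinatorRow) write() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.present {
		r.rejected++
		return errors.New("coordinator row missing")
	}
	return nil
}

func (r *coordinatorRow) delete() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.present = false
}

func (r *coordinatorRow) rejectedCount() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.rejected
}

// runAndShutdown starts numWriters goroutines that write against the row,
// then shuts down in order: stop the writers, wait for them to exit, and
// only then delete the row. It returns the number of rejected writes.
func runAndShutdown(numWriters int) int {
	row := &coordinatorRow{present: true}
	ctx, cancel := context.WithCancel(context.Background())

	var writers sync.WaitGroup
	writers.Add(numWriters)
	for i := 0; i < numWriters; i++ {
		go func() {
			defer writers.Done()
			for {
				select {
				case <-ctx.Done():
					return
				default:
					_ = row.write()
				}
			}
		}()
	}

	cancel()       // 1. tell the writers to stop
	writers.Wait() // 2. wait until every writer has exited
	row.delete()   // 3. only now remove the row
	return row.rejectedCount()
}

func main() {
	fmt.Println("rejected writes:", runAndShutdown(4))
}
```

If step 3 ran before step 2, a writer that had not yet noticed the cancellation could still attempt a write against the deleted row, which is the kind of race the description refers to.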

fix: fix pgcoord to delete coordinator row last

e51ee6e


spikecurtis commented Feb 15, 2024

main
- fix: fix pgcoord to delete coordinator row last #12155 👈


github-actions bot assigned spikecurtis

Feb 15, 2024

spikecurtis requested review from coadler and mafredri

February 15, 2024 09:58

spikecurtis marked this pull request as ready for review

February 15, 2024 09:58

mafredri approved these changes

Feb 15, 2024


mafredri left a comment


Flagged one thing, but other than that, LGTM.

enterprise/tailnet/pgcoord.go

		go b.handleBindings()
		// add to the waitgroup immediately to avoid any races waiting for it before
		// the workers start.
		b.workerWG.Add(numBinderWorkers)

mafredri, Feb 15, 2024

Is there a chance that <-startWorkers below (i.e. fHB) doesn't get closed (e.g. because of some error during startup), so that these waitgroups never resolve?

(I didn't try to dig into how or where fHB is closed, as it's not obvious from this PR.)

spikecurtis, Feb 15, 2024

It gets closed unconditionally after we send the first heartbeat (success or fail).
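The pattern under discussion can be sketched as below. This is a hedged illustration with hypothetical names (`firstHeartbeat`, `startGatedWorkers`, `runOnce`, `okSend`, `failSend`), not the actual pgcoord implementation. It shows a start gate that is closed whether or not the first heartbeat succeeds, combined with calling `Add` on the waitgroup before the workers are launched, so that `Wait` always resolves.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

const numWorkers = 3

// okSend and failSend are hypothetical heartbeat senders used for the demo.
func okSend() error   { return nil }
func failSend() error { return errors.New("heartbeat failed") }

// firstHeartbeat sends the first heartbeat and closes the gate
// unconditionally, on both the success and the failure path.
func firstHeartbeat(send func() error, gate chan struct{}) error {
	defer close(gate)
	return send()
}

// startGatedWorkers registers the workers with the waitgroup immediately,
// so a caller may Wait before the workers have started, and launches them
// only once the gate is closed.
func startGatedWorkers(gate <-chan struct{}, work func()) *sync.WaitGroup {
	wg := &sync.WaitGroup{}
	wg.Add(numWorkers)
	go func() {
		<-gate
		for i := 0; i < numWorkers; i++ {
			go func() {
				defer wg.Done()
				work()
			}()
		}
	}()
	return wg
}

// runOnce wires the two pieces together and returns how many workers ran.
func runOnce(send func() error) int {
	gate := make(chan struct{})
	var mu sync.Mutex
	ran := 0
	wg := startGatedWorkers(gate, func() {
		mu.Lock()
		ran++
		mu.Unlock()
	})
	_ = firstHeartbeat(send, gate) // the gate closes even if send fails
	wg.Wait()
	return ran
}

func main() {
	fmt.Println("workers after successful heartbeat:", runOnce(okSend))
	fmt.Println("workers after failed heartbeat:", runOnce(failSend))
}
```

If the gate were closed only on success, a failed first heartbeat would leave the launcher goroutine blocked on `<-gate` and `wg.Wait()` would never return, which is the hang the review comment asks about.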

spikecurtis merged commit 627232e into main

Feb 15, 2024

spikecurtis deleted the spike/12141-flake-write-binding branch

February 15, 2024 12:34

github-actions bot locked and limited conversation to collaborators

Feb 15, 2024


fix: fix pgcoord to delete coordinator row last #12155
