
fix: coordinator node update race #7345


Merged

spikecurtis merged 3 commits into main from spike/7295-coordinator-node-update

May 2, 2023


Conversation

spikecurtis (Contributor) commented May 1, 2023 (edited)

This fixes the most pressing bug on #7295 but leaves the haCoordinator mainly as is, meaning that we're still subject to various distributed races that can leave us in a bad state where clients can't connect.

A full fix for haCoordinator is quite a large change, so I plan to write an RFC for it.

The main idea of this fix is that we establish an output queue of updates on each connection. This allows us to queue up the initial node update while holding the lock, ensuring we don't miss any updates coming from the other side just as we are connecting. The queue is buffered, and actual I/O on the connection is done on a separate goroutine, where it will never contend for the lock.
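The queueing idea above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names `trackedConn`, `update`, `enqueue`, `sendLoop`, `register`, and `nodeUpdate` are hypothetical, and the real coordinator tracks agents and clients in more detail. The point it tries to show is that enqueueing is non-blocking, so it can happen inside the same critical section that registers the connection, while the writes happen on another goroutine.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// update is a hypothetical stand-in for a node update sent to a peer.
type update string

// trackedConn is a hypothetical connection wrapper with a buffered output queue.
type trackedConn struct {
	id      string
	updates chan update // buffered; drained by sendLoop
}

func newTrackedConn(id string, size int) *trackedConn {
	return &trackedConn{id: id, updates: make(chan update, size)}
}

// enqueue never blocks, so it is safe to call while holding the core lock.
func (t *trackedConn) enqueue(u update) error {
	select {
	case t.updates <- u:
		return nil
	default:
		return errors.New("output queue full")
	}
}

// sendLoop performs the (possibly slow) writes and never takes the core lock.
func (t *trackedConn) sendLoop(write func(update)) {
	for u := range t.updates {
		write(u)
	}
}

type core struct {
	mutex sync.Mutex
	nodes map[string]update       // latest node per peer ID
	conns map[string]*trackedConn // registered listeners by ID
}

func newCore() *core {
	return &core{nodes: map[string]update{}, conns: map[string]*trackedConn{}}
}

// register adds the connection and queues the peer's current node in one
// critical section, so no update can slip in between the two steps.
func (c *core) register(conn *trackedConn, peerID string) error {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	c.conns[conn.id] = conn
	if n, ok := c.nodes[peerID]; ok {
		return conn.enqueue(n)
	}
	return nil
}

// nodeUpdate stores the peer's node and queues it for the listener, if any.
func (c *core) nodeUpdate(peerID, listenerID string, n update) error {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	c.nodes[peerID] = n
	if conn, ok := c.conns[listenerID]; ok {
		return conn.enqueue(n)
	}
	return nil
}

func main() {
	c := newCore()
	_ = c.nodeUpdate("agent", "client", "v1") // published before the client connects

	conn := newTrackedConn("client", 8)
	if err := c.register(conn, "agent"); err != nil {
		panic(err)
	}
	done := make(chan struct{})
	go func() { // I/O goroutine: does not contend for the core lock
		defer close(done)
		conn.sendLoop(func(u update) { fmt.Println("sent", u) })
	}()

	_ = c.nodeUpdate("agent", "client", "v2") // arrives just after connecting
	close(conn.updates)
	<-done
}
```

In this sketch a full queue is reported as an error rather than blocking; how the actual change handles a slow or stalled connection is not stated in the description above.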

There was also a lot of complex manipulation of locking and unlocking the mutex in the existing routines, so I refactored the coordinator to contain a mutex-protected memory "core". All operations on this core are via methods that start with

	c.mutex.Lock()
	defer c.mutex.Unlock()

so it should be much easier to follow what things are holding the lock and what things aren't (core operations vs the rest of the coordinator methods).
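A rough sketch of that locking convention, under assumptions: the `core` and `coordinator` types below are simplified stand-ins with hypothetical fields and methods (`setNode`, `node`, `nodeCount`, `handleUpdate`), not the types from `tailnet/coordinator.go`. Every `core` method takes the lock on entry and releases it on return, and the outer type never touches the mutex itself.

```go
package main

import (
	"fmt"
	"sync"
)

// core holds all shared in-memory state (hypothetical fields).
// Every method locks on entry and unlocks on return.
type core struct {
	mutex sync.Mutex
	nodes map[string]string
}

func newCore() *core {
	return &core{nodes: map[string]string{}}
}

func (c *core) setNode(id, node string) {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	c.nodes[id] = node
}

func (c *core) node(id string) (string, bool) {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	n, ok := c.nodes[id]
	return n, ok
}

func (c *core) nodeCount() int {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	return len(c.nodes)
}

// coordinator wraps the core and never locks the mutex directly.
type coordinator struct {
	core *core
}

// handleUpdate records the node via the core, then runs notify with no
// lock held, so notify may be slow or may call back into the core.
func (co *coordinator) handleUpdate(id, node string, notify func(string)) {
	co.core.setNode(id, node) // lock is taken and released inside
	notify(node)              // runs outside the critical section
}

func main() {
	co := &coordinator{core: newCore()}
	co.handleUpdate("agent-1", "node-a", func(n string) {
		// Safe: the core lock is not held here, so this read cannot deadlock.
		stored, _ := co.core.node("agent-1")
		fmt.Println("notified with", n, "stored", stored)
	})
	fmt.Println("nodes:", co.core.nodeCount())
}
```

Because `sync.Mutex` is not reentrant, the callback in `main` would deadlock if `handleUpdate` held the lock while notifying; keeping the lock inside the `core` methods makes that boundary easy to see.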

fix: coordinator node update race

a7ff00c

Signed-off-by: Spike Curtis <spike@coder.com>

spikecurtis requested review from coadler and kylecarbs

May 1, 2023 12:06

github-actions bot assigned spikecurtis

May 1, 2023

kylecarbs reviewed

May 1, 2023


tailnet/coordinator.go Outdated

		agentNameCache *lru.Cache[uuid.UUID, string]
		}

		func NewCore(logger slog.Logger) *Core {

kylecarbs (Member) commented May 1, 2023

Thoughts on putting the coordinator in its own package? The names NewCore and Core don't make much sense in the context of tailnet itself for providing connections.

An alternative would be putting the in-memory coordinator in its own package.

spikecurtis (Contributor, Author) commented May 1, 2023

I originally exported those names thinking we'd want the Core for the haCoordinator, but I'm now thinking they won't be that reusable. So, if the concern is that the public tailnet API contains these and it's not clear, we could just make them private.

kylecarbs (Member) commented May 1, 2023

Makes sense to me!

spikecurtis added 2 commits

May 2, 2023 06:45

Lint fixes, make core private

cb69570

Signed-off-by: Spike Curtis <spike@coder.com>

Don't log broken connections as errors

370da6f

Signed-off-by: Spike Curtis <spike@coder.com>

kylecarbs approved these changes

May 2, 2023


spikecurtis merged commit bd63011 into main

May 2, 2023

spikecurtis deleted the spike/7295-coordinator-node-update branch

May 2, 2023 16:58

github-actions bot locked and limited conversation to collaborators

May 2, 2023


3 participants
