chore: replace wsconncache with a single tailnet #8176
Conversation
@@ -0,0 +1,169 @@
package agenttest
This was essentially copy/pasted out of agent_test.go so it can be reused.
I didn't manually test this, but the code looks good. It's sick that it's backward-compatible.
We should have a timeline on the removal of wsconncache, because that seems like some big dead weight.
// NewServerTailnet creates a new tailnet intended for use by coderd. It
// automatically falls back to wsconncache if a legacy agent is encountered.
func NewServerTailnet(
Although maybe a bit weird, I'd argue we should put this in workspaceagents.go since wsconncache will be going away.
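For readers skimming the diff, here is a rough, purely illustrative sketch of the fallback idea described in the doc comment above. None of these names (dialer, serverTailnet, legacyCache, isLegacy) are the PR's actual API; they only exist to show the shape of "one shared tailnet, with wsconncache kept as a legacy path".

package sketch

import (
	"context"
	"net"

	"github.com/google/uuid"
)

// dialer abstracts both the shared tailnet and the legacy per-agent cache.
type dialer interface {
	DialAgent(ctx context.Context, agentID uuid.UUID) (net.Conn, error)
}

type serverTailnet struct {
	shared      dialer             // one tailnet shared by all of coderd
	legacyCache dialer             // wsconncache-style fallback
	isLegacy    map[uuid.UUID]bool // assumption: some way to detect legacy agents
}

func (s *serverTailnet) DialAgent(ctx context.Context, agentID uuid.UUID) (net.Conn, error) {
	if s.isLegacy[agentID] {
		// Legacy agents can't participate in the shared tailnet, so fall
		// back to a cached per-agent connection.
		return s.legacyCache.DialAgent(ctx, agentID)
	}
	return s.shared.DialAgent(ctx, agentID)
}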
Ignore test failures, will fix those in the morning.
func (c *pgCoord) ServeMultiAgent(id uuid.UUID) agpl.MultiAgentConn {
	_, _ = c, id
	panic("not implemented") // TODO: Implement
This effectively torpedoes the pgCoord, so we can't merge this PR until it's fixed.
Since I'm still mostly fleshing out the API, I didn't think it made sense to implement it on pgCoord yet. Definitely a blocker for merge, though.
FWIW, I think what we need to do is drop the agent_id column from the tailnet_clients table and add a new table that tracks the agents each client wants to connect to:
CREATE TABLE tailnet_subscriptions (
	client_id uuid NOT NULL,
	coordinator_id uuid NOT NULL,
	agent_id uuid NOT NULL,
	PRIMARY KEY (client_id, coordinator_id, agent_id),
	FOREIGN KEY (client_id) REFERENCES tailnet_clients(id) ON DELETE CASCADE,
	FOREIGN KEY (coordinator_id) REFERENCES tailnet_coordinators(id) ON DELETE CASCADE
);
FWIW, I think the right architecture for both the in-memory and distributed coordinator is to consider an individual client connection as a special case of the more general "multiagent" idea. That is, a normal end user client is a multiagent where the number of agents is exactly one. It's much easier to maintain and understand code that computes the general case, and then at the edge we have adaptations that wire up clients with exactly one agent, rather than trying to build two different sets of logic and keep them consistent. So, in the core logic of the coordinator, clients are all multi-agent, and we compute and queue up updates on them based on the set of agents they've asked to connect to. Then, at the very edge we have special-case code that either serializes the nodes over the websocket, or sends them out over channels/callbacks, depending on whether we have an end user, remote client, or a coderd in-process client.
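To make the "a client is just a multiagent with exactly one agent" idea concrete, here is a minimal sketch of such an edge adapter. The names (multiAgentConn, singleAgentClient, Node) are invented for illustration and are not the coordinator's actual API.

package sketch

import (
	"context"

	"github.com/google/uuid"
)

// Node stands in for the coordinator's node type.
type Node struct{}

// multiAgentConn is a stand-in for the general multiagent interface the
// core coordinator would expose.
type multiAgentConn interface {
	SubscribeAgent(agentID uuid.UUID) error
	UpdateSelf(node *Node) error
	NextUpdate(ctx context.Context) ([]*Node, bool)
}

// singleAgentClient adapts the general multiagent connection to the
// "end user client connected to exactly one agent" special case.
type singleAgentClient struct {
	conn    multiAgentConn
	agentID uuid.UUID
}

func newSingleAgentClient(conn multiAgentConn, agentID uuid.UUID) (*singleAgentClient, error) {
	// The only special-case logic lives at the edge: subscribe to exactly
	// one agent. The core still treats this client as a multiagent.
	if err := conn.SubscribeAgent(agentID); err != nil {
		return nil, err
	}
	return &singleAgentClient{conn: conn, agentID: agentID}, nil
}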
	// connection to.
	agentNodes map[uuid.UUID]*tailnetNode
	transport  *http.Transport
This is so nice, great work!
tailnet/coordinator.go Outdated
	// multiAgents holds all of the unique multiAgents listening on this
	// coordinator. We need to keep track of these separately because we need to
	// make sure they're closed on coordinator shutdown. If not, they won't be
	// able to reopen another multiAgent on the new coordinator.
	multiAgents map[uuid.UUID]*MultiAgent
Haven't added this to the haCoordinator yet.
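For context, a minimal sketch of the shutdown behavior that map exists to support (names assumed; this is not the actual coordinator, and the haCoordinator version isn't implemented yet):

package sketch

import (
	"sync"

	"github.com/google/uuid"
)

// multiAgentCloser stands in for whatever close method MultiAgent exposes.
type multiAgentCloser interface {
	Close() error
}

type coordinator struct {
	mu          sync.Mutex
	closed      bool
	multiAgents map[uuid.UUID]multiAgentCloser
}

// Close shuts the coordinator down and closes every tracked multiAgent so
// callers can reconnect to a replacement coordinator instead of hanging on
// a dead one.
func (c *coordinator) Close() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return nil
	}
	c.closed = true
	for id, ma := range c.multiAgents {
		_ = ma.Close()
		delete(c.multiAgents, id)
	}
	return nil
}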
tailnet/coordinator.go Outdated
	// agentToConnectionSockets maps agent IDs to connection IDs of conns that
	// are subscribed to updates for that agent.
-	agentToConnectionSockets map[uuid.UUID]map[uuid.UUID]*TrackedConn
+	agentToConnectionSockets map[uuid.UUID]map[uuid.UUID]Enqueueable
This is great!
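For readers following along, a rough guess at the kind of interface this change implies; the method set below is illustrative only, not the actual Enqueueable definition from the PR.

package sketch

import "github.com/google/uuid"

// Node stands in for the coordinator's node type.
type Node struct{}

// enqueueable is a hypothetical interface that lets both websocket
// connections (TrackedConn) and in-process multiagents receive node updates
// from the coordinator without the core caring which one it is talking to.
type enqueueable interface {
	UniqueID() uuid.UUID
	Enqueue(nodes []*Node) error
	Close() error
}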
coderd/tailnet.go Outdated
	}
	delete(s.agentNodes, agentID)
	// TODO(coadler): actually remove from the netmap
Will do this before merge
// It is exported so that tests can use it.
const WriteTimeout = time.Second * 5

type TrackedConn struct {
Good call on moving this stuff to its own file. Much more readable.
	Stats() (start, lastWrite int64)
	Overwrites() int64
	CoordinatorClose() error
	Close() error
I don't understand the distinction between CoordinatorClose() and Close(). How are they different, and why do we need both?
There's sort of a chicken-and-egg problem. When you want to close an individual queue, you expect it to be removed from the coordinator. When the coordinator is closing, you want the coordinator to close all of the queues, but you don't want the queues to reach back into the coordinator to close themselves and re-enter the mutex.
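A small sketch of the two paths being described, with invented names rather than the actual queue API, showing why a "close and deregister yourself" method and a "coordinator already holds its lock" method can both exist:

package sketch

import (
	"sync"

	"github.com/google/uuid"
)

type queue struct {
	coord *coordinator
	id    uuid.UUID
	once  sync.Once
	done  chan struct{}
}

// Close is called by the queue's owner: it removes the queue from the
// coordinator (taking the coordinator's mutex) and then closes itself.
func (q *queue) Close() error {
	q.coord.removeQueue(q.id) // acquires coord.mu
	return q.CoordinatorClose()
}

// CoordinatorClose is called by the coordinator while it already holds its
// own mutex, so it must not reach back into the coordinator.
func (q *queue) CoordinatorClose() error {
	q.once.Do(func() { close(q.done) })
	return nil
}

type coordinator struct {
	mu     sync.Mutex
	queues map[uuid.UUID]*queue
}

func (c *coordinator) removeQueue(id uuid.UUID) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.queues, id)
}

// Close shuts down every queue without the re-entrancy problem: the
// coordinator holds its mutex and calls CoordinatorClose, never Close.
func (c *coordinator) Close() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, q := range c.queues {
		_ = q.CoordinatorClose()
	}
	c.queues = map[uuid.UUID]*queue{}
	return nil
}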
}

func (m *MultiAgent) UpdateSelf(node *Node) error {
	return m.OnNodeUpdate(m.ID, node)
Closing the MultiAgent feels incomplete to me, since it seems that we would still send updates, subscribes, and unsubscribes into the core even after it's been closed.
Furthermore, NextUpdate() runs on its own, separate Context, rather than respecting the closed state of the MultiAgent. This seems like a recipe for zombie goroutines.
NextUpdate() does respect the closed state of the MultiAgent, since m.updates gets closed when the MultiAgent is closed.
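For clarity, a sketch of that pattern with assumed field names: NextUpdate reports ok=false once the updates channel has been closed, so a looping caller stops instead of leaking a goroutine.

package sketch

import (
	"context"
	"sync"
)

type Node struct{}

type multiAgent struct {
	mu      sync.Mutex
	closed  bool
	updates chan []*Node
}

// NextUpdate blocks until the next batch of node updates arrives, the
// context is canceled, or the MultiAgent is closed. The second return value
// reports whether the MultiAgent is still usable.
func (m *multiAgent) NextUpdate(ctx context.Context) ([]*Node, bool) {
	select {
	case <-ctx.Done():
		// Context cancellation doesn't mean the MultiAgent is closed;
		// the caller can inspect ctx.Err() separately.
		return nil, true
	case nodes, ok := <-m.updates:
		// ok is false once Close has closed m.updates, which is how a
		// looping caller learns to stop.
		return nodes, ok
	}
}

func (m *multiAgent) Close() error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.closed {
		return nil
	}
	m.closed = true
	close(m.updates)
	return nil
}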
I've made this opt-in via a feature flag, so it won't be enabled by default. The implementations for pgcoord and moons aren't done, but I plan on addressing that in follow-up PRs due to this one getting huge. After both are resolved, I'll remove the feature flag.
I understand this PR is getting big and unwieldy, so I'm fine with it going in behind an experiment flag, but I would like to see a GitHub epic and GitHub issues tracking the things that need a resolution before this can go GA.
coderd/tailnet.go Outdated
	s.nodesMu.Lock()
	agentConn := s.getAgentConn()
	for agentID, node := range s.agentNodes {
		if time.Since(node.lastConnection) > cutoff {
This measures the last time we started a connection, not the last time the connection was used. If we proxy a long-lived connection like a ReconnectingPTY or a devURL websocket, it could easily be in use for more than 30 minutes.
We might need some ref-counting to keep track of the connections to each agent, so that we only expire them when they are no longer used.
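One possible shape for that ref-counting, sketched here with made-up names: acquiring an agent bumps a counter and returns a release func, and the cleanup loop only expires agents with zero in-flight connections whose last release is older than the cutoff.

package sketch

import (
	"sync"
	"time"

	"github.com/google/uuid"
)

type tailnetNode struct {
	refs        int       // in-flight proxied connections to this agent
	lastRelease time.Time // when the last connection finished
}

type serverTailnet struct {
	mu    sync.Mutex
	nodes map[uuid.UUID]*tailnetNode
}

// acquireAgent records an in-flight connection and returns a release func
// the proxy must call when the (possibly long-lived) connection finishes.
func (s *serverTailnet) acquireAgent(id uuid.UUID) (release func()) {
	s.mu.Lock()
	defer s.mu.Unlock()
	node, ok := s.nodes[id]
	if !ok {
		node = &tailnetNode{}
		s.nodes[id] = node
	}
	node.refs++
	return func() {
		s.mu.Lock()
		defer s.mu.Unlock()
		node.refs--
		node.lastRelease = time.Now()
	}
}

// expireAgents removes agents with no active connections that haven't been
// used since the cutoff, instead of expiring by connection start time.
func (s *serverTailnet) expireAgents(cutoff time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for id, node := range s.nodes {
		if node.refs == 0 && time.Since(node.lastRelease) > cutoff {
			delete(s.nodes, id)
		}
	}
}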
This PR mostly removes wsconncache from coderd, replacing it with a single tailnet per coderd. The biggest benefit to this is not needing a cache of tailnets sitting around for each agent. Additionally, by reusing the same http.Transport for proxying to workspaces, we're able to reuse TCP conns between requests instead of opening and closing one per request. This is the same strategy we use in wgtunnel.

I made some changes to the coordinator interface to support coderd hooking directly in, so I'd definitely like some feedback on that. I haven't yet implemented it for pgcoord.

Left out of this PR is support for moons. This was getting a bit too big, so I'll instead do that in a followup. Moons will still use wsconncache for the time being.
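As an aside on the transport reuse mentioned above, the general pattern looks roughly like the following. This is a sketch, not the actual coderd wiring: one http.Transport whose DialContext goes over the shared tailnet, so HTTP keep-alive pools TCP connections across proxied requests.

package sketch

import (
	"context"
	"net"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

// tailnetDialer stands in for the coderd-wide tailnet connection.
type tailnetDialer interface {
	DialContext(ctx context.Context, network, addr string) (net.Conn, error)
}

// newProxyTransport builds a single http.Transport that dials workspace
// agents over the shared tailnet. Because the transport is reused for every
// proxied request, idle keep-alive connections are pooled instead of being
// opened and closed per request.
func newProxyTransport(tn tailnetDialer) *http.Transport {
	return &http.Transport{
		DialContext:         tn.DialContext,
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 10,
		IdleConnTimeout:     90 * time.Second,
	}
}

// newAgentProxy returns a reverse proxy to an agent-local target that shares
// the transport above.
func newAgentProxy(target *url.URL, transport *http.Transport) *httputil.ReverseProxy {
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.Transport = transport
	return proxy
}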