fix(coderd): ensure agent WebSocket conn is cleaned up #19711


Conversation

@DanielleMaywood (Contributor) commented Sep 5, 2025 (edited)
Relates to #19449

This PR fixes an unfortunate bug in the workspace agent watch-containers endpoint.

The `/containers/watch` endpoint works a little like this:

```mermaid
sequenceDiagram
  Client->>coderd: `GET /containers/watch`
  coderd->>Agent: `GET /containers/watch`
  Agent->>coderd: `ACCEPT`
  Agent<<->>coderd: WebSocket Connection
  coderd->>Client: `ACCEPT`
  coderd<<->>Client: WebSocket Connection
```

Unfortunately, when the connection between the Client and coderd was closed, the connection between coderd and the Agent was kept alive. Leaving that WebSocket connection alive meant a heartbeat ping was sent between coderd and the agent every 15 seconds. As this heartbeat traffic happened over the tailnet, the agent recorded it as workspace activity, which resulted in coderd bumping the lifetime of the workspace every minute.

The changes introduced in this PR ensure that when the heartbeat between the Client and coderd stops, it cancels a context, which triggers cleanup of the WebSocket between coderd and the Agent.
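
A minimal sketch of the resulting shape (all names here are illustrative, not the literal coderd handler; it assumes `context`, `time`, and the `github.com/coder/websocket` API):

```go
// Sketch only: illustrates the cleanup wiring described above.
// watchContainersProxy, proxyEvents, and heartbeatClose are hypothetical
// names; heartbeatClose is sketched further down in this conversation.
func watchContainersProxy(ctx context.Context, clientConn, agentConn *websocket.Conn) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	// Close the agent-side connection as soon as ctx is cancelled, so it
	// cannot outlive the client and keep generating tailnet heartbeat
	// traffic (which the agent counts as workspace activity).
	go func() {
		<-ctx.Done()
		_ = agentConn.Close(websocket.StatusGoingAway, "client disconnected")
	}()

	// A failed ping to the client now cancels ctx, triggering the cleanup
	// above instead of returning silently.
	go heartbeatClose(ctx, clientConn, 15*time.Second, cancel)

	proxyEvents(ctx, agentConn, clientConn) // forward container updates
}
```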

@DanielleMaywood force-pushed the danielle/devcontainers/handle-websocket-conn-cleanup branch from 647ffa1 to 894e6ca on September 5, 2025 10:01
@DanielleMaywood marked this pull request as ready for review on September 5, 2025 10:17
@DanielleMaywood marked this pull request as draft on September 5, 2025 10:17
@DanielleMaywood force-pushed the danielle/devcontainers/handle-websocket-conn-cleanup branch from 894e6ca to d3c1dd5 on September 5, 2025 11:17
@DanielleMaywood force-pushed the danielle/devcontainers/handle-websocket-conn-cleanup branch from d3c1dd5 to 35a7440 on September 5, 2025 11:18
```go
// response to this issue: https://github.com/coder/coder/issues/19449

var (
	ctx = testutil.Context(t, testutil.WaitLong)
```
@DanielleMaywood (Contributor, Author):
Unfortunately this test uses `testutil.WaitLong` because the WebSocket heartbeat fires every 15 seconds. I'm not sure if we have any way to decrease this interval for the test, so we need a timeout long enough for the test to run for 15 seconds and trigger the heartbeat (which then closes the channel).

@mafredri (Member):
That's unfortunate 😔. Perhaps we should have a `WebsocketHeartbeatInterval` on coderd options (requires updating the functions too, of course).

@DanielleMaywood (Contributor, Author):

I definitely agree, although I think that is out of scope for this PR. Should we create an issue to track it?

@mafredri (Member):
We could do something like

```go
type HeartbeatOption func(...)

func WithHeartbeatInterval(d time.Duration) HeartbeatOption

func Heartbeat(ctx context.Context, conn *websocket.Conn, opts ...HeartbeatOption) { ... }
```

It might be worthwhile to also unexport `httpapi.HeartbeatInterval` if we do this.
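
Filling in that sketch, a functional-options version of `Heartbeat` might look something like the following (the default interval constant and the option-struct names are assumptions, not the merged implementation):

```go
// Hypothetical sketch of the suggested API, not the actual httpapi code.
const defaultHeartbeatInterval = 15 * time.Second

type heartbeatOptions struct {
	interval time.Duration
}

type HeartbeatOption func(*heartbeatOptions)

// WithHeartbeatInterval overrides the ping interval, e.g. so a test can
// exercise heartbeat behavior without waiting the full 15 seconds.
func WithHeartbeatInterval(d time.Duration) HeartbeatOption {
	return func(o *heartbeatOptions) { o.interval = d }
}

// Heartbeat pings conn until the context is done or a ping fails.
func Heartbeat(ctx context.Context, conn *websocket.Conn, opts ...HeartbeatOption) {
	o := heartbeatOptions{interval: defaultHeartbeatInterval}
	for _, opt := range opts {
		opt(&o)
	}
	ticker := time.NewTicker(o.interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
		if err := conn.Ping(ctx); err != nil {
			return
		}
	}
}
```

A test could then pass `WithHeartbeatInterval(100 * time.Millisecond)` and avoid needing `testutil.WaitLong`.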

@DanielleMaywood marked this pull request as ready for review on September 5, 2025 12:18
@mafredri (Member) left a comment (edited):

How does this change help the underlying issue?

To me, this issue seems to potentially originate from `wsjson`, where it reads indefinitely:

```go
// we don't use d.ctx here because it only gets canceled after closing the connection
// and a "connection closed" type error is more clear than context canceled.
typ, b, err := d.conn.Read(context.Background())
```

Edit: To clarify, I don't see how this change (the addition of a context) changes the behavior. Previously we already did `WatchContainer` and `defer closer.Close()`. This calls `Close` on the underlying `websocket.Conn`, which should disconnect and stop the reads.

The new context does not change how, e.g., the `wsjson` channel behaves, since it will be cancelled after returning from the handler. But `Close` should already give the same behavior.
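
For illustration, the kind of read loop being discussed looks roughly like this (a sketch of the quoted `wsjson` behavior, assuming the coder/websocket API; `decodeLoop` is a hypothetical name):

```go
// Sketch: the loop deliberately ignores its own context; it terminates
// because conn.Close() elsewhere makes Read return a "connection closed"
// error, which reads more clearly than a context-canceled error.
func decodeLoop(conn *websocket.Conn, out chan<- []byte) {
	defer close(out)
	for {
		_, b, err := conn.Read(context.Background())
		if err != nil {
			return // connection closed; stop reading
		}
		out <- b
	}
}
```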

@DanielleMaywood (Contributor, Author):

> How does this change help the underlying issue?

I want to preface this with the understanding that my networking fundamentals are a bit poor, but:

`go httpapi.Heartbeat` spins up a heartbeat goroutine, but when that heartbeat fails due to a closed connection, it just returns. We have no knowledge that the connection has closed. So, for as long as we don't attempt to write anything down the pipe, we're completely unaware of the broken connection. By switching to `go httpapi.HeartbeatClose`, we can explicitly cancel our context and force a cleanup when the heartbeat fails.
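
A sketch of the distinction being described, assuming the coder/websocket `Ping` API (an illustration of the two behaviors, not the actual `httpapi` source):

```go
// heartbeat pings until the first failure, then just returns: nothing
// upstream learns that the client has gone away.
func heartbeat(ctx context.Context, conn *websocket.Conn) {
	ticker := time.NewTicker(15 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
		if err := conn.Ping(ctx); err != nil {
			return
		}
	}
}

// heartbeatClose is identical except it runs exit (e.g. a context cancel
// function) when it stops, so the caller can tear down dependent
// connections such as the coderd<->Agent WebSocket.
func heartbeatClose(ctx context.Context, conn *websocket.Conn, interval time.Duration, exit func()) {
	defer exit()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
		if err := conn.Ping(ctx); err != nil {
			return
		}
	}
}
```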

> To me, this issue seems to potentially originate from `wsjson`, where it reads indefinitely:

```go
// we don't use d.ctx here because it only gets canceled after closing the connection
// and a "connection closed" type error is more clear than context canceled.
typ, b, err := d.conn.Read(context.Background())
```

Forgive me, but I don't think we're using `wsjson` anywhere here? We are also never reading from the client connection either.

@mafredri (Member):

> Forgive me, but I don't think we're using `wsjson` anywhere here? We are also never reading from the client connection either.

We are, in the `WatchContainers` routine. But I understand now that it's not related to the issue (based on your other comment).


@mafredri (Member) left a comment:

Initially the mechanism by which we fix the issue was a bit lost on me, but I get it now. Small suggestion, but otherwise LGTM.


@johnstcn (Member) left a comment:

I wonder how many other usages of `httpapi.Heartbeat` need to be replaced by `httpapi.HeartbeatClose`? It seems like any API-initiated endpoint may be a candidate?

@mafredri (Member) commented Sep 5, 2025 (edited):

> I wonder how many other usages of `httpapi.Heartbeat` need to be replaced by `httpapi.HeartbeatClose`? It seems like any API-initiated endpoint may be a candidate?

@johnstcn Potentially any endpoint that only rarely writes to or reads from (or not at all) the WebSocket established between the browser and coderd. Not sure how common those are.

@DanielleMaywood (Contributor, Author):

> I wonder how many other usages of `httpapi.Heartbeat` need to be replaced by `httpapi.HeartbeatClose`? It seems like any API-initiated endpoint may be a candidate?

Even if all of them are fine as they are, it is definitely quite easy for this to happen again. I think we should create a follow-up ticket to track this.


@DanielleMaywood merged commit e12b621 into main on Sep 5, 2025
31 checks passed
@DanielleMaywood deleted the danielle/devcontainers/handle-websocket-conn-cleanup branch on September 5, 2025 13:26
@github-actions bot locked and limited conversation to collaborators on Sep 5, 2025
@DanielleMaywood added the cherry-pick/v2.25 and cherry-pick/v2.26 labels (needs to be cherry-picked to the 2.25 and 2.26 release branches) on Sep 8, 2025
Reviewers

@mafredri approved these changes
@johnstcn approved these changes

Assignees

@DanielleMaywood

Labels

cherry-pick/v2.25 (needs to be cherry-picked to the 2.25 release branch)
cherry-pick/v2.26 (needs to be cherry-picked to the 2.26 release branch)

Projects

None yet

Milestone

No milestone

3 participants: @DanielleMaywood, @mafredri, @johnstcn
