fix(coderd): ensure agent WebSocket conn is cleaned up #19711
Conversation
```go
// response to this issue: https://github.com/coder/coder/issues/19449
var (
	ctx = testutil.Context(t, testutil.WaitLong)
```
Unfortunately this test uses `testutil.WaitLong` because the WebSocket heartbeat fires every 15 seconds. I'm not sure we have any way to decrease this interval for the test, so we need a timeout long enough for the test to run for 15 seconds and trigger the heartbeat (which then closes the channel).
That's unfortunate 😔. Perhaps we should have a `WebsocketHeartbeatInterval` on coderd options (which requires updating the functions too, of course).
I definitely agree, although I think that is out of scope for this PR. Should we create an issue to track it?
We could do something like:

```go
type HeartbeatOption func(...)

func WithHeartbeatInterval(d time.Duration) HeartbeatOption

func Heartbeat(ctx context.Context, conn *websocket.Conn, opts ...HeartbeatOption) { ... }
```

It might also be worthwhile to unexport `httpapi.HeartbeatInterval` if we do this.
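For illustration, a fuller sketch of what that option-based API could look like. The option struct, default interval, and ping-with-timeout loop below are assumptions for the sake of the example, not the current `httpapi` implementation:

```go
package httpapi

import (
	"context"
	"time"

	"github.com/coder/websocket"
)

// heartbeatInterval is the assumed default; tests could override it via WithHeartbeatInterval.
const heartbeatInterval = 15 * time.Second

type heartbeatOptions struct {
	interval time.Duration
}

// HeartbeatOption configures Heartbeat.
type HeartbeatOption func(*heartbeatOptions)

// WithHeartbeatInterval overrides the default ping interval, e.g. to speed up tests.
func WithHeartbeatInterval(d time.Duration) HeartbeatOption {
	return func(o *heartbeatOptions) { o.interval = d }
}

// Heartbeat pings conn on every tick until ctx is done or a ping fails.
func Heartbeat(ctx context.Context, conn *websocket.Conn, opts ...HeartbeatOption) {
	o := heartbeatOptions{interval: heartbeatInterval}
	for _, opt := range opts {
		opt(&o)
	}
	ticker := time.NewTicker(o.interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
		// Bound each ping so a dead peer can't block the loop indefinitely.
		pingCtx, cancel := context.WithTimeout(ctx, o.interval)
		err := conn.Ping(pingCtx)
		cancel()
		if err != nil {
			return
		}
	}
}
```

A test could then pass `WithHeartbeatInterval(100*time.Millisecond)` instead of waiting out the default 15 seconds.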
mafredri left a comment (edited)
How does this change help the underlying issue?
To me this issue seems to potentially originate from `wsjson`, where it reads indefinitely:

```go
// we don't use d.ctx here because it only gets canceled after closing the connection
// and a "connection closed" type error is more clear than context canceled.
typ, b, err := d.conn.Read(context.Background())
```
Edit: To clarify, I don't see how this change (the addition of a context) changes the behavior. Previously we already did `WatchContainer` and `defer closer.Close()`. This calls `Close` on the underlying `websocket.Conn`, which should disconnect and stop the reads.

The new context does not change how e.g. the `wsjson` channel behaves, since it will be cancelled after returning from the handler. But `Close` should already give the same behavior.
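For context, a minimal sketch of the kind of read loop being described, assuming `github.com/coder/websocket` (an illustration, not the actual `wsjson` decoder):

```go
package sketch

import (
	"context"
	"encoding/json"

	"github.com/coder/websocket"
)

// decodeLoop reads JSON messages until the connection is closed. The read
// deliberately uses context.Background(), so the only way to stop it is to
// Close the underlying *websocket.Conn, which makes Read return a
// "connection closed" style error rather than "context canceled".
func decodeLoop[T any](conn *websocket.Conn, out chan<- T) {
	defer close(out)
	for {
		_, b, err := conn.Read(context.Background())
		if err != nil {
			return
		}
		var v T
		if err := json.Unmarshal(b, &v); err != nil {
			return
		}
		out <- v
	}
}
```

Because the read ignores any caller context, only `Close` on the connection unblocks it, which is the behavior relied on above.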
I want to preface this with the understanding that my networking fundamentals are a bit poor but:
Forgive me but I don't think we're using …
We are in the …
Initially the mechanism by which we fix the issue was a bit lost on me, but I get it now. Small suggestion but otherwise LGTM.
I wonder how many other usages of `httpapi.Heartbeat` need to be replaced by `httpapi.HeartbeatClose`? It seems like any API-initiated endpoint may be a candidate?
mafredri commented Sep 5, 2025 (edited)
@johnstcn Potentially any endpoint that only rarely (or never) writes to or reads from the WebSocket established between the browser and coderd. Not sure how common those are.
Even if all of them are fine as they are, it is definitely quite easy for this to happen again. I think we should create a follow-up ticket to track this.
Merged commit e12b621 into main.
Relates to #19449
This PR fixes an unfortunate bug in the watch workspace agent containers endpoint.
The `/containers/watch` endpoint works a little like this: the Client opens a WebSocket connection to coderd, and coderd in turn opens a WebSocket connection to the Agent (over the tailnet) to stream container updates back to the Client.

Unfortunately, when the connection between the Client and coderd was closed, the connection between coderd and the Agent was kept alive. Leaving this WebSocket connection alive meant that every 15 seconds a heartbeat ping was sent between coderd and the agent. As this heartbeat traffic happened over the tailnet, the agent recorded it as workspace activity, which resulted in coderd bumping the lifetime of the workspace every minute.
The changes introduced in this PR ensure that when the heartbeat between the Client and coderd stops, it cancels a context, which in turn triggers cleanup of the WebSocket between coderd and the Agent.
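As an illustration of that pattern (not the actual coderd handler — the handler, helper names, and intervals below are assumptions), the fix roughly amounts to tying the agent-side connection to a context that the client-side heartbeat cancels on failure:

```go
package main

import (
	"context"
	"net/http"
	"time"

	"github.com/coder/websocket"
)

// watchContainers is a hypothetical handler sketching the cleanup flow.
func watchContainers(w http.ResponseWriter, r *http.Request) {
	clientConn, err := websocket.Accept(w, r, nil)
	if err != nil {
		return
	}
	defer clientConn.Close(websocket.StatusNormalClosure, "")

	// ctx is cancelled when the handler returns, when the client connection
	// dies (via CloseRead below), or when the heartbeat fails.
	ctx, cancel := context.WithCancel(r.Context())
	defer cancel()

	// We never expect data from the client on this endpoint; CloseRead keeps
	// processing control frames (pongs, close) and cancels the returned
	// context when the client connection closes.
	ctx = clientConn.CloseRead(ctx)

	// Heartbeat: ping the client periodically and cancel ctx if it fails, so
	// the agent-side connection below is torn down instead of lingering.
	go func() {
		defer cancel()
		ticker := time.NewTicker(15 * time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
			}
			pingCtx, pingCancel := context.WithTimeout(ctx, 10*time.Second)
			err := clientConn.Ping(pingCtx)
			pingCancel()
			if err != nil {
				return
			}
		}
	}()

	// Dial the agent, tied to ctx so it is cleaned up when ctx is cancelled.
	// dialAgent stands in for the real coderd→agent WatchContainers call.
	updates, closer, err := dialAgent(ctx)
	if err != nil {
		return
	}
	defer closer()

	// Relay updates from the agent to the client until either side goes away.
	for {
		select {
		case <-ctx.Done():
			return
		case _, ok := <-updates:
			if !ok {
				return
			}
			// In the real handler, the update would be written to clientConn here.
		}
	}
}

// dialAgent is a placeholder that produces no updates and closes its channel
// when ctx is done, mimicking an upstream connection bound to the context.
func dialAgent(ctx context.Context) (<-chan struct{}, func() error, error) {
	ch := make(chan struct{})
	go func() {
		<-ctx.Done()
		close(ch)
	}()
	return ch, func() error { return nil }, nil
}

func main() {
	http.HandleFunc("/containers/watch", watchContainers)
	_ = http.ListenAndServe("127.0.0.1:8080", nil)
}
```

The key point is that the heartbeat failure cancels the same context the agent-side connection is bound to, so both legs of the proxy are torn down together instead of the coderd↔Agent WebSocket lingering and generating activity-bumping traffic.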