- Notifications
You must be signed in to change notification settings - Fork924
Description
There seems to be a goroutine leak incoderd.(*api).workspaceAgentTurn
.
This could be seen as two bugs:
- Goroutine leak
- Use of
(*http.Request).Context()
afterHijack
in more than one place
Steps to Reproduce
- Enable pprof for coder server
coder ssh dev
ctlr+d
- Goto 1
- Check pprof (
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/goroutine
)
The leak is in part due to reliance on thehttp.Request
context and use of websockets. The underlying websocket library calls(*http.Request).Hijack
which disables context propagation.
This happens here:
coder/coderd/workspaceagents.go
Line 304 in668a671
wsConn,err:=websocket.Accept(rw,r,&websocket.AcceptOptions{ |
And the following contexts will not cancel until the http handler completes:
coder/coderd/workspaceagents.go
Line 316 in668a671
netConn:=websocket.NetConn(r.Context(),wsConn,websocket.MessageBinary) |
coder/coderd/workspaceagents.go
Line 320 in668a671
case<-r.Context().Done(): |
We must avoid usingr.Context()
after hijack, unless we are using it with the expectation that the http handler will exit (at which point the context will complete).
I'm unfamiliar with thepion/turn
package, but another factor could be wrt how it handles connection closure, perhaps it does not propagate as we expect since we're not callingwsConn.Close()
due to context reliance?
Similar reliance on request context after hijack is done elsewhere, we should rethink all of them. Example:
coder/coderd/workspaceagents.go
Line 155 in668a671
resource,err:=api.Database.GetWorkspaceResourceByID(r.Context(),workspaceAgent.ResourceID) |