I can't quite figure this one out. I have a workspace running inside a Docker container on an EC2 instance. The instance persists: it doesn't terminate, it just stops and starts. When I stop the workspace, everything works as expected, except that the UI shows an orange dot saying the agent is unhealthy, even though the workspace has stopped. My assumption is that the agent inside the container isn't exiting gracefully, but I can't get the UI to show a clean stopped state. After restarting the workspace from the stopped state, I went back into the instance and checked the container logs from when the workspace was stopping. They look like this:

```
2025-07-12 04:51:15.095 [debu] net.tailnet.net.wgengine: wg: [v2] [2NeP2] - Receiving keepalive packet
2025-07-12 04:51:19.353 [info] stdlib: [ERR] yamux: Failed to read header: failed to get reader: failed to read frame header: EOF
2025-07-12 04:51:19.353 [debu] failed to read from protocol error="context canceled"
2025-07-12 04:51:19.353 [debu] net.tailnet: setAllPeersLost marked peer lost peer_id=b9465b15-3c85-4b71-b4a6-a80532cc0a36 key_id=[2NeP2]
2025-07-12 04:51:19.353 [debu] net.tailnet: peer lost timeout peer_id=b9465b15-3c85-4b71-b4a6-a80532cc0a36
2025-07-12 04:51:19.354 [debu] net.tailnet: timeout triggered for peer but it had handshake in meantime peer_id=b9465b15-3c85-4b71-b4a6-a80532cc0a36 key_id=[2NeP2]
2025-07-12 04:51:19.354 [debu] disconnected from derp map RPC
2025-07-12 04:51:19.354 [debu] routine exited name="derp map subscriber" ...
  error= recv DERPMap error:
         github.com/coder/coder/v2/agent.(*agent).runDERPMapSubscriber
             /home/runner/work/coder/coder/agent/agent.go:1683
       - context canceled
2025-07-12 04:51:19.354 [debu] log sender send loop exiting
2025-07-12 04:51:19.354 [debu] swallowing context canceled name="send logs"
2025-07-12 04:51:19.354 [debu] disconnected from coordination RPC
2025-07-12 04:51:19.354 [debu] swallowing context canceled name=coordination
2025-07-12 04:51:19.354 [debu] swallowing context canceled name="report lifecycle"
2025-07-12 04:51:19.354 [debu] swallowing context canceled name="report connections"
2025-07-12 04:51:19.354 [debu] reportLoop exiting
2025-07-12 04:51:19.354 [debu] routine exited name="stats report loop" error=<nil>
2025-07-12 04:51:19.354 [debu] swallowing context canceled name="fetch service banner loop"
2025-07-12 04:51:19.354 [debu] swallowing context canceled name="report metadata"
2025-07-12 04:51:19.354 [debu] routine exited name="app health reporter" error=<nil>
2025-07-12 04:51:19.354 [info] connection manager errored ...
  error= error in routine derp map subscriber:
         github.com/coder/coder/v2/agent.(*apiConnRoutineManager).startTailnetAPI.func1
             /home/runner/work/coder/coder/agent/agent.go:2161
       - recv DERPMap error:
         github.com/coder/coder/v2/agent.(*agent).runDERPMapSubscriber
             /home/runner/work/coder/coder/agent/agent.go:1683
       - context canceled
2025-07-12 04:51:19.354 [warn] run exited with error ...
  error= error in routine derp map subscriber:
         github.com/coder/coder/v2/agent.(*apiConnRoutineManager).startTailnetAPI.func1
             /home/runner/work/coder/coder/agent/agent.go:2161
       - recv DERPMap error:
         github.com/coder/coder/v2/agent.(*agent).runDERPMapSubscriber
             /home/runner/work/coder/coder/agent/agent.go:1683
       - context canceled
```
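One way to read the burst at 04:51:19 in the log above is to pull out the named routines that stopped and when. This is a small sketch, assuming one log entry per line in the `<date> <time> [<level>] <message>` layout the agent prints; the regexes and the `stopped_routines` helper are hypothetical, written only to summarize this log:

```python
import re

# A few entries copied from the log above (assumed: one entry per line).
LOG = """\
2025-07-12 04:51:19.353 [info] stdlib: [ERR] yamux: Failed to read header: failed to get reader: failed to read frame header: EOF
2025-07-12 04:51:19.354 [debu] swallowing context canceled name="send logs"
2025-07-12 04:51:19.354 [debu] swallowing context canceled name=coordination
2025-07-12 04:51:19.354 [debu] swallowing context canceled name="report lifecycle"
2025-07-12 04:51:19.354 [debu] routine exited name="stats report loop" error=<nil>
2025-07-12 04:51:19.354 [debu] routine exited name="app health reporter" error=<nil>
"""

# "<date> <time> [<level>] <message>"
ENTRY = re.compile(r"^(\S+ \S+) \[(\w+)\] (.*)$")
# name="quoted value" or name=bare_value
NAME = re.compile(r'name=(?:"([^"]+)"|(\S+))')


def stopped_routines(log: str) -> list[tuple[str, str]]:
    """Return (timestamp, routine name) for every entry that names a routine."""
    out = []
    for line in log.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue
        ts, _level, msg = m.groups()
        n = NAME.search(msg)
        if n:
            out.append((ts, n.group(1) or n.group(2)))
    return out


for ts, name in stopped_routines(LOG):
    print(ts, name)
```

Every routine that talks to coderd (log sending, coordination, lifecycle reporting) stops within a millisecond of the yamux EOF. That suggests the agent's connection to coderd went away all at once, before the agent itself had been told to shut down.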
That is followed by this, repeated many times:

```
2025-07-12 04:51:19.517 [info] connecting to coderd
2025-07-12 04:51:19.531 [warn] run exited with error ...
  error= GET https://coder.redacted.com/api/v2/workspaceagents/me/rpc?version=2.6: unexpected status code 401: Workspace agent not authorized.: Try logging in using 'coder login'.
         Error: The agent cannot authenticate until the workspace provision job has been completed. If the job is no longer running, this agent is invalid.
```
Then I get this:

```
2025-07-12 04:52:39.342 [info] agent shutting down error="context canceled"
2025-07-12 04:52:39.342 [info] shutting down agent
2025-07-12 04:52:39.342 [debu] set lifecycle state current={"state":"shutting_down","changed_at":"2025-07-12T04:52:39.342832Z"} last={"state":"created","changed_at":"0001-01-01T00:00:00Z"}
2025-07-12 04:52:39.343 [debu] ssh-server: closing server
2025-07-12 04:52:39.343 [debu] ssh-server: closing all active listeners count=0
2025-07-12 04:52:39.343 [debu] ssh-server: closing all active sessions count=0
2025-07-12 04:52:39.343 [debu] ssh-server: closing all active connections count=0
2025-07-12 04:52:39.343 [debu] ssh-server: closing SSH server
2025-07-12 04:52:39.343 [debu] ssh-server: waiting for all goroutines to exit
2025-07-12 04:52:39.343 [debu] ssh-server: closing server done
2025-07-12 04:52:39.343 [warn] shutdown script(s) failed ...
  error= execute: not initialized:
         github.com/coder/coder/v2/agent/agentscripts.(*Runner).Execute.func1
             /home/runner/work/coder/coder/agent/agentscripts/agentscripts.go:208
2025-07-12 04:52:39.343 [debu] set lifecycle state current={"state":"shutdown_error","changed_at":"2025-07-12T04:52:39.343295Z"} last={"state":"shutting_down","changed_at":"2025-07-12T04:52:39.342832Z"}
2025-07-12 04:52:39.344 [debu] containers: closing API
2025-07-12 04:52:39.344 [debu] containers: closed API
2025-07-12 04:52:39.694 [info] connecting to coderd
2025-07-12 04:52:39.699 [warn] run exited with error ...
```
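The two `set lifecycle state` entries in the shutdown log record the agent's local state changes. Here is a minimal sketch that extracts them, assuming the `current=`/`last=` values are flat JSON objects as they appear above; the `transitions` helper is hypothetical:

```python
import json
import re

# The lifecycle-related entries copied from the shutdown log above.
LOG = """\
2025-07-12 04:52:39.342 [debu] set lifecycle state current={"state":"shutting_down","changed_at":"2025-07-12T04:52:39.342832Z"} last={"state":"created","changed_at":"0001-01-01T00:00:00Z"}
2025-07-12 04:52:39.343 [warn] shutdown script(s) failed ...
2025-07-12 04:52:39.343 [debu] set lifecycle state current={"state":"shutdown_error","changed_at":"2025-07-12T04:52:39.343295Z"} last={"state":"shutting_down","changed_at":"2025-07-12T04:52:39.342832Z"}
"""

# The JSON objects contain no nested braces, so a non-greedy match up to "}" suffices.
LIFECYCLE = re.compile(r"set lifecycle state current=(\{.*?\}) last=(\{.*?\})")


def transitions(log: str) -> list[tuple[str, str]]:
    """Return (previous state, new state) for each lifecycle change in the log."""
    out = []
    for m in LIFECYCLE.finditer(log):
        current, last = json.loads(m.group(1)), json.loads(m.group(2))
        out.append((last["state"], current["state"]))
    return out


for before, after in transitions(LOG):
    print(f"{before} -> {after}")
# created -> shutting_down
# shutting_down -> shutdown_error
```

The first transition starts from `created` with a zero timestamp (`0001-01-01`), which suggests that, in this run of the agent, the lifecycle never moved past its initial state before shutdown began. That would be consistent with the shutdown script runner reporting `not initialized`. Both changes also happen while every connection attempt gets a 401, so they likely never reach coderd.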
After that come more 401 errors as the agent keeps trying to connect. So my guess is: the agent tries to exit, its shutdown fails (the lifecycle state ends at `shutdown_error`), and because it keeps trying to reach coderd until the very end but is no longer authorized, coderd never receives a shutdown report, so the UI goes orange?
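Part of this hypothesis can be checked with simple timestamp arithmetic. The three timestamps below are copied from the logs above. The labels, and the reading that the agent spent this whole window unable to authenticate, are interpretations, not something the logs state outright:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the logs above.
connection_lost = datetime.strptime("2025-07-12 04:51:19.353", FMT)  # yamux EOF
first_401 = datetime.strptime("2025-07-12 04:51:19.531", FMT)        # first reconnect rejected with 401
shutdown_start = datetime.strptime("2025-07-12 04:52:39.342", FMT)   # "agent shutting down"

# Time between losing the coderd connection and the agent starting its own shutdown.
total_gap = (shutdown_start - connection_lost).total_seconds()
# Time the agent spent retrying while coderd answered 401.
unauthorized_window = (shutdown_start - first_401).total_seconds()

print(f"{total_gap:.3f} s between losing coderd and starting shutdown")
print(f"{unauthorized_window:.3f} s of 401s before shutdown began")
```

Roughly 80 seconds pass between the connection drop and the agent's own shutdown. This suggests that coderd had already stopped accepting the agent well before the container signaled it, so any shutdown status the agent tried to report came after it had lost the ability to authenticate.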