Need more visibility into agent issues when workspaces are running in Kubernetes #17994

Answered by matifali
rhysduggan5 asked this question in General

This is extremely similar to an issue that was already closed due to insufficient information: #15867.

[Screenshot: yellow agent connection warning in the Coder dashboard]

This yellow warning can appear for a number of reasons. Some of them are completely unrelated to the agent itself failing to connect; instead, the pod the agent runs in can simply be slow to start for a variety of reasons.

In the example I have screenshotted, the agent cannot connect to this pod (this is a Kubernetes deployment) for two reasons:

  1. There are not enough nodes with enough room to schedule the pod, so before it can deploy, the cluster needs to create a new node and then schedule the pod on it (this can sometimes take a minute or two).
  2. Once the pod is scheduled, if the image it is using is not cached (which will always be true if the node was just created), the pod cannot start until the image has been pulled from the relevant registry. In this case, the image is just under 50 GB, so the pull takes a fair amount of time.

The problem with this is that there is absolutely no visibility into it in the Coder UI; the only way to know it is happening is to have direct access to the cluster and look at the events being raised against the pod.
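
For reference, "looking at the events" here means reading the Kubernetes Event objects raised against the workspace pod (the same information `kubectl describe pod` shows under "Events"). Below is a minimal sketch of fetching them programmatically with client-go; the namespace and pod name are placeholder assumptions, not values from this deployment.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (an in-cluster config would also work).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Placeholder values: replace with the workspace pod's namespace and name.
	namespace, podName := "coder-workspaces", "coder-rhysduggan5-dev"

	// List only the events raised against that pod.
	events, err := clientset.CoreV1().Events(namespace).List(context.Background(), metav1.ListOptions{
		FieldSelector: fmt.Sprintf("involvedObject.kind=Pod,involvedObject.name=%s", podName),
	})
	if err != nil {
		panic(err)
	}
	for _, e := range events.Items {
		// Typical reasons during a slow start: FailedScheduling, TriggeredScaleUp,
		// Pulling, Pulled, Created, Started.
		fmt.Printf("%s\t%s\t%s\n", e.LastTimestamp.Format("15:04:05"), e.Reason, e.Message)
	}
}
```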

What would be ideal is a way to configure the expected startup time of a workspace, based on internal knowledge of how long it could potentially take, or to show users exactly why the workspace is taking a while. Even just bubbling up the events on the pod the agent is trying to connect to would be an improvement.



Replies: 4 comments


Part of #15423


Hi @rhysduggan5, thanks for submitting the issue.

  1. There are not enough nodes with enough room to schedule the pod, so before it can deploy, the cluster needs to create a new node and then schedule the pod on it (this can sometimes take a minute or two).
  2. Once the pod is scheduled, if the image it is using is not cached (which will always be true if the node was just created), the pod cannot start until the image has been pulled from the relevant registry. In this case, the image is just under 50 GB, so the pull takes a fair amount of time.

Both of these issues can be resolved by using the Kubernetes Logging integration. It's a small service that you deploy in the same cluster as the workspace pods. I agree that we need to improve the troubleshooting experience for other agent connection issues.

It's a chicken-and-egg problem, as the agent is the one sending logs from the workspace to be displayed in the Dashboard. Until the agent starts, we have no visibility into what's happening.
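
For anyone curious what that integration is doing under the hood, the general pattern is an in-cluster watcher that reads pod events on the agent's behalf and forwards them to the dashboard before the agent is up. A rough, hypothetical sketch of that pattern is below; `forwardToCoder` and the namespace are illustrative stand-ins, not the integration's actual API.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// forwardToCoder is a placeholder for whatever transport the real integration
// uses to surface events in the workspace startup logs; it just prints here.
func forwardToCoder(event *corev1.Event) {
	fmt.Printf("[%s] %s: %s\n", event.InvolvedObject.Name, event.Reason, event.Message)
}

func main() {
	// The service runs inside the cluster, next to the workspace pods.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Watch events in the workspace namespace ("coder-workspaces" is an assumption).
	watcher, err := clientset.CoreV1().Events("coder-workspaces").Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for update := range watcher.ResultChan() {
		if event, ok := update.Object.(*corev1.Event); ok {
			// FailedScheduling, TriggeredScaleUp, Pulling, etc. become visible
			// to the user before the agent itself can report anything.
			forwardToCoder(event)
		}
	}
}
```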

Answer selected by matifali

I have just installed the Kubernetes Logging integration service (which I found shortly after submitting this issue), and it has completely solved the problem. The good news is that while seeing a big orange box with no context is a really poor experience for my users, seeing a black box with logs progressing is "mentally" a much more positive one. Thanks for the tip!


I will move this to a discussion for others to discover.

2 participants: @rhysduggan5, @matifali
Converted from issue

This discussion was converted from issue #17983 on May 22, 2025 17:56.

