Description
Definition of Done
Clear logs should be provided to surface errors, so users do not have to go down several rabbit holes to understand what went wrong.
One acceptable outcome suggested in backend variety:
the Docker agent init script could log errors and then sleep forever if it hits a problem, instead of exiting. This would keep the container from exiting, and so the end user would see the build succeed, but the agent failed to connect. They could then docker logs ... to see what's up.
alternatively, we just fix kreuzwerker/docker to (optionally?) not delete containers after 15s
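The "log errors and then sleep forever" option could look roughly like the sketch below. It is not taken from the actual agent init script: the helper names (log_error, hold_open, fail_and_hold), the log prefix, and the BINARY_DIR variable in the usage comment are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative, not from the real init script.

# Print a timestamped error to stderr so it shows up in the container logs.
log_error() {
  printf '%s [agent-init] ERROR: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >&2
}

# Block forever so the container stays up and its logs remain readable.
hold_open() {
  while :; do
    sleep 3600
  done
}

# Log the failure, then keep the container alive instead of exiting.
fail_and_hold() {
  log_error "$@"
  log_error "init script is sleeping so the logs stay available"
  hold_open
}

# Usage inside an init script (commented out so this sketch does not block):
# mkdir -p "$BINARY_DIR" || fail_and_hold "failed to create temporary directory"
```

The trade-off is the one described above: the build appears to succeed while the agent never connects, so the error is only visible to someone who checks the container logs.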
What
If the init script fails, there is almost no way to debug the root cause (e.g. with Docker).
For example, if you set the Coder access URL to something invalid in the template, then all Docker containers will exit immediately e.g.
In the above example, the error "container exited immediately" is actually coming from kreuzwerker/docker itself -- see kreuzwerker/terraform-provider-docker#92
To dig out the root cause of the error, you have to e.g. turn on debug logging in the Docker daemon and check the daemon logs to see what's going wrong.
This isn't great usability, and we should have a better way of surfacing various kinds of errors from the agent init script.
There are quite a few possibilities for things to go wrong, including:
- curl does not exist
- failed to create temporary directory
- failed to fetch agent binary
- failed to chmod binary
- coder agent failed
- ...
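One way to give each of the failure modes listed above its own message is to wrap every step in a small helper. The following is a minimal sketch under assumptions, not the actual init script: run_step and init_agent are hypothetical names, and the download URL argument and the "coder" file name are placeholders.

```shell
#!/bin/sh
# Hypothetical sketch: run_step and init_agent are illustrative names.

# Run a command; on failure, print a step-specific message to stderr
# and return the command's exit status.
run_step() {
  desc="$1"
  shift
  "$@" || {
    status=$?
    printf 'agent init failed: %s (exit status %s)\n' "$desc" "$status" >&2
    return "$status"
  }
}

# Walk through the init steps, stopping at the first failure.
# $1 is an assumed download URL for the agent binary.
init_agent() {
  url="$1"
  run_step "curl does not exist" command -v curl >/dev/null || return 1
  bindir="$(run_step "failed to create temporary directory" mktemp -d)" || return 1
  run_step "failed to fetch agent binary" curl -fsSL "$url" -o "$bindir/coder" || return 1
  run_step "failed to chmod binary" chmod +x "$bindir/coder" || return 1
  run_step "coder agent failed" "$bindir/coder" agent || return 1
}
```

Because run_step only reports and returns, the caller still decides what happens after a failure, for example exiting as today or keeping the container alive so the message can be read.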
Why
Make it easier for users to figure out why their container won't start.