fix: separate signals for passive, active, and forced shutdown #12358
- `SIGTERM`: Passive shutdown. Provisioner daemons stop accepting new jobs but wait for existing jobs to complete successfully.
- `SIGINT` (previous behavior): Notify provisioner daemons to cancel in-flight jobs, wait 5s for the jobs to exit, then force quit.
- `SIGKILL`: Unchanged from before; the process is force-quit.
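A minimal Go sketch of the signal-to-behavior mapping described above. The names `ShutdownMode`, `modeFor`, `exampleSignals`, and `cancelGracePeriod` are hypothetical and do not come from the PR; `SIGHUP` maps to "ignore" here only because the description does not mention it. `SIGKILL` has no case because a process cannot catch it.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"time"
)

// ShutdownMode is a hypothetical name for the behavior chosen per signal.
type ShutdownMode int

const (
	ModeIgnore ShutdownMode = iota // not treated as a shutdown signal in this sketch
	ModeDrain                      // stop taking new jobs, let in-flight jobs finish
	ModeCancel                     // cancel in-flight jobs, wait briefly, then force quit
)

func (m ShutdownMode) String() string {
	switch m {
	case ModeDrain:
		return "drain"
	case ModeCancel:
		return "cancel"
	default:
		return "ignore"
	}
}

// cancelGracePeriod is the 5s wait described for SIGINT.
const cancelGracePeriod = 5 * time.Second

// exampleSignals is used by main to print each mapping.
var exampleSignals = []os.Signal{syscall.SIGTERM, os.Interrupt, syscall.SIGHUP}

// modeFor maps a received signal to a shutdown mode. SIGKILL has no entry:
// it cannot be caught, so the process never gets to run this code.
func modeFor(sig os.Signal) ShutdownMode {
	switch sig {
	case syscall.SIGTERM:
		return ModeDrain
	case os.Interrupt:
		return ModeCancel
	default:
		return ModeIgnore
	}
}

func main() {
	for _, sig := range exampleSignals {
		fmt.Printf("%v -> %v\n", sig, modeFor(sig))
	}
	fmt.Println("grace period after cancel:", cancelGracePeriod)
}
```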
mafredri left a comment
I like there being an option to let in-flight jobs complete, but I think it would be worthwhile to consolidate around one shutdown method. (I'm fine with waiting for jobs being the default.)
Also, I think we should merge back "interrupt" and "stop" signal slices so that we don't break other commands.
If we still want interrupt in coderd to be "faster" (only useful for development, really?), we could use two additional slices.
var StopSignals = []os.Signal{ // new default (prev. InterruptSignals renamed)
	os.Interrupt,
	syscall.SIGTERM,
	syscall.SIGHUP,
}

var StopSignalsNoInterrupt = []os.Signal{
	syscall.SIGTERM,
	syscall.SIGHUP,
}

var InterruptSignals = []os.Signal{
	os.Interrupt,
}
That's my proposal. I also moved HUP, since there's no inherent reason for HUP to use the fast sequence either.
(We don't even need to treat HUP as a stop signal, but it can be useful for development. If we don't use it for stop, we should probably ignore it, to protect users from an abrupt shutdown.)
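The proposed slices, together with the idea of ignoring HUP when it is not a stop signal, could look roughly like this. It assumes a Unix-like build (the Windows variant has fewer signals), and `containsSignal` and `ignoreHUPIfUnhandled` are hypothetical helpers; `signal.Ignore` and `signal.Ignored` are standard library functions.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// Slices as proposed in the review comment (Unix variant).
var StopSignals = []os.Signal{os.Interrupt, syscall.SIGTERM, syscall.SIGHUP}
var StopSignalsNoInterrupt = []os.Signal{syscall.SIGTERM, syscall.SIGHUP}
var InterruptSignals = []os.Signal{os.Interrupt}

// containsSignal reports whether set includes sig.
func containsSignal(set []os.Signal, sig os.Signal) bool {
	for _, s := range set {
		if s == sig {
			return true
		}
	}
	return false
}

// ignoreHUPIfUnhandled (hypothetical): if a command's stop set does not
// include SIGHUP, ignore it so a hangup cannot abruptly kill the process.
// It reports whether SIGHUP is ignored afterwards.
func ignoreHUPIfUnhandled(stopSet []os.Signal) bool {
	if !containsSignal(stopSet, syscall.SIGHUP) {
		signal.Ignore(syscall.SIGHUP)
	}
	return signal.Ignored(syscall.SIGHUP)
}

func main() {
	fmt.Println("StopSignals handles HUP:", containsSignal(StopSignals, syscall.SIGHUP))
	fmt.Println("HUP ignored with an interrupt-only set:", ignoreHUPIfUnhandled(InterruptSignals))
}
```

A command that only listens on `InterruptSignals` would then survive a terminal hangup instead of exiting mid-operation.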
cli/server.go
		exitErr = notifyCtx.Err()
	case <-stopCtx.Done():
		exitErr = stopCtx.Err()
		graceful = true
I feel both methods are graceful, so perhaps a rename here is appropriate?
-graceful = true
+waitForProvisionerJobs = true
I think that's reasonable.
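A self-contained sketch of the `select` in `cli/server.go` using the suggested name. It assumes `notifyCtx` is the interrupt-driven context and `stopCtx` the passive one (inferred from the names in the hunk); `waitForShutdown` and `simulate` are hypothetical, and plain cancellable contexts stand in for `signal.NotifyContext` so the example does not depend on real signals.

```go
package main

import (
	"context"
	"fmt"
)

// waitForShutdown blocks until one of the contexts is done. The stop path
// lets provisioner jobs finish; the interrupt path does not.
func waitForShutdown(notifyCtx, stopCtx context.Context) (waitForProvisionerJobs bool, exitErr error) {
	select {
	case <-notifyCtx.Done():
		exitErr = notifyCtx.Err()
	case <-stopCtx.Done():
		exitErr = stopCtx.Err()
		waitForProvisionerJobs = true
	}
	return waitForProvisionerJobs, exitErr
}

// simulate (hypothetical) cancels one of the two contexts, standing in for
// a received signal, and reports the resulting decision.
func simulate(stop bool) (bool, error) {
	notifyCtx, cancelNotify := context.WithCancel(context.Background())
	defer cancelNotify()
	stopCtx, cancelStop := context.WithCancel(context.Background())
	defer cancelStop()
	if stop {
		cancelStop()
	} else {
		cancelNotify()
	}
	return waitForShutdown(notifyCtx, stopCtx)
}

func main() {
	wait, err := simulate(true)
	fmt.Println("stop signal:", wait, err)
	wait, err = simulate(false)
	fmt.Println("interrupt:", wait, err)
}
```

Naming the flag after what it controls keeps both paths "graceful" in the reviewer's sense; only the treatment of running jobs differs.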
enterprise/cli/provisionerdaemons.go
@@ -225,7 +225,7 @@ func (r *RootCmd) provisionerDaemonStart() *clibase.Cmd {
		cliui.Errorf(inv.Stderr, "Unexpected error, shutting down server: %s\n", exitErr)
	}
-	err = srv.Shutdown(ctx)
+	err = srv.Shutdown(ctx, false)
Here we're changing the default behavior for interrupt, and since we split interrupt and stop, we're also not handling `SIGTERM` here anymore.
Fixed
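To illustrate what a boolean on `Shutdown` can control, here is a toy daemon with one in-flight job. The parameter name `cancelActiveJob`, the `daemon` type, and `runScenario` are assumptions for illustration, not the provisionerd API; the timeout branch stands in for the CLI's force quit.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// daemon is a hypothetical stand-in for a provisioner daemon running one job.
type daemon struct {
	mu        sync.Mutex
	accepting bool   // a real daemon would check this before acquiring jobs
	outcome   string // "completed" or "canceled", set when the job exits
	cancelJob context.CancelFunc
	jobDone   chan struct{} // closed once the in-flight job has exited
}

// startDaemon starts a fake job that takes jobLength unless it is canceled.
func startDaemon(jobLength time.Duration) *daemon {
	jobCtx, cancel := context.WithCancel(context.Background())
	d := &daemon{accepting: true, cancelJob: cancel, jobDone: make(chan struct{})}
	go func() {
		defer close(d.jobDone)
		timer := time.NewTimer(jobLength)
		defer timer.Stop()
		result := "completed"
		select {
		case <-timer.C:
		case <-jobCtx.Done():
			result = "canceled"
		}
		d.mu.Lock()
		d.outcome = result
		d.mu.Unlock()
	}()
	return d
}

// Shutdown stops accepting new jobs, optionally cancels the in-flight job,
// then waits for it to exit or for ctx to end.
func (d *daemon) Shutdown(ctx context.Context, cancelActiveJob bool) error {
	d.mu.Lock()
	d.accepting = false
	d.mu.Unlock()
	if cancelActiveJob {
		d.cancelJob()
	}
	select {
	case <-d.jobDone:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// runScenario shuts a daemon down and reports "completed", "canceled", or
// "timeout" (the point where the CLI would force quit).
func runScenario(cancelActiveJob bool, jobMillis, timeoutMillis int) string {
	d := startDaemon(time.Duration(jobMillis) * time.Millisecond)
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()
	if err := d.Shutdown(ctx, cancelActiveJob); errors.Is(err, context.DeadlineExceeded) {
		d.cancelJob() // force quit: give up waiting and stop the job
		return "timeout"
	}
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.outcome
}

func main() {
	fmt.Println("passive (SIGTERM-like):", runScenario(false, 50, 1000))
	fmt.Println("active (SIGINT-like):", runScenario(true, 1000, 5000))
	fmt.Println("job outlives deadline:", runScenario(false, 1000, 50))
}
```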
	os.Interrupt,
	syscall.SIGTERM,
This is a breaking change in all CLI commands that handle signals. They used to handle `TERM` but no longer do.
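One way to guard against this kind of regression is a check that the stop set still contains every signal commands used to handle. This is a sketch with hypothetical names (`requiredStopSignals`, `missingSignals`) and the Unix signal set; `InterruptSignals` here models the TERM-less shape flagged in the review.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// StopSignals with TERM kept (Unix variant).
var StopSignals = []os.Signal{os.Interrupt, syscall.SIGTERM, syscall.SIGHUP}

// InterruptSignals without TERM, i.e. the shape the review flagged.
var InterruptSignals = []os.Signal{os.Interrupt}

// requiredStopSignals lists signals CLI commands handled before the change.
var requiredStopSignals = []os.Signal{os.Interrupt, syscall.SIGTERM}

// missingSignals returns the entries of required that set does not contain.
func missingSignals(set, required []os.Signal) []os.Signal {
	var missing []os.Signal
	for _, r := range required {
		found := false
		for _, s := range set {
			if s == r {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, r)
		}
	}
	return missing
}

func main() {
	fmt.Println("StopSignals missing:", missingSignals(StopSignals, requiredStopSignals))
	fmt.Println("InterruptSignals missing:", missingSignals(InterruptSignals, requiredStopSignals))
}
```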
It would be really fantastic to get this finished and included in a release! That would make deployments much safer, since they wouldn't risk interrupting the provisioner.
LGTM! 🎉
	os.Interrupt,
}
var StopSignalsNoInterrupt = []os.Signal{}
I love how Windows leaves you SOL. 🥲
No doubt!
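A related pitfall with an empty slice: Go's `signal.Notify` relays all incoming signals when it is given none, and `signal.NotifyContext` passes its list straight through to it. If an empty `StopSignalsNoInterrupt` were ever passed on unchanged, that context would fire on any signal, including `os.Interrupt`. The wrapper below (`notifyContextSafe`, a hypothetical name) treats an empty list as "listen for nothing". The demo signals its own process, which works on Unix-like systems; on Windows, sending `os.Interrupt` with `Process.Signal` is not implemented and returns an error.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/signal"
	"time"
)

// Windows-style slices from the diff: os.Interrupt is the only entry and
// StopSignalsNoInterrupt is empty.
var InterruptSignals = []os.Signal{os.Interrupt}
var StopSignalsNoInterrupt = []os.Signal{}

// notifyContextSafe behaves like signal.NotifyContext, except that an empty
// signal list listens for nothing instead of for every signal.
func notifyContextSafe(parent context.Context, sigs ...os.Signal) (context.Context, context.CancelFunc) {
	if len(sigs) == 0 {
		return context.WithCancel(parent)
	}
	return signal.NotifyContext(parent, sigs...)
}

// demo sends os.Interrupt to the current process (Unix-like systems only)
// and reports which of the two contexts observed it.
func demo() (stopFired, interruptFired bool, err error) {
	stopCtx, stopCancel := notifyContextSafe(context.Background(), StopSignalsNoInterrupt...)
	defer stopCancel()
	interruptCtx, interruptCancel := notifyContextSafe(context.Background(), InterruptSignals...)
	defer interruptCancel()

	self, err := os.FindProcess(os.Getpid())
	if err != nil {
		return false, false, err
	}
	if err := self.Signal(os.Interrupt); err != nil {
		return false, false, err
	}

	select {
	case <-interruptCtx.Done():
		interruptFired = true
	case <-time.After(2 * time.Second):
		return false, false, errors.New("interrupt was not observed")
	}
	return stopCtx.Err() != nil, interruptFired, nil
}

func main() {
	stopFired, interruptFired, err := demo()
	fmt.Println("stop context done:", stopFired, "interrupt context done:", interruptFired, "err:", err)
}
```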
This is from a customer request to gracefully drain instances instead of forcefully killing the jobs.